Effects of utterance length and vocal loudness on speech breathing in older adults.
Huber, Jessica E
2008-12-31
Age-related reductions in pulmonary elastic recoil and respiratory muscle strength can affect how older adults generate subglottal pressure required for speech production. The present study examined age-related changes in speech breathing by manipulating utterance length and loudness during a connected speech task (monologue). Twenty-three older adults and twenty-eight young adults produced a monologue at comfortable loudness and pitch and with multi-talker babble noise playing in the room to elicit louder speech. Dependent variables included sound pressure level, speech rate, and lung volume initiation, termination, and excursion. Older adults produced shorter utterances than young adults overall. Age-related effects were larger for longer utterances. Older adults demonstrated very different lung volume adjustments for loud speech than young adults. These results suggest that older adults have a more difficult time when the speech system is being taxed by both utterance length and loudness. The data were consistent with the hypothesis that both young and older adults use utterance length in premotor speech planning processes.
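The lung volume variables listed above (initiation, termination, excursion) are typically expressed relative to vital capacity in the speech-breathing literature; a minimal sketch of how excursion relates to the other two values is below. The %VC convention, variable names, and numbers are illustrative assumptions rather than values from this study.

```python
# Minimal sketch (not the study's analysis code): lung volume excursion
# expressed as a percentage of vital capacity (%VC), assuming lung volume
# initiation (LVI) and termination (LVT) are already available per utterance.

def lung_volume_excursion(lvi_percent_vc: float, lvt_percent_vc: float) -> float:
    """Excursion is the volume expended across the utterance: LVI - LVT."""
    return lvi_percent_vc - lvt_percent_vc

# Hypothetical utterance: initiated at 55 %VC, terminated at 35 %VC.
print(lung_volume_excursion(55.0, 35.0))  # -> 20.0 %VC
```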
Recurrence Quantification Analysis of Sentence-Level Speech Kinematics
ERIC Educational Resources Information Center
Jackson, Eric S.; Tiede, Mark; Riley, Michael A.; Whalen, D. H.
2016-01-01
Purpose: Current approaches to assessing sentence-level speech variability rely on measures that quantify variability across utterances and use normalization procedures that alter raw trajectory data. The current work tests the feasibility of a less restrictive nonlinear approach--recurrence quantification analysis (RQA)--via a procedural example…
Revisiting speech rate and utterance length manipulations in stuttering speakers.
Blomgren, Michael; Goberman, Alexander M
2008-01-01
The goal of this study was to evaluate stuttering frequency across a multidimensional (2x2) hierarchy of speech performance tasks. Specifically, this study examined the interaction between changes in length of utterance and levels of speech rate stability. Forty-four adult male speakers participated in the study (22 stuttering speakers and 22 non-stuttering speakers). Participants were audio and video recorded while producing a spontaneous speech task and four different experimental speaking tasks. The four experimental speaking tasks involved reading a list of 45 words and a list of 45 phrases two times each. One reading of each list involved speaking at a steady habitual rate (habitual rate tasks) and another reading involved producing each list at a variable speaking rate (variable rate tasks). For the variable rate tasks, participants were directed to produce words or phrases at randomly ordered slow, habitual, and fast rates. The stuttering speakers exhibited significantly more stuttering on the variable rate tasks than on the habitual rate tasks. In addition, the stuttering speakers exhibited significantly more stuttering on the first word of the phrase length tasks compared to the single word tasks. Overall, the results indicated that varying levels of both utterance length and temporal complexity function to modulate stuttering frequency in adult stuttering speakers. Discussion focuses on issues of speech performance according to stuttering severity and possible clinical implications. The reader will learn about and be able to: (1) describe the mediating effects of length of utterance and speech rate on the frequency of stuttering in stuttering speakers; (2) understand the rationale behind multidimensional skill performance matrices; and (3) describe possible applications of motor skill performance matrices to stuttering therapy.
Recurrence Quantification Analysis of Sentence-Level Speech Kinematics.
Jackson, Eric S; Tiede, Mark; Riley, Michael A; Whalen, D H
2016-12-01
Current approaches to assessing sentence-level speech variability rely on measures that quantify variability across utterances and use normalization procedures that alter raw trajectory data. The current work tests the feasibility of a less restrictive nonlinear approach, recurrence quantification analysis (RQA), via a procedural example and subsequent analysis of kinematic data. To test the feasibility of RQA, lip aperture (i.e., the Euclidean distance between lip-tracking sensors) was recorded for 21 typically developing adult speakers during production of a simple utterance. The utterance was produced in isolation and in carrier structures differing just in length or in length and complexity. Four RQA indices were calculated: percent recurrence (%REC), percent determinism (%DET), stability (MAXLINE), and stationarity (TREND). Percent determinism (%DET) decreased only for the most linguistically complex sentence; MAXLINE decreased as a function of linguistic complexity but increased for the longer-only sentence; TREND decreased as a function of both length and linguistic complexity. This research note demonstrates the feasibility of using RQA as a tool to compare speech variability across speakers and groups. RQA offers promise as a technique to assess effects of potential stressors (e.g., linguistic or cognitive factors) on the speech production system.
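The four RQA indices named in this abstract are derived from a recurrence matrix built over a time-delay embedding of the movement signal. The sketch below is a generic, simplified implementation for illustration only; the embedding dimension, delay, radius, and minimum line length are assumptions, not the settings reported by the authors.

```python
# A generic, simplified RQA sketch for a 1-D trajectory such as lip aperture.
# Parameter values below are illustrative assumptions, not the study's settings.
import numpy as np

def rqa(signal, dim=3, delay=5, radius=0.2, lmin=2):
    x = np.asarray(signal, dtype=float)
    x = (x - x.mean()) / x.std()                     # z-score the trajectory
    n = len(x) - (dim - 1) * delay                   # number of embedded points
    emb = np.column_stack([x[i * delay:i * delay + n] for i in range(dim)])
    dist = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)
    rec = dist <= radius                             # recurrence matrix
    off = rec & ~np.eye(n, dtype=bool)               # exclude the line of identity
    perc_rec = 100.0 * off.sum() / (n * n - n)       # %REC

    # Lengths of diagonal line segments above the main diagonal.
    lengths = []
    for k in range(1, n):
        run = 0
        for v in np.append(np.diag(rec, k), False):  # sentinel flushes the last run
            if v:
                run += 1
            elif run:
                lengths.append(run)
                run = 0
    lengths = np.array(lengths)
    rec_points = off.sum()
    in_lines = lengths[lengths >= lmin].sum() if lengths.size else 0
    perc_det = 100.0 * 2 * in_lines / rec_points if rec_points else 0.0  # %DET
    maxline = int(lengths.max()) if lengths.size else 0                  # MAXLINE

    # Simplified TREND: slope of recurrence rate vs. distance from the diagonal.
    dists = np.arange(1, n)
    rates = np.array([np.diag(rec, k).mean() for k in dists])
    trend = np.polyfit(dists, rates, 1)[0]
    return perc_rec, perc_det, maxline, trend

# Hypothetical example: a noisy quasi-periodic "lip aperture" signal.
t = np.linspace(0, 4 * np.pi, 400)
print(rqa(np.sin(t) + 0.05 * np.random.randn(t.size)))
```

Published RQA work typically also excludes a Theiler window around the line of identity and reports TREND in units of %/1000 points; those refinements are omitted here for brevity.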
Recurrence Quantification Analysis of Sentence-Level Speech Kinematics
Jackson, Eric S.; Tiede, Mark; Riley, Michael A.; Whalen, D. H.
2016-01-01
Purpose Current approaches to assessing sentence-level speech variability rely on measures that quantify variability across utterances and use normalization procedures that alter raw trajectory data. The current work tests the feasibility of a less restrictive nonlinear approach—recurrence quantification analysis (RQA)—via a procedural example and subsequent analysis of kinematic data. Method To test the feasibility of RQA, lip aperture (i.e., the Euclidean distance between lip-tracking sensors) was recorded for 21 typically developing adult speakers during production of a simple utterance. The utterance was produced in isolation and in carrier structures differing just in length or in length and complexity. Four RQA indices were calculated: percent recurrence (%REC), percent determinism (%DET), stability (MAXLINE), and stationarity (TREND). Results Percent determinism (%DET) decreased only for the most linguistically complex sentence; MAXLINE decreased as a function of linguistic complexity but increased for the longer-only sentence; TREND decreased as a function of both length and linguistic complexity. Conclusions This research note demonstrates the feasibility of using RQA as a tool to compare speech variability across speakers and groups. RQA offers promise as a technique to assess effects of potential stressors (e.g., linguistic or cognitive factors) on the speech production system. PMID:27824987
Comparison of Acoustic and Kinematic Approaches to Measuring Utterance-Level Speech Variability
ERIC Educational Resources Information Center
Howell, Peter; Anderson, Andrew J.; Bartrip, Jon; Bailey, Eleanor
2009-01-01
Purpose: The spatiotemporal index (STI) is one measure of variability. As currently implemented, kinematic data are used, requiring equipment that cannot be used with some patient groups or in scanners. An experiment is reported that addressed whether STI can be extended to an audio measure of sound pressure of the speech envelope over time that…
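The spatiotemporal index (STI) is commonly described as the sum of across-repetition standard deviations computed after each repetition has been normalized in time and amplitude. The sketch below applies that idea to audio amplitude envelopes, in the spirit of the question this abstract raises; the envelope estimate, the 50-point time normalization, and the z-score amplitude normalization are assumptions, not the authors' procedure.

```python
# An STI-style index computed on amplitude envelopes of repeated productions.
# All processing choices here are illustrative assumptions.
import numpy as np

def envelope(audio, fs, smooth_ms=20):
    """Crude amplitude envelope: rectify, then moving-average smooth."""
    win = max(1, int(fs * smooth_ms / 1000))
    return np.convolve(np.abs(audio), np.ones(win) / win, mode="same")

def sti_like(repetitions, fs, n_points=50):
    """Sum of across-repetition SDs after time- and amplitude-normalization."""
    norm = []
    for audio in repetitions:                        # repetitions may differ in length
        env = envelope(np.asarray(audio, dtype=float), fs)
        t_new = np.linspace(0.0, 1.0, n_points)
        t_old = np.linspace(0.0, 1.0, env.size)
        env_n = np.interp(t_new, t_old, env)         # linear time-normalization
        norm.append((env_n - env_n.mean()) / env_n.std())  # amplitude normalization
    norm = np.vstack(norm)
    return float(np.sum(np.std(norm, axis=0)))       # point-wise SDs summed over time

# Hypothetical demo: 15 synthetic "repetitions" of slightly varying length.
rng = np.random.default_rng(0)
fs = 16000
def fake_rep(n):
    t = np.linspace(0.0, 1.0, n)
    return np.sin(np.pi * t) * (2 + np.sin(2 * np.pi * 4 * t)) * rng.standard_normal(n)
reps = [fake_rep(fs + int(rng.integers(-800, 800))) for _ in range(15)]
print(sti_like(reps, fs))
```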
Auditory Cortex Processes Variation in Our Own Speech
Sitek, Kevin R.; Mathalon, Daniel H.; Roach, Brian J.; Houde, John F.; Niziolek, Caroline A.; Ford, Judith M.
2013-01-01
As we talk, we unconsciously adjust our speech to ensure it sounds the way we intend it to sound. However, because speech production involves complex motor planning and execution, no two utterances of the same sound will be exactly the same. Here, we show that auditory cortex is sensitive to natural variations in self-produced speech from utterance to utterance. We recorded event-related potentials (ERPs) from ninety-nine subjects while they uttered “ah” and while they listened to those speech sounds played back. Subjects' utterances were sorted based on their formant deviations from the previous utterance. Typically, the N1 ERP component is suppressed during talking compared to listening. By comparing ERPs to the least and most variable utterances, we found that N1 was less suppressed to utterances that differed greatly from their preceding neighbors. In contrast, an utterance's difference from the median formant values did not affect N1. Trial-to-trial pitch (f0) deviation and pitch difference from the median similarly did not affect N1. We discuss mechanisms that may underlie the change in N1 suppression resulting from trial-to-trial formant change. Deviant utterances require additional auditory cortical processing, suggesting that speaking-induced suppression mechanisms are optimally tuned for a specific production. PMID:24349399
Development of speech motor control: lip movement variability.
Schötz, Susanne; Frid, Johan; Löfqvist, Anders
2013-06-01
This study examined variability of lip movements across repetitions of the same utterance as a function of age in Swedish speakers. The specific purpose was to extend earlier findings by examining variability in both phase and amplitude. Subjects were 50 typically developed native Swedish children and adults (28 females, 22 males, aged 5 to 31 yr). Lip movements were recorded during 15 to 20 repetitions of a short Swedish phrase using three-dimensional articulography. After correction for head movements, the kinematic records were expressed in a maxilla-based coordinate system. Movement onset and offset of the utterance were identified using kinematic landmarks. The Euclidean distance between receivers on the upper and lower lips was calculated and subjected to functional data analysis to assess both phase and amplitude variability. Results show a decrease in both indices as a function of age, with a greater reduction of amplitude variability. There was no difference between males and females for either index. The two indices were moderately correlated with each other, suggesting that they capture different aspects of speech production. Utterance duration also decreased with age, but variability was unrelated to duration. The standard deviation of utterance duration also decreased with age. The present results thus suggest that age-related changes in speech motor control continue up until 30 years of age.
Can a model of overlapping gestures account for scanning speech patterns?
Tjaden, K
1999-06-01
A simple acoustic model of overlapping, sliding gestures was used to evaluate whether coproduction was reduced for neurologic speakers with scanning speech patterns. F2 onset frequency was used as an acoustic measure of coproduction or gesture overlap. The effects of speaking rate (habitual versus fast) and utterance position (initial versus medial) on F2 frequency, and presumably gesture overlap, were examined. Regression analyses also were used to evaluate the extent to which across-repetition temporal variability in F2 trajectories could be explained as variation in coproduction for consonants and vowels. The lower F2 onset frequencies for disordered speakers suggested that gesture overlap was reduced for neurologic individuals with scanning speech. Speaking rate change did not influence F2 onset frequencies, and presumably gesture overlap, for healthy or disordered speakers. F2 onset frequency differences for utterance-initial and -medial repetitions were interpreted to suggest reduced coproduction for the utterance-initial position. The utterance-position effects on F2 onset frequency, however, likely were complicated by position-related differences in articulatory scaling. The results of the regression analysis indicated that gesture sliding accounts, in part, for temporal variability in F2 trajectories. Taken together, the results of this study provide support for the idea that speech production theory for healthy talkers helps to account for disordered speech production.
Lexical and phonological variability in preschool children with speech sound disorder.
Macrae, Toby; Tyler, Ann A; Lewis, Kerry E
2014-02-01
The authors of this study examined relationships between measures of word and speech error variability and between these and other speech and language measures in preschool children with speech sound disorder (SSD). In this correlational study, 18 preschool children with SSD, age-appropriate receptive vocabulary, and normal oral motor functioning and hearing were assessed across 2 sessions. Experimental measures included word and speech error variability, receptive vocabulary, nonword repetition (NWR), and expressive language. Pearson product–moment correlation coefficients were calculated among the experimental measures. The correlation between word and speech error variability was slight and nonsignificant. The correlation between word variability and receptive vocabulary was moderate and negative, although nonsignificant. High word variability was associated with small receptive vocabularies. The correlations between speech error variability and NWR and between speech error variability and the mean length of children's utterances were moderate and negative, although both were nonsignificant. High speech error variability was associated with poor NWR and language scores. High word variability may reflect unstable lexical representations, whereas high speech error variability may reflect indistinct phonological representations. Preschool children with SSD who show abnormally high levels of different types of speech variability may require slightly different approaches to intervention.
Ertmer, David J.; Jung, Jongmin; Kloiber, Diana True
2013-01-01
Background Speech-like utterances containing rapidly combined consonants and vowels eventually dominate the prelinguistic and early word productions of toddlers who are developing typically (TD). It seems reasonable to expect a similar phenomenon in young cochlear implant (CI) recipients. This study sought to determine the number of months of robust hearing experience needed to achieve a majority of speech-like utterances in both of these groups. Methods Speech samples were recorded at 3-month intervals during the first 2 years of CI experience, and between 6 and 24 months of age in TD children. Speech-like utterances were operationally defined as those belonging to the Basic Canonical Syllables (BCS) or Advanced Forms (AF) levels of the Consolidated Stark Assessment of Early Vocal Development-Revised. Results On average, the CI group achieved a majority of speech-like utterances after 12 months, and the TD group after 18 months of robust hearing experience. The CI group produced greater percentages of speech-like utterances at each interval until 24 months, when both groups approximated 80%. Conclusion Auditory deprivation did not limit progress in vocal development as young CI recipients showed more-rapid-than-typical speech development during the first 2 years of device use. Implications for the Infraphonological model of speech development are considered. PMID:23813203
Revisiting Speech Rate and Utterance Length Manipulations in Stuttering Speakers
ERIC Educational Resources Information Center
Blomgren, Michael; Goberman, Alexander M.
2008-01-01
The goal of this study was to evaluate stuttering frequency across a multidimensional (2 x 2) hierarchy of speech performance tasks. Specifically, this study examined the interaction between changes in length of utterance and levels of speech rate stability. Forty-four adult male speakers participated in the study (22 stuttering speakers and 22…
Nyman, Anna; Lohmander, Anette
2018-01-01
Babbling is an important precursor to speech, but has not yet been thoroughly investigated in children with neurodevelopmental disabilities. Canonical babbling ratio (CBR) is a commonly used but time-consuming measure for quantifying babbling. The aim of this study was twofold: to validate a simplified version of the CBR (CBR_UTTER), and to use this measure to determine if early precursors to speech and language development could be detected in children with different neurodevelopmental disabilities. Two different data sets were used. In Part I, CBR_UTTER was compared to two other CBR measures using previously obtained phonetic transcriptions of 3571 utterances from 38 audio recordings of 12- to 18-month-old children with and without cleft palate. In CBR_UTTER, the number of canonical utterances was divided by the total number of utterances. In CBR_syl, the number of canonical syllables was divided by the total number of syllables. In CBR_utt, the number of canonical syllables was divided by the total number of utterances. High agreement was seen between CBR_UTTER and CBR_syl, suggesting CBR_UTTER as an alternative. In Part II, babbling in children with neurodevelopmental disability was examined. Eighteen children aged 12-22 months with Down syndrome, cerebral palsy or developmental delay were audio-video recorded during interaction with a parent. Recordings were analysed by observation of babbling and consonant production and by calculation of CBR_UTTER, and were compared to data from controls. The study group showed significantly lower occurrence of all variables, except for plosives. The long-term relevance of the findings for the speech and language development of the children needs to be investigated.
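The three ratios compared in Part I follow directly from the definitions given in the abstract; a minimal sketch with hypothetical counts is below.

```python
# Minimal sketch of the three canonical babbling ratios, written directly from
# the definitions in the abstract. The counts in the demo are hypothetical.
def cbr_utter(n_canonical_utterances, n_utterances):
    """CBR_UTTER: canonical utterances / total utterances."""
    return n_canonical_utterances / n_utterances

def cbr_syl(n_canonical_syllables, n_syllables):
    """CBR_syl: canonical syllables / total syllables."""
    return n_canonical_syllables / n_syllables

def cbr_utt(n_canonical_syllables, n_utterances):
    """CBR_utt: canonical syllables / total utterances."""
    return n_canonical_syllables / n_utterances

# Hypothetical recording: 120 utterances (40 canonical), 300 syllables (55 canonical).
print(cbr_utter(40, 120), cbr_syl(55, 300), cbr_utt(55, 120))
```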
Potential interactions among linguistic, autonomic, and motor factors in speech.
Kleinow, Jennifer; Smith, Anne
2006-05-01
Though anecdotal reports link certain speech disorders to increases in autonomic arousal, few studies have described the relationship between arousal and speech processes. Additionally, it is unclear how increases in arousal may interact with other cognitive-linguistic processes to affect speech motor control. In this experiment we examine potential interactions between autonomic arousal, linguistic processing, and speech motor coordination in adults and children. Autonomic responses (heart rate, finger pulse volume, tonic skin conductance, and phasic skin conductance) were recorded simultaneously with upper and lower lip movements during speech. The lip aperture variability (LA variability index) across multiple repetitions of sentences that varied in length and syntactic complexity was calculated under low- and high-arousal conditions. High arousal conditions were elicited by performance of the Stroop color word task. Children had significantly higher lip aperture variability index values across all speaking tasks, indicating more variable speech motor coordination. Increases in syntactic complexity and utterance length were associated with increases in speech motor coordination variability in both speaker groups. There was a significant effect of Stroop task, which produced increases in autonomic arousal and increased speech motor variability in both adults and children. These results provide novel evidence that high arousal levels can influence speech motor control in both adults and children. (c) 2006 Wiley Periodicals, Inc.
Finding the music of speech: Musical knowledge influences pitch processing in speech.
Vanden Bosch der Nederlanden, Christina M; Hannon, Erin E; Snyder, Joel S
2015-10-01
Few studies comparing music and language processing have adequately controlled for low-level acoustical differences, making it unclear whether differences in music and language processing arise from domain-specific knowledge, acoustic characteristics, or both. We controlled acoustic characteristics by using the speech-to-song illusion, which often results in a perceptual transformation to song after several repetitions of an utterance. Participants performed a same-different pitch discrimination task for the initial repetition (heard as speech) and the final repetition (heard as song). Better detection was observed for pitch changes that violated rather than conformed to Western musical scale structure, but only when utterances transformed to song, indicating that music-specific pitch representations were activated and influenced perception. This shows that music-specific processes can be activated when an utterance is heard as song, suggesting that the high-level status of a stimulus as either language or music can be behaviorally dissociated from low-level acoustic factors. Copyright © 2015 Elsevier B.V. All rights reserved.
Peng, Shu-Chen; Tomblin, J Bruce; Turner, Christopher W
2008-06-01
Current cochlear implant (CI) devices are limited in providing voice pitch information that is critical for listeners' recognition of prosodic contrasts of speech (e.g., intonation and lexical tones). As a result, mastery of the production and perception of such speech contrasts can be very challenging for prelingually deafened individuals who received a CI in their childhood (i.e., pediatric CI recipients). The purpose of this study was to investigate (a) pediatric CI recipients' mastery of the production and perception of speech intonation contrasts, in comparison with their age-matched peers with normal hearing (NH), and (b) the relationships between intonation production and perception in CI and NH individuals. Twenty-six pediatric CI recipients aged from 7.44 to 20.74 yrs and 17 age-matched individuals with NH participated. All CI users were prelingually deafened, and each of them received a CI between 1.48 and 6.34 yrs of age. Each participant performed an intonation production task and an intonation perception task. In the production task, 10 questions and 10 statements that were syntactically matched (e.g., "The girl is on the playground." versus "The girl is on the playground?") were elicited from each participant using interactive discourse involving pictures. These utterances were judged by a panel of eight adult listeners with NH in terms of utterance type accuracy (question versus statement) and contour appropriateness (on a five-point scale). In the perception task, each participant identified the speech intonation contrasts of natural utterances in a two-alternative forced-choice task. The results from the production task indicated that CI participants' scores for both utterance type accuracy and contour appropriateness were significantly lower than the scores of NH participants (both p < 0.001). The results from the perception task indicated that CI participants' identification accuracy was significantly lower than that of their NH peers (CI, 70.13% versus NH, 97.11%, p < 0.001). The Pearson correlation coefficients (r) between CI participants' performance levels in the production and perception tasks were approximately 0.65 (p = 0.001). As a group, pediatric CI recipients do not show mastery of speech intonation in their production or perception to the same extent as their NH peers. Pediatric CI recipients' performance levels in the production and perception of speech intonation contrasts are moderately correlated. Intersubject variability exists in pediatric CI recipients' mastery levels in the production and perception of speech intonation contrasts. These findings suggest the importance of addressing both aspects (production and perception) of speech intonation in the aural rehabilitation and speech intervention programs for prelingually deafened children and young adults who use a CI.
Comparing Measures of Voice Quality From Sustained Phonation and Continuous Speech.
Gerratt, Bruce R; Kreiman, Jody; Garellek, Marc
2016-10-01
The question of what type of utterance (a sustained vowel or continuous speech) is best for voice quality analysis has been extensively studied but with equivocal results. This study examines whether previously reported differences derive from the articulatory and prosodic factors occurring in continuous speech versus sustained phonation. Speakers with voice disorders sustained vowels and read sentences. Vowel samples were excerpted from the steadiest portion of each vowel in the sentences. In addition to sustained and excerpted vowels, a 3rd set of stimuli was created by shortening sustained vowel productions to match the duration of vowels excerpted from continuous speech. Acoustic measures were made on the stimuli, and listeners judged the severity of vocal quality deviation. Sustained vowels and those extracted from continuous speech contain essentially the same acoustic and perceptual information about vocal quality deviation. Perceived and/or measured differences between continuous speech and sustained vowels derive largely from voice source variability across segmental and prosodic contexts and not from variations in vocal fold vibration in the quasi-steady portion of the vowels. Approaches to voice quality assessment by using continuous speech samples average across utterances and may not adequately quantify the variability they are intended to assess.
Graber, Emily; Simchy-Gross, Rhimmon; Margulis, Elizabeth Hellmuth
2017-12-01
The speech-to-song (STS) illusion is a phenomenon in which some spoken utterances perceptually transform to song after repetition [Deutsch, Henthorn, and Lapidis (2011). J. Acoust. Soc. Am. 129, 2245-2252]. Tierney, Dick, Deutsch, and Sereno [(2013). Cereb. Cortex. 23, 249-254] developed a set of stimuli where half tend to transform to perceived song with repetition and half do not. Those that transform and those that do not can be understood to induce a musical or linguistic mode of listening, respectively. By comparing performance on perceptual tasks related to transforming and non-transforming utterances, the current study examines whether the musical mode of listening entails higher sensitivity to temporal regularity and better absolute pitch (AP) memory compared to the linguistic mode. In experiment 1, inter-stimulus intervals within STS trials were steady, slightly variable, or highly variable. Participants reported how temporally regular utterance entrances were. In experiment 2, participants performed an AP memory task after a blocked STS exposure phase. Utterances identically matching those used in the exposure phase were targets among transposed distractors in the test phase. Results indicate that listeners exhibit heightened awareness of temporal manipulations but reduced awareness of AP manipulations to transforming utterances. This methodology establishes a framework for implicitly differentiating musical from linguistic perception.
Human phoneme recognition depending on speech-intrinsic variability.
Meyer, Bernd T; Jürgens, Tim; Wesker, Thorsten; Brand, Thomas; Kollmeier, Birger
2010-11-01
The influence of different sources of speech-intrinsic variation (speaking rate, effort, style and dialect or accent) on human speech perception was investigated. In listening experiments with 16 listeners, confusions of consonant-vowel-consonant (CVC) and vowel-consonant-vowel (VCV) sounds in speech-weighted noise were analyzed. Experiments were based on the OLLO logatome speech database, which was designed for a man-machine comparison. It contains utterances spoken by 50 speakers from five dialect/accent regions and covers several intrinsic variations. By comparing results depending on intrinsic and extrinsic variations (i.e., different levels of masking noise), the degradation induced by variabilities can be expressed in terms of the SNR. The spectral level distance between the respective speech segment and the long-term spectrum of the masking noise was found to be a good predictor for recognition rates, while phoneme confusions were influenced by the distance to spectrally close phonemes. An analysis based on transmitted information of articulatory features showed that voicing and manner of articulation are comparatively robust cues in the presence of intrinsic variations, whereas the coding of place is more degraded. The database and detailed results have been made available for comparisons between human speech recognition (HSR) and automatic speech recognizers (ASR).
Cognitive Load Reduces Perceived Linguistic Convergence Between Dyads.
Abel, Jennifer; Babel, Molly
2017-09-01
Speech convergence is the tendency of talkers to become more similar to someone they are listening or talking to, whether that person is a conversational partner or merely a voice heard repeating words. To elucidate the nature of the mechanisms underlying convergence, this study uses different levels of task difficulty on speech convergence within dyads collaborating on a task. Dyad members had to build identical LEGO® constructions without being able to see each other's construction, and with each member having half of the instructions required to complete the construction. Three levels of task difficulty were created, with five dyads at each level (30 participants total). Task difficulty was also measured using completion time and error rate. Listeners who heard pairs of utterances from each dyad judged convergence to be occurring in the Easy condition and to a lesser extent in the Medium condition, but not in the Hard condition. Amplitude envelope acoustic similarity analyses of the same utterance pairs showed that convergence occurred in dyads with shorter completion times and lower error rates. Together, these results suggest that while speech convergence is a highly variable behavior, it may occur more in contexts of low cognitive load. The relevance of these results for the current automatic and socially-driven models of convergence is discussed.
Voice recognition through phonetic features with Punjabi utterances
NASA Astrophysics Data System (ADS)
Kaur, Jasdeep; Juglan, K. C.; Sharma, Vishal; Upadhyay, R. K.
2017-07-01
This paper deals with the perception and disorders of speech with respect to the Punjabi language. Given the importance of voice identification, various parameters of speaker identification were studied. The speech material was recorded with a tape recorder in normal and disguised modes of utterance. From the recorded material, utterances free from noise were selected for auditory and acoustic spectrographic analysis. A comparison of the normal and disguised speech of seven subjects is reported. Fundamental frequency (F0) at similar places, plosive duration at certain phonemes, and the amplitude ratio (A1:A2) were compared across normal and disguised speech. The formant frequencies of normal and disguised speech remain almost identical only when compared at positions of the same vowel quality and quantity; if the vowel is more closed or more open in the disguised utterance, the formant frequency changes relative to the normal utterance. The amplitude ratio (A1:A2) was found to be speaker dependent and remained unchanged in the disguised utterance, although this value may shift if cross-sectioning is not done at the same location.
Listener Reliability in Assigning Utterance Boundaries in Children's Spontaneous Speech
ERIC Educational Resources Information Center
Stockman, Ida J.
2010-01-01
Research and clinical practices often rely on an utterance unit for spoken language analysis. This paper calls attention to the problems encountered when identifying utterance boundaries in young children's spontaneous conversational speech. The results of a reliability study of utterance boundary assignment are described for 20 females with…
NASA Astrophysics Data System (ADS)
Mosko, J. D.; Stevens, K. N.; Griffin, G. R.
1983-08-01
Acoustical analyses were conducted of words produced by four speakers in a motion stress-inducing situation. The aim of the analyses was to document the kinds of changes that occur in the vocal utterances of speakers who are exposed to motion stress and to comment on the implications of these results for the design and development of voice interactive systems. The speakers differed markedly in the types and magnitudes of the changes that occurred in their speech. For some speakers, the stress-inducing experimental condition caused an increase in fundamental frequency, changes in the pattern of vocal fold vibration, shifts in vowel production, and changes in the relative amplitudes of sounds containing turbulence noise. All speakers showed greater variability in the experimental condition than in the more relaxed control situation. The variability was manifested in the acoustical characteristics of individual phonetic elements, particularly in unstressed syllables. The kinds of changes and variability observed serve to emphasize the limitations of speech recognition systems based on template matching of patterns that are stored in the system during a training phase. There is a need for a better understanding of these phonetic modifications and for developing ways of incorporating knowledge about these changes within a speech recognition system.
Huber, Jessica E.; Darling, Meghan
2012-01-01
Purpose The purpose of the present study was to examine the effects of cognitive-linguistic deficits and respiratory physiologic changes on respiratory support for speech in Parkinson's disease (PD), using two speech tasks, reading and extemporaneous speech. Methods Five women with PD, nine men with PD, and 14 age- and sex-matched control participants read a passage and spoke extemporaneously on a topic of their choice at comfortable loudness. Sound pressure level, syllables per breath group, speech rate, and lung volume parameters were measured. Number of formulation errors, disfluencies, and filled pauses were counted. Results Individuals with PD produced shorter utterances as compared to control participants. The relationships between utterance length and lung volume initiation and inspiratory duration were weaker in individuals with PD than in control participants, particularly for the extemporaneous speech task. These results suggest less consistent planning for utterance length by individuals with PD in extemporaneous speech. Individuals with PD produced more formulation errors in both tasks and significantly fewer filled pauses in extemporaneous speech. Conclusions Both respiratory physiologic and cognitive-linguistic issues affected speech production by individuals with PD. Overall, individuals with PD had difficulty planning or coordinating language formulation and respiratory support, in particular during extemporaneous speech. PMID:20844256
Chon, HeeCheong; Sawyer, Jean; Ambrose, Nicoline G.
2014-01-01
Purpose The purpose of this study was to investigate characteristics of four types of utterances in preschool children who stutter: perceptually fluent, containing normal disfluencies (OD utterance), containing stuttering-like disfluencies (SLD utterance), and containing both normal and stuttering-like disfluencies (SLD+OD utterance). Articulation rate and length of utterance were measured to seek the differences. Because articulation rate may reflect temporal aspects of speech motor control, it was predicted that the articulation rate would be different between perceptually fluent utterances and utterances containing disfluencies. The length of utterance was also expected to show different patterns. Method Participants were 14 preschool children who stutter. Disfluencies were identified from their spontaneous speech samples, and articulation rate in syllables per second and utterance length in syllables were measured for the four types of utterances. Results and discussion There was no significant difference in articulation rate between each type of utterance. Significantly longer utterances were found only in SLD+OD utterances compared to fluent utterances, suggesting that utterance length may be related to efforts in executing motor as well as linguistic planning. The SLD utterance revealed a significant negative correlation in that longer utterances tended to be slower in articulation rates. Longer utterances may place more demand on speech motor control due to more linguistic and/or grammatical features, resulting in stuttering-like disfluencies and a decreased rate. PMID:22995336
2015-03-31
analysis. For scene analysis, we use Temporal Data Crystallization (TDC), and for logical analysis, we use Speech Act theory and Toulmin Argumentation... utterance in the discussion record: (i) an utterance ID and a speaker ID, (ii) speech acts, (iii) argument structure. Speech act denotes... mediator is expected to use more OQs than CQs. When the speech act of an utterance is an argument, furthermore, we recognize the conclusion part
The sounds of sarcasm in English and Cantonese: A cross-linguistic production and perception study
NASA Astrophysics Data System (ADS)
Cheang, Henry S.
Three studies were conducted to examine the acoustic markers of sarcasm in English and in Cantonese, and the manner in which such markers are perceived across these languages. The first study consisted of acoustic analyses of sarcastic utterances spoken in English to verify whether particular prosodic cues correspond to English sarcastic speech. Native English speakers produced utterances expressing sarcasm, sincerity, humour, or neutrality. Measures taken from each utterance included fundamental frequency (F0), amplitude, speech rate, harmonics-to-noise ratio (HNR, to probe voice quality), and one-third octave spectral values (to probe resonance). The second study was conducted to ascertain whether specific acoustic features marked sarcasm in Cantonese and how such features compare with English sarcastic prosody. The elicitation and acoustic analysis methods from the first study were applied to similarly-constructed Cantonese utterances spoken by native Cantonese speakers. Direct acoustic comparisons between Cantonese and English sarcasm exemplars were also made. To further test for cross-linguistic prosodic cues of sarcasm and to assess whether sarcasm could be conveyed across languages, a cross-linguistic perceptual study was then performed. A subset of utterances from the first two studies was presented to naive listeners fluent in either Cantonese or English. Listeners had to identify the attitude in each utterance regardless of language of presentation. Sarcastic utterances in English (regardless of text) were marked by lower mean F0 and reductions in HNR and F0 standard deviation (relative to comparison attitudes). Resonance changes, reductions in both speech rate and F0 range signalled sarcasm in conjunction with some vocabulary terms. By contrast, higher mean F0, amplitude range reductions, and F0 range restrictions corresponded with sarcastic utterances spoken in Cantonese regardless of text. For Cantonese, reduced speech rate and higher HNR interacted with certain vocabulary to mark sarcasm. Sarcastic prosody was most distinguished from acoustic features corresponding to sincere utterances in both languages. Direct English-Cantonese comparisons between sarcasm tokens confirmed cross-linguistic differences in sarcastic prosody. Finally, Cantonese and English listeners could identify sarcasm in their native languages but identified sarcastic utterances spoken in the unfamiliar language at chance levels. It was concluded that particular acoustic cues marked sarcastic speech in Cantonese and English, and these patterns of sarcastic prosody were specific to each language.
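One of the measures above, the harmonics-to-noise ratio, is often approximated from the normalized autocorrelation peak at the pitch period. The sketch below shows that textbook approximation with hypothetical frame and parameter values; it is not the analysis pipeline used in these studies.

```python
# A simplified, illustrative autocorrelation-based HNR estimate for one voiced
# frame. F0 search limits, frame length, and the demo signal are assumptions.
import numpy as np

def hnr_db(frame, fs, f0_min=75.0, f0_max=400.0):
    x = np.asarray(frame, dtype=float)
    x = x - x.mean()
    ac = np.correlate(x, x, mode="full")[x.size - 1:]   # autocorrelation, lags >= 0
    ac = ac / ac[0]                                      # normalize so r(0) = 1
    lo, hi = int(fs / f0_max), int(fs / f0_min)          # plausible pitch-period lags
    r_max = ac[lo:hi].max()                              # peak at the pitch period
    r_max = min(max(r_max, 1e-6), 1 - 1e-6)              # keep the log-ratio finite
    return 10 * np.log10(r_max / (1 - r_max))            # harmonic vs. noise energy

# Hypothetical voiced frame: 40 ms of a 120 Hz "voice" plus noise at 16 kHz.
fs = 16000
t = np.arange(int(0.04 * fs)) / fs
frame = np.sin(2 * np.pi * 120 * t) + 0.1 * np.random.randn(t.size)
print(hnr_db(frame, fs))
```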
Analysis of False Starts in Spontaneous Speech.
ERIC Educational Resources Information Center
O'Shaughnessy, Douglas
A primary difference between spontaneous speech and read speech concerns the use of false starts, where a speaker interrupts the flow of speech to restart his or her utterance. A study examined the acoustic aspects of such restarts in a widely-used speech database, examining approximately 1000 utterances, about 10% of which contained a restart.…
Feeling backwards? How temporal order in speech affects the time course of vocal emotion recognition
Rigoulot, Simon; Wassiliwizky, Eugen; Pell, Marc D.
2013-01-01
Recent studies suggest that the time course for recognizing vocal expressions of basic emotion in speech varies significantly by emotion type, implying that listeners uncover acoustic evidence about emotions at different rates in speech (e.g., fear is recognized most quickly whereas happiness and disgust are recognized relatively slowly; Pell and Kotz, 2011). To investigate whether vocal emotion recognition is largely dictated by the amount of time listeners are exposed to speech or the position of critical emotional cues in the utterance, 40 English participants judged the meaning of emotionally-inflected pseudo-utterances presented in a gating paradigm, where utterances were gated as a function of their syllable structure in segments of increasing duration from the end of the utterance (i.e., gated syllable-by-syllable from the offset rather than the onset of the stimulus). Accuracy for detecting six target emotions in each gate condition and the mean identification point for each emotion in milliseconds were analyzed and compared to results from Pell and Kotz (2011). We again found significant emotion-specific differences in the time needed to accurately recognize emotions from speech prosody, and new evidence that utterance-final syllables tended to facilitate listeners' accuracy in many conditions when compared to utterance-initial syllables. The time needed to recognize fear, anger, sadness, and neutral from speech cues was not influenced by how utterances were gated, although happiness and disgust were recognized significantly faster when listeners heard the end of utterances first. Our data provide new clues about the relative time course for recognizing vocally-expressed emotions within the 400–1200 ms time window, while highlighting that emotion recognition from prosody can be shaped by the temporal properties of speech. PMID:23805115
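The offset gating described above can be pictured as slicing the utterance from syllable onsets working backwards from the end, so the first gate contains only the final syllable and later gates add earlier syllables. A minimal sketch with hypothetical boundary values follows.

```python
# Minimal sketch of offset-based gating. The audio array, sampling rate, and
# syllable boundaries are hypothetical placeholders.
import numpy as np

def offset_gates(audio, syllable_onsets):
    """syllable_onsets: sample indices of syllable onsets, including 0."""
    gates = []
    for onset in reversed(syllable_onsets):
        gates.append(audio[onset:])      # from this syllable onset to the end
    return gates                          # gates[0] = final syllable only

# Hypothetical 3-syllable pseudo-utterance of 1.2 s at 16 kHz.
fs = 16000
audio = np.random.randn(int(1.2 * fs))
onsets = [0, int(0.4 * fs), int(0.8 * fs)]
for i, g in enumerate(offset_gates(audio, onsets), 1):
    print(f"gate {i}: {g.size / fs:.2f} s")
```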
Temporal variability in sung productions of adolescents who stutter.
Falk, Simone; Maslow, Elena; Thum, Georg; Hoole, Philip
2016-01-01
Singing has long been used as a technique to enhance and reeducate temporal aspects of articulation in speech disorders. In the present study, differences in the temporal structure of sung versus spoken speech were investigated in stuttering. In particular, it was examined whether singing helps to reduce VOT variability of voiceless plosives, which would indicate enhanced temporal coordination of oral and laryngeal processes. Eight German adolescents who stutter and eight typically fluent peers repeatedly spoke and sang a simple German congratulation formula in which a disyllabic target word (e.g., /'ki:ta/) was repeated five times. On every trial, the first syllable of the word was varied, beginning equally often with one of the three voiceless German stops /p/, /t/, /k/. Acoustic analyses showed that mean VOT and stop gap duration reduced during singing compared to speaking, while mean vowel and utterance duration was prolonged in singing in both groups. Importantly, adolescents who stutter significantly reduced VOT variability (measured as the coefficient of variation) during sung productions compared to speaking in word-initial stressed positions, while the control group showed a slight increase in VOT variability. However, in unstressed syllables, VOT variability increased from speech to song in both adolescents who do and do not stutter. In addition, vowel and utterance durational variability decreased in both groups, yet adolescents who stutter were still more variable in utterance duration independent of the form of vocalization. These findings shed new light on how singing alters temporal structure and, in particular, the coordination of laryngeal-oral timing in stuttering. Future perspectives for investigating how rhythmic aspects could aid the management of fluent speech in stuttering are discussed. Readers will be able to describe (1) current perspectives on singing and its effects on articulation and fluency in stuttering and (2) acoustic parameters such as VOT variability which indicate the efficiency of control and coordination of laryngeal-oral movements. They will understand and be able to discuss (3) how singing reduces temporal variability in the productions of adolescents who do and do not stutter and (4) how this is linked to altered articulatory patterns in singing as well as to its rhythmic structure. Copyright © 2016 Elsevier Inc. All rights reserved.
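The variability measure used here, the coefficient of variation, scales the standard deviation by the mean so that VOT variability can be compared across conditions with different mean durations; a minimal sketch with hypothetical VOT values (in ms) follows.

```python
# Minimal sketch of the coefficient of variation; the VOT values are invented
# for illustration, not taken from the study.
import numpy as np

def coefficient_of_variation(values):
    values = np.asarray(values, dtype=float)
    return values.std(ddof=1) / values.mean()

spoken_vot = [62, 45, 78, 50, 70]   # hypothetical VOTs for spoken productions (ms)
sung_vot = [58, 55, 60, 57, 61]     # hypothetical VOTs for sung productions (ms)
print(coefficient_of_variation(spoken_vot), coefficient_of_variation(sung_vot))
```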
Echolalic and Spontaneous Phrase Speech in Autistic Children.
ERIC Educational Resources Information Center
Howlin, Patricia
1982-01-01
Investigates the syntactical level of spontaneous and echolalic utterances of 26 autistic boys at different stages of phrase speech development. Speech samples were collected over a 90-minute period in unstructured settings in participants' homes. Imitations were not deliberately elicited, and only unprompted, noncommunicative echoes were…
Toni, Ivan; Hagoort, Peter; Kelly, Spencer D.; Özyürek, Aslı
2015-01-01
Recipients process information from speech and co-speech gestures, but it is currently unknown how this processing is influenced by the presence of other important social cues, especially gaze direction, a marker of communicative intent. Such cues may modulate neural activity in regions associated either with the processing of ostensive cues, such as eye gaze, or with the processing of semantic information, provided by speech and gesture. Participants were scanned (fMRI) while taking part in triadic communication involving two recipients and a speaker. The speaker uttered sentences that were and were not accompanied by complementary iconic gestures. Crucially, the speaker alternated her gaze direction, thus creating two recipient roles: addressed (direct gaze) vs unaddressed (averted gaze) recipient. The comprehension of Speech&Gesture relative to SpeechOnly utterances recruited middle occipital, middle temporal and inferior frontal gyri, bilaterally. The calcarine sulcus and posterior cingulate cortex were sensitive to differences between direct and averted gaze. Most importantly, Speech&Gesture utterances, but not SpeechOnly utterances, produced additional activity in the right middle temporal gyrus when participants were addressed. Marking communicative intent with gaze direction modulates the processing of speech–gesture utterances in cerebral areas typically associated with the semantic processing of multi-modal communicative acts. PMID:24652857
The normalities and abnormalities associated with speech in psychometrically-defined schizotypy.
Cohen, Alex S; Auster, Tracey L; McGovern, Jessica E; MacAulay, Rebecca K
2014-12-01
Speech deficits are thought to be an important feature of schizotypy--defined as the personality organization reflecting a putative liability for schizophrenia. There is reason to suspect that these deficits manifest as a function of limited cognitive resources. To evaluate this idea, we examined speech from individuals with psychometrically-defined schizotypy during a low cognitively-demanding task versus a relatively high cognitively-demanding task. A range of objective, computer-based measures of speech was employed, tapping speech production (silence, number and length of pauses, number and length of utterances), speech variability (global and local intonation and emphasis), and speech content (word fillers, idea density). Data for control (n=37) and schizotypy (n=39) groups were examined. Results did not confirm our hypotheses. While the cognitive-load task reduced speech expressivity for subjects as a group for most variables, the schizotypy group was not more pathological in speech characteristics compared to the control group. Interestingly, some aspects of speech in schizotypal versus control subjects were healthier under high cognitive load. Moreover, schizotypal subjects performed better, at a trend level, than controls on the cognitively demanding task. These findings hold important implications for our understanding of the neurocognitive architecture associated with the schizophrenia spectrum. Of particular note are the apparent mismatch between self-reported schizotypal traits and objective performance, and the resiliency of speech under cognitive stress in persons with high levels of schizotypy. Copyright © 2014 Elsevier B.V. All rights reserved.
The Utterance as Speech Genre in Mikhail Bakhtin's Philosophy of Language.
ERIC Educational Resources Information Center
McCord, Michael A.
This paper focuses on one of the central concepts of Mikhail Bakhtin's philosophy of language: his theory of the utterance as speech genre. Before exploring speech genres, the paper discusses Bakhtin's ideas concerning language--both language as a general system, and the use of language as particular speech communication. The paper considers…
Dmitrieva, E S; Gel'man, V Ia; Zaĭtseva, K A; Orlov, A M
2009-01-01
A comparative study of the acoustic correlates of emotional intonation was conducted on two types of speech material: meaningful speech utterances and short meaningless words. The corpus of speech signals with different emotional intonations (happy, angry, frightened, sad, and neutral) was created using the actors' method of simulating emotions. Native Russian speakers aged 20-70 years (both professional actors and non-actors) participated in the study. In the corpus, the following characteristics were analyzed: mean values and standard deviations of power, fundamental frequency, frequencies of the first and second formants, and utterance duration. Comparison of each emotional intonation with "neutral" utterances showed the greatest deviations in the fundamental frequency and the frequency of the first formant. The direction of these deviations was independent of the semantic content and duration of the utterance and of the speaker's age, gender, and actor/non-actor status, although individual features of the speakers affected the absolute values of these frequencies.
NASA Astrophysics Data System (ADS)
Tanioka, Toshimasa; Egashira, Hiroyuki; Takata, Mayumi; Okazaki, Yasuhisa; Watanabe, Kenzi; Kondo, Hiroki
We have designed and implemented a voice-controlled PC operation support system for a physically disabled person with a speech impediment. Voice operation is an effective method for a physically disabled person with involuntary movement of the limbs and the head. We have applied a commercial speech recognition engine to develop our system for practical purposes. Adopting a commercial engine reduces development cost and will help make our system useful to other people with speech impediments. We have customized the commercial speech recognition engine so that it can recognize the utterances of a person with a speech impediment. We have restricted the words that the recognition engine recognizes and separated target words from words with similar pronunciations to avoid misrecognition. The huge number of words registered in commercial speech recognition engines causes frequent misrecognition of such users' utterances, because their utterances are unclear and unstable. We have solved this problem by narrowing the input choices down to a small number and by registering ambiguous pronunciations in addition to the original ones. To realize full character input and full PC operation with a small number of words, we have designed multiple input modes with categorized dictionaries and have introduced two-step input in each mode except numeral input, enabling correct operation with a small vocabulary. The system we have developed is at a practical level. The first author of this paper is physically disabled with a speech impediment. By using this system, he has been able not only to input characters into the PC but also to operate the Windows system smoothly. He uses this system in his daily life, and this paper was written by him with it. At present, the speech recognition is customized to him. It is, however, possible to customize it for other users by changing the registered words and adding new pronunciations according to each user's utterances.
Pronunciation difficulty, temporal regularity, and the speech-to-song illusion.
Margulis, Elizabeth H; Simchy-Gross, Rhimmon; Black, Justin L
2015-01-01
The speech-to-song illusion (Deutsch et al., 2011) tracks the perceptual transformation from speech to song across repetitions of a brief spoken utterance. Because it involves no change in the stimulus itself, but a dramatic change in its perceived affiliation to speech or to music, it presents a unique opportunity to comparatively investigate the processing of language and music. In this study, native English-speaking participants were presented with brief spoken utterances that were subsequently repeated ten times. The utterances were drawn either from languages that are relatively difficult for a native English speaker to pronounce, or languages that are relatively easy for a native English speaker to pronounce. Moreover, the repetition could occur at regular or irregular temporal intervals. Participants rated the utterances before and after the repetitions on a 5-point Likert-like scale ranging from "sounds exactly like speech" to "sounds exactly like singing." The difference in ratings before and after was taken as a measure of the strength of the speech-to-song illusion in each case. The speech-to-song illusion occurred regardless of whether the repetitions were spaced at regular temporal intervals or not; however, it occurred more readily if the utterance was spoken in a language difficult for a native English speaker to pronounce. Speech circuitry seemed more liable to capture native and easy-to-pronounce languages, and more reluctant to relinquish them to perceived song across repetitions.
Utterances in infant-directed speech are shorter, not slower.
Martin, Andrew; Igarashi, Yosuke; Jincho, Nobuyuki; Mazuka, Reiko
2016-11-01
It has become a truism in the literature on infant-directed speech (IDS) that IDS is pronounced more slowly than adult-directed speech (ADS). Using recordings of 22 Japanese mothers speaking to their infant and to an adult, we show that although IDS has an overall lower mean speech rate than ADS, this is not the result of an across-the-board slowing in which every vowel is expanded equally. Instead, the speech rate difference is entirely due to the effects of phrase-final lengthening, which disproportionally affects IDS because of its shorter utterances. These results demonstrate that taking utterance-internal prosodic characteristics into account is crucial to studies of speech rate. Copyright © 2016 Elsevier B.V. All rights reserved.
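The argument above, that phrase-final lengthening alone can lower the mean speech rate of shorter utterances, can be illustrated with a small worked example; the durations below are invented for illustration and are not taken from the corpus.

```python
# Small numerical illustration: if only the phrase-final syllable is lengthened,
# short utterances contain proportionally more lengthened material, so their
# mean speech rate drops even though non-final syllables are unchanged.
BASE_SYL = 0.20        # hypothetical duration of a non-final syllable (s)
FINAL_SYL = 0.35       # hypothetical phrase-final (lengthened) syllable (s)

def mean_rate(n_syllables):
    duration = (n_syllables - 1) * BASE_SYL + FINAL_SYL
    return n_syllables / duration          # syllables per second

print(f"ADS-like 10-syllable utterance: {mean_rate(10):.2f} syl/s")
print(f"IDS-like  3-syllable utterance: {mean_rate(3):.2f} syl/s")
```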
Hustad, Katherine C; Allison, Kristen M; Sakash, Ashley; McFadd, Emily; Broman, Aimee Teo; Rathouz, Paul J
2017-08-01
The aims were to determine whether communication at 2 years predicted communication at 4 years in children with cerebral palsy (CP), and whether the age at which a child first produces words imitatively predicts change in speech production. Thirty children (15 males) with CP participated and were seen five times at 6-month intervals between 24 and 53 months (mean age at time 1 = 26.9 months, SD 1.9). Variables were communication classification at 24 and 53 months, the age at which children were first able to produce words imitatively, single-word intelligibility, and the longest utterance produced. Communication at 24 months was highly predictive of abilities at 53 months. Speaking earlier led to faster gains in intelligibility and length of utterance and to better outcomes at 53 months than speaking later. Inability to speak at 24 months indicates greater speech and language difficulty at 53 months and a strong need for early communication intervention.
Relative Salience of Speech Rhythm and Speech Rate on Perceived Foreign Accent in a Second Language.
Polyanskaya, Leona; Ordin, Mikhail; Busa, Maria Grazia
2017-09-01
We investigated the independent contribution of speech rate and speech rhythm to perceived foreign accent. To address this issue we used a resynthesis technique that allows neutralizing segmental and tonal idiosyncrasies between identical sentences produced by French learners of English at different proficiency levels and maintaining the idiosyncrasies pertaining to prosodic timing patterns. We created stimuli that (1) preserved the idiosyncrasies in speech rhythm while controlling for the differences in speech rate between the utterances; (2) preserved the idiosyncrasies in speech rate while controlling for the differences in speech rhythm between the utterances; and (3) preserved the idiosyncrasies both in speech rate and speech rhythm. All the stimuli were created in intoned (with imposed intonational contour) and flat (with monotonized, constant F0) conditions. The original and the resynthesized sentences were rated by native speakers of English for degree of foreign accent. We found that both speech rate and speech rhythm influence the degree of perceived foreign accent, but the effect of speech rhythm is larger than that of speech rate. We also found that intonation enhances the perception of fine differences in rhythmic patterns but reduces the perceptual salience of fine differences in speech rate.
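One common way to quantify rhythm independently of rate is a rate-normalized metric such as the normalized pairwise variability index (nPVI) over interval durations, alongside a simple syllables-per-second rate. The sketch below illustrates that distinction with hypothetical durations; it is not the resynthesis procedure used in this study.

```python
# Hedged sketch: separating a rhythm metric (nPVI, rate-normalized) from speech
# rate. The duration values are hypothetical.
import numpy as np

def npvi(durations):
    """Mean normalized difference of successive interval durations
    (0 = perfectly even timing); unchanged by uniform rate scaling."""
    d = np.asarray(durations, dtype=float)
    pairs = np.abs(np.diff(d)) / ((d[:-1] + d[1:]) / 2)
    return 100 * pairs.mean()

def speech_rate(n_syllables, total_duration_s):
    return n_syllables / total_duration_s          # syllables per second

vowel_durations = [0.09, 0.15, 0.07, 0.21, 0.08, 0.18]   # hypothetical intervals (s)
print(npvi(vowel_durations), speech_rate(12, 2.4))
```

Because nPVI is unchanged when every duration is multiplied by the same factor, it isolates timing pattern (rhythm) from overall tempo, which is the contrast the study manipulates through resynthesis.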
Second Language Learners and Speech Act Comprehension
ERIC Educational Resources Information Center
Holtgraves, Thomas
2007-01-01
Recognizing the specific speech act (Searle, 1969) that a speaker performs with an utterance is a fundamental feature of pragmatic competence. Past research has demonstrated that native speakers of English automatically recognize speech acts when they comprehend utterances (Holtgraves & Ashley, 2001). The present research examined whether this…
Nonhomogeneous transfer reveals specificity in speech motor learning.
Rochet-Capellan, Amélie; Richer, Lara; Ostry, David J
2012-03-01
Does motor learning generalize to new situations that are not experienced during training, or is motor learning essentially specific to the training situation? In the present experiments, we use speech production as a model to investigate generalization in motor learning. We tested for generalization from training to transfer utterances by varying the acoustical similarity between these two sets of utterances. During the training phase of the experiment, subjects received auditory feedback that was altered in real time as they repeated a single consonant-vowel-consonant utterance. Different groups of subjects were trained with different consonant-vowel-consonant utterances, which differed from a subsequent transfer utterance in terms of the initial consonant or vowel. During the adaptation phase of the experiment, we observed that subjects in all groups progressively changed their speech output to compensate for the perturbation (altered auditory feedback). After learning, we tested for generalization by having all subjects produce the same single transfer utterance while receiving unaltered auditory feedback. We observed limited transfer of learning, which depended on the acoustical similarity between the training and the transfer utterances. The gradients of generalization observed here are comparable to those observed in limb movement. The present findings are consistent with the conclusion that speech learning remains specific to individual instances of learning.
Halting in Single Word Production: A Test of the Perceptual Loop Theory of Speech Monitoring
ERIC Educational Resources Information Center
Slevc, L. Robert; Ferreira, Victor S.
2006-01-01
The "perceptual loop theory" of speech monitoring (Levelt, 1983) claims that inner and overt speech are monitored by the comprehension system, which detects errors by comparing the comprehension of formulated utterances to originally intended utterances. To test the perceptual loop monitor, speakers named pictures and sometimes attempted to halt…
Imitative Production of Rising Speech Intonation in Pediatric Cochlear Implant Recipients
Peng, Shu-Chen; Tomblin, J. Bruce; Spencer, Linda J.; Hurtig, Richard R.
2011-01-01
Purpose: This study investigated the acoustic characteristics of pediatric cochlear implant (CI) recipients' imitative production of rising speech intonation, in relation to the perceptual judgments by listeners with normal hearing (NH). Method: Recordings of a yes–no interrogative utterance imitated by 24 prelingually deafened children with a CI were extracted from annual evaluation sessions. These utterances were perceptually judged by adult NH listeners with regard to intonation contour type (non-rise, partial-rise, or full-rise) and contour appropriateness (on a 5-point scale). Fundamental frequency, intensity, and duration properties of each utterance were also acoustically analyzed. Results: Adult NH listeners' judgments of intonation contour type and contour appropriateness for each CI participant's utterances were highly positively correlated. The pediatric CI recipients did not consistently use appropriate intonation contours when imitating a yes–no question. Acoustic properties of speech intonation produced by these individuals were discernible among utterances of different intonation contour types according to NH listeners' perceptual judgments. Conclusions: These findings delineated the perceptual and acoustic characteristics of speech intonation imitated by prelingually deafened children and young adults with a CI. Future studies should address whether the degraded signals these individuals perceive via a CI contribute to their difficulties with speech intonation production.
Nakai, Yasushi; Takiguchi, Tetsuya; Matsui, Gakuyo; Yamaoka, Noriko; Takada, Satoshi
2017-10-01
Abnormal prosody is often evident in the voice intonations of individuals with autism spectrum disorders. We compared a machine-learning-based voice analysis with human hearing judgments made by 10 speech therapists for classifying children with autism spectrum disorders (n = 30) and typical development (n = 51). Using stimuli limited to single-word utterances, machine-learning-based voice analysis was superior to speech therapist judgments. There was a significantly higher true-positive than false-negative rate for machine-learning-based voice analysis but not for speech therapists. Results are discussed in terms of some artificiality of clinician judgments based on single-word utterances, and the objectivity that machine-learning-based voice analysis adds to judging abnormal prosody.
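The abstract above does not state which acoustic features or classifier were used; purely as an illustration of the general setup of a machine-learning-based voice analysis, the sketch below cross-validates a support-vector classifier on hypothetical per-utterance prosodic summary features, with the group sizes taken from the abstract.

```python
# Illustrative sketch only: the study's actual features and classifier are not
# specified in the abstract. Placeholder random data stands in for per-utterance
# prosodic summaries (e.g., F0 mean/range, intensity, duration statistics).
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_asd, n_td, n_features = 30, 51, 6                # group sizes from the abstract
X = rng.normal(size=(n_asd + n_td, n_features))    # hypothetical prosodic features
y = np.array([1] * n_asd + [0] * n_td)             # 1 = ASD, 0 = typical development

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
scores = cross_val_score(clf, X, y, cv=5)          # cross-validated accuracy
print(f"mean CV accuracy: {scores.mean():.2f}")
```

With real feature extraction in place of the placeholder matrix, the same pipeline yields the kind of classification accuracy that can be compared against clinician judgments.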
Gygi, Brian; Shafiro, Valeriy
2014-04-01
Speech perception in multitalker environments often requires listeners to divide attention among several concurrent talkers before focusing on one talker with pertinent information. Such attentionally demanding tasks are particularly difficult for older adults due both to age-related hearing loss (presbycusis) and general declines in attentional processing and associated cognitive abilities. This study investigated two signal-processing techniques that have been suggested as a means of improving speech perception accuracy of older adults: time stretching and spatial separation of target talkers. Stimuli in each experiment comprised 2-4 fixed-form utterances in which listeners were asked to consecutively (1) detect concurrently spoken keywords at the beginning of the utterance (divided attention) and (2) identify additional keywords from only one talker at the end of the utterance (selective attention). In Experiment 1, the overall tempo of each utterance was unaltered or slowed down by 25%; in Experiment 2 the concurrent utterances were spatially coincident or separated across a 180-degree hemifield. Both manipulations improved performance for elderly adults with age-appropriate hearing on both tasks. Increasing the divided attention load by attending to more concurrent keywords had a marked negative effect on performance of the selective attention task only when the target talker was identified by a keyword, but not by spatial location. These findings suggest that the temporal and spatial modifications of multitalker speech improved perception primarily by reducing competition among cognitive resources required to perform attentionally demanding tasks.
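A minimal sketch of the tempo manipulation described above, assuming librosa's phase-vocoder time stretching as the implementation (the study's actual processing chain is not specified here). A 25% reduction in tempo corresponds to a stretch rate of 0.75; if the intended manipulation were instead a 25% increase in duration, the rate would be 0.8.

```python
# Sketch under stated assumptions: slow an utterance's tempo without changing
# pitch. A synthetic tone stands in for a recorded utterance.
import numpy as np
import librosa

sr = 16000
t = np.linspace(0.0, 1.0, sr, endpoint=False)
y = 0.1 * np.sin(2 * np.pi * 220 * t).astype(np.float32)   # placeholder "utterance"

slowed = librosa.effects.time_stretch(y, rate=0.75)  # rate < 1 lengthens the signal
print(len(y) / sr, "s ->", round(len(slowed) / sr, 2), "s")
```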
Auditory Training with Frequent Communication Partners
ERIC Educational Resources Information Center
Tye-Murray, Nancy; Spehar, Brent; Sommers, Mitchell; Barcroft, Joe
2016-01-01
Purpose: Individuals with hearing loss engage in auditory training to improve their speech recognition. They typically practice listening to utterances spoken by unfamiliar talkers but never to utterances spoken by their most frequent communication partner (FCP)--speech they most likely desire to recognize--under the assumption that familiarity…
Phonologically Driven Variability: The Case of Determiners
ERIC Educational Resources Information Center
Bürki, Audrey; Laganaro, Marina; Alario, F.-Xavier
2014-01-01
Speakers usually produce words in connected speech. In such contexts, the form in which many words are uttered is influenced by the phonological properties of neighboring words. The current article examines the representations and processes underlying the production of phonologically constrained word form variations. For this purpose, we consider…
Räsänen, Okko; Kakouros, Sofoklis; Soderstrom, Melanie
2018-06-06
The exaggerated intonation and special rhythmic properties of infant-directed speech (IDS) have been hypothesized to attract infants' attention to the speech stream. However, there has been little work actually connecting the properties of IDS to models of attentional processing or perceptual learning. A number of such attention models suggest that surprising or novel perceptual inputs attract attention, where novelty can be operationalized as the statistical (un)predictability of the stimulus in the given context. Since prosodic patterns such as F0 contours are accessible to young infants, who are also known to be adept statistical learners, the present paper investigates the hypothesis that F0 contours in IDS are less predictable than those in adult-directed speech (ADS), given previous exposure to both speaking styles, thereby potentially tapping into basic attentional mechanisms of the listeners, much as the relative probabilities of other linguistic patterns are known to modulate attentional processing in infants and adults. Computational modeling analyses with naturalistic IDS and ADS speech from matched speakers and contexts show that IDS intonation has lower overall temporal predictability even when the F0 contours of both speaking styles are normalized to have equal means and variances. A closer analysis reveals a tendency for IDS intonation to be less predictable at the end of short utterances, whereas ADS exhibits more stable average predictability patterns across the full extent of the utterances. The difference between IDS and ADS persists even when the proportion of IDS and ADS exposure is varied substantially, simulating different relative amounts of IDS heard in different family and cultural environments. Exposure to IDS is also found to be more efficient for predicting ADS intonation contours in new utterances than exposure to an equal amount of ADS speech. This indicates that the more variable prosodic contours of IDS also generalize to ADS, and may therefore enhance prosodic learning in infancy. Overall, the study suggests that one reason behind infant preference for IDS could be its higher information value at the prosodic level, as measured by the amount of surprisal in the F0 contours. This provides the first formal link between the properties of IDS and models of attentional processing and statistical learning in the brain. However, this finding does not rule out the possibility that other differences between IDS and ADS also play a role.
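The paper's computational models are not reproduced here; as a minimal sketch of the underlying idea, the snippet below quantizes F0 contours into discrete bins, fits an add-one-smoothed bigram model on "exposure" contours, and scores new contours by their average surprisal (negative log probability). The bin count, smoothing, and synthetic contours are all assumptions for illustration.

```python
import numpy as np

def quantize(f0, n_bins=12, lo=75.0, hi=500.0):
    """Map an F0 contour (Hz) to discrete bin indices."""
    edges = np.linspace(lo, hi, n_bins + 1)
    return np.clip(np.digitize(f0, edges) - 1, 0, n_bins - 1)

def fit_bigram(contours, n_bins=12, alpha=1.0):
    """Add-alpha smoothed bigram transition matrix over F0 bins."""
    counts = np.full((n_bins, n_bins), alpha)
    for c in contours:
        q = quantize(c, n_bins)
        for a, b in zip(q[:-1], q[1:]):
            counts[a, b] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def mean_surprisal(contour, trans, n_bins=12):
    """Average -log2 P(next bin | current bin) over a contour."""
    q = quantize(contour, n_bins)
    return float(np.mean(-np.log2(trans[q[:-1], q[1:]])))

# Hypothetical exposure and test contours (Hz); real analyses would use
# F0 tracks extracted from matched IDS and ADS recordings.
rng = np.random.default_rng(1)
exposure = [200 + 30 * rng.standard_normal(80) for _ in range(50)]
trans = fit_bigram(exposure)
test_ids = 220 + 60 * rng.standard_normal(80)    # more variable contour
test_ads = 200 + 20 * rng.standard_normal(80)    # more stable contour
print(mean_surprisal(test_ids, trans), mean_surprisal(test_ads, trans))
```

Higher average surprisal for the more variable contour illustrates, in miniature, the sense in which less predictable IDS intonation carries more information for a listener with the given exposure.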
Using the self-select paradigm to delineate the nature of speech motor programming.
Wright, David L; Robin, Don A; Rhee, Jooyhun; Vaculin, Amber; Jacks, Adam; Guenther, Frank H; Fox, Peter T
2009-06-01
The authors examined the involvement of 2 speech motor programming processes identified by S. T. Klapp (1995, 2003) during the articulation of utterances differing in syllable and sequence complexity. According to S. T. Klapp, 1 process, INT, resolves the demands of the programmed unit, whereas a second process, SEQ, oversees the serial order demands of longer sequences. A modified reaction time paradigm was used to assess INT and SEQ demands. Specifically, syllable complexity was dependent on syllable structure, whereas sequence complexity involved either repeated or unique syllables within an utterance. INT execution was slowed when articulating single syllables in the form CCCV compared to simpler CV syllables. Planning unique syllables within a multisyllabic utterance rather than repetitions of the same syllable slowed INT but not SEQ. The INT speech motor programming process, important for mental syllabary access, is sensitive to changes in both syllable structure and the number of unique syllables in an utterance.
From In-Session Behaviors to Drinking Outcomes: A Causal Chain for Motivational Interviewing
ERIC Educational Resources Information Center
Moyers, Theresa B.; Martin, Tim; Houck, Jon M.; Christopher, Paulette J.; Tonigan, J. Scott
2009-01-01
Client speech in favor of change within motivational interviewing sessions has been linked to treatment outcomes, but a causal chain has not yet been demonstrated. Using a sequential behavioral coding system for client speech, the authors found that, at both the session and utterance levels, specific therapist behaviors predict client change talk.…
Expressive Language during Conversational Speech in Boys with Fragile X Syndrome
ERIC Educational Resources Information Center
Roberts, Joanne E.; Hennon, Elizabeth A.; Price, Johanna R.; Dear, Elizabeth; Anderson, Kathleen; Vandergrift, Nathan A.
2007-01-01
We compared the expressive syntax and vocabulary skills of 35 boys with fragile X syndrome and 27 younger typically developing boys who were at similar nonverbal mental levels. During a conversational speech sample, the boys with fragile X syndrome used shorter, less complex utterances and produced fewer different words than did the typically…
ERIC Educational Resources Information Center
Alrusayni, Norah
2017-01-01
This study was conducted to determine the effectiveness of using the high-tech speech-generating device with Proloquo2Go app to reduce echolalic utterances in a student with autism during conversational speech. After observing that the iPad device with several apps was used by the students and that it served as a communication device, language…
Speech Compensation for Time-Scale-Modified Auditory Feedback
ERIC Educational Resources Information Center
Ogane, Rintaro; Honda, Masaaki
2014-01-01
Purpose: The purpose of this study was to examine speech compensation in response to time-scale-modified auditory feedback during the transition of the semivowel for a target utterance of /ija/. Method: Each utterance session consisted of 10 control trials in the normal feedback condition followed by 20 perturbed trials in the modified auditory…
Reference in Action: Links between Pointing and Language
ERIC Educational Resources Information Center
Cooperrider, Kensy Andrew
2011-01-01
When referring to things in the world, speakers produce utterances that are composites of speech and action. Pointing gestures are a pervasive part of such composite utterances, but many questions remain about exactly how pointing is integrated with speech. In this dissertation I present three strands of research that investigate relations of…
The Role of Utterance Length and Position in 3-Year-Olds' Production of Third Person Singular -s
ERIC Educational Resources Information Center
Mealings, Kiri T.; Demuth, Katherine
2014-01-01
Purpose: Evidence from children's spontaneous speech suggests that utterance length and utterance position may help explain why children omit grammatical morphemes in some contexts but not others. This study investigated whether increased utterance length (hence, increased grammatical complexity) adversely affects children's third person singular…
Speech production in experienced cochlear implant users undergoing short-term auditory deprivation
NASA Astrophysics Data System (ADS)
Greenman, Geoffrey; Tjaden, Kris; Kozak, Alexa T.
2005-09-01
This study examined the effect of short-term auditory deprivation on the speech production of five postlingually deafened women, all of whom were experienced cochlear implant users. Each cochlear implant user, as well as age and gender matched control speakers, produced CVC target words embedded in a reading passage. Speech samples for the deafened adults were collected on two separate occasions. First, the speakers were recorded after wearing their speech processor consistently for at least two to three hours prior to recording (implant "ON"). The second recording occurred when the speakers had their speech processors turned off for approximately ten to twelve hours prior to recording (implant "OFF"). Acoustic measures, including fundamental frequency (F0), the first (F1) and second (F2) formants of the vowels, vowel space area, vowel duration, spectral moments of the consonants, as well as utterance duration and sound pressure level (SPL) across the entire utterance were analyzed in both speaking conditions. For each implant speaker, acoustic measures will be compared across implant "ON" and implant "OFF" speaking conditions, and will also be compared to data obtained from normal hearing speakers.
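Among the measures listed above, vowel space area has a simple closed form: it is commonly computed as the area of the polygon spanned by the corner vowels' mean (F1, F2) values, using the shoelace formula. The sketch below shows that computation on hypothetical formant values; the corner-vowel set and numbers are assumptions, not the study's data.

```python
import numpy as np

def vowel_space_area(f1, f2):
    """Area (Hz^2) of the polygon spanned by corner-vowel (F1, F2) means,
    via the shoelace formula; vertices must be given in polygon order."""
    x, y = np.asarray(f1, float), np.asarray(f2, float)
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

# Hypothetical corner-vowel formant means (Hz), ordered /i/, /ae/, /a/, /u/.
f1 = [300, 700, 750, 350]
f2 = [2300, 1900, 1200, 900]
print(vowel_space_area(f1, f2), "Hz^2")
```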
Perceptual Learning of Speech under Optimal and Adverse Conditions
Zhang, Xujin; Samuel, Arthur G.
2014-01-01
Humans have a remarkable ability to understand spoken language despite the large amount of variability in speech. Previous research has shown that listeners can use lexical information to guide their interpretation of atypical sounds in speech (Norris, McQueen, & Cutler, 2003). This kind of lexically induced perceptual learning enables people to adjust to the variations in utterances due to talker-specific characteristics, such as individual identity and dialect. The current study investigated perceptual learning in two optimal conditions: conversational speech (Experiment 1) vs. clear speech (Experiment 2), and three adverse conditions: noise (Experiment 3a) vs. two cognitive loads (Experiments 4a & 4b). Perceptual learning occurred in the two optimal conditions and in the two cognitive load conditions, but not in the noise condition. Furthermore, perceptual learning occurred only in the first of two sessions for each participant, and only for atypical /s/ sounds and not for atypical /f/ sounds. This pattern of learning and non-learning reflects a balance between flexibility and stability that the speech system must have to deal with speech variability in the diverse conditions that speech is encountered.
Infants' Behaviors as Antecedents and Consequents of Mothers' Responsive and Directive Utterances
ERIC Educational Resources Information Center
Masur, Elise Frank; Flynn, Valerie; Lloyd, Carrie A.
2013-01-01
To investigate possible influences on and consequences of mothers' speech, specific infant behaviors preceding and following four pragmatic categories of mothers' utterances--responsive utterances, supportive behavioral directives, intrusive behavioral directives, and intrusive attentional directives--were examined longitudinally during dyadic…
Accent, intelligibility, and comprehensibility in the perception of foreign-accented Lombard speech
NASA Astrophysics Data System (ADS)
Li, Chi-Nin
2003-10-01
Speech produced in noise (Lombard speech) has been reported to be more intelligible than speech produced in quiet (normal speech). This study examined the perception of non-native Lombard speech in terms of intelligibility, comprehensibility, and degree of foreign accent. Twelve Cantonese speakers and a comparison group of English speakers read simple true and false English statements in quiet and in 70 dB of masking noise. Lombard and normal utterances were mixed with noise at a constant signal-to-noise ratio, and presented along with noise-free stimuli to eight new English listeners who provided transcription scores, comprehensibility ratings, and accent ratings. Analyses showed that, as expected, utterances presented in noise were less well perceived than were noise-free sentences, and that the Cantonese speakers' productions were more accented, but less intelligible and less comprehensible, than those of the English speakers. For both groups of speakers, the Lombard sentences were correctly transcribed more often than their normal utterances in noisy conditions. However, the Cantonese-accented Lombard sentences were not rated as easier to understand than the normal speech in all conditions. The assigned accent ratings were similar throughout all listening conditions. Implications of these findings will be discussed.
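Mixing at a constant signal-to-noise ratio, as described above, amounts to scaling the masker so that the speech-to-masker power ratio hits the target value. A minimal sketch, with a synthetic tone standing in for an utterance:

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale the noise so that the speech-to-noise power ratio equals snr_db."""
    noise = noise[:len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10.0)))
    return speech + gain * noise

rng = np.random.default_rng(6)
speech = np.sin(2 * np.pi * 200 * np.linspace(0, 1, 16000))   # placeholder signal
noise = rng.standard_normal(16000)
mixed = mix_at_snr(speech, noise, snr_db=0.0)
# Verify the achieved SNR (should print approximately 0.0 dB).
print(round(10 * np.log10(np.mean(speech**2) / np.mean((mixed - speech)**2)), 2), "dB")
```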
Developmental change in variability of lip muscle activity during speech.
Wohlert, Amy B; Smith, Anne
2002-12-01
Compared to adults, children's speech production measures sometimes show higher trial-to-trial variability in both kinematic and acoustic analyses. A reasonable hypothesis is that this variability reflects variations in neural drive to muscles as the developing system explores different solutions to achieving vocal tract goals. We investigated that hypothesis in the present study by analyzing EMG waveforms produced across repetitions of a phrase spoken by 7-year-olds, 12-year-olds, and young adults. The EMG waveforms recorded via surface electrodes at upper lip sites were clearly modulated in a consistent manner corresponding to lip closure for the bilabial consonants in the utterance. Thus we were able to analyze the amplitude envelope of the rectified EMG with a phrase-level variability index previously used with kinematic data. Both the 7- and 12-year-old children were significantly more variable on repeated productions than the young adults. These results support the idea that children are using varying combinations of muscle activity to achieve phonetic goals. Even at age 12 years, these children were not adult-like in their performance. These and earlier kinematic studies of the oral motor system suggest that children retain their flexibility, employing more degrees of freedom than adults, to dynamically control lip aperture during speech. This strategy is adaptive given the many neurophysiological and biomechanical changes that occur during the transition from adolescence to adulthood.
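The "phrase-level variability index previously used with kinematic data" mentioned above is, in this literature, typically a spatiotemporal-index-style measure; the sketch below assumes that formulation: each repetition's rectified, smoothed EMG envelope is linearly time-normalized and z-scored, and the index is the sum of the across-repetition standard deviations at fixed relative time points.

```python
import numpy as np

def variability_index(envelopes, n_points=50):
    """STI-style index: sum of SDs across time- and amplitude-normalized trials.

    envelopes: list of 1-D arrays, one rectified/smoothed EMG envelope per
    repetition of the phrase (trials may differ in length).
    """
    normalized = []
    for env in envelopes:
        t_old = np.linspace(0.0, 1.0, len(env))
        t_new = np.linspace(0.0, 1.0, n_points)
        resampled = np.interp(t_new, t_old, env)               # linear time normalization
        z = (resampled - resampled.mean()) / resampled.std()   # amplitude normalization
        normalized.append(z)
    stacked = np.vstack(normalized)
    return float(np.sum(stacked.std(axis=0)))                  # sum of per-point SDs

# Hypothetical repetitions; a more variable speaker (e.g., a child) would be
# expected to yield a larger index.
rng = np.random.default_rng(2)
trials = [np.abs(np.sin(np.linspace(0, 3 * np.pi, rng.integers(180, 220)))
                 + 0.1 * rng.standard_normal(1)) for _ in range(10)]
print(variability_index(trials))
```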
The Speech Functions Used by Ibu Muslimah and Pak Harfan in "Laskar Pelangi" Drama
ERIC Educational Resources Information Center
Rahmania
2018-01-01
This research investigates the kinds of speech functions used in utterances produced by Ibu Muslimah and Pak Harfan in the "Laskar Pelangi" drama. A descriptive qualitative method was used for this research. The script of the drama Laskar Pelangi was taken as the source of data. All utterances produced by Ibu Muslimah and Pak Harfan served as the data in the…
ERIC Educational Resources Information Center
Richardson, Tanya; Murray, Jane
2017-01-01
Within English early childhood education, there is emphasis on improving speech and language development as well as a drive for outdoor learning. This paper synthesises both aspects to consider whether or not links exist between the environment and the quality of young children's utterances as part of their speech and language development and if…
Effects of Conversational Pressures on Speech Planning
ERIC Educational Resources Information Center
Swets, Benjamin; Jacovina, Matthew E.; Gerrig, Richard J.
2013-01-01
In ordinary conversation, speakers experience pressures both to produce utterances suited to particular addressees and to do so with minimal delay. To document the impact of these conversational pressures, our experiment asked participants to produce brief utterances to describe visual displays. We complicated utterance planning by including…
Recognizing intentions in infant-directed speech: evidence for universals.
Bryant, Gregory A; Barrett, H Clark
2007-08-01
In all languages studied to date, distinct prosodic contours characterize different intention categories of infant-directed (ID) speech. This vocal behavior likely exists universally as a species-typical trait, but little research has examined whether listeners can accurately recognize intentions in ID speech using only vocal cues, without access to semantic information. We recorded native-English-speaking mothers producing four intention categories of utterances (prohibition, approval, comfort, and attention) as both ID and adult-directed (AD) speech, and we then presented the utterances to Shuar adults (South American hunter-horticulturalists). Shuar subjects were able to reliably distinguish ID from AD speech and were able to reliably recognize the intention categories in both types of speech, although performance was significantly better with ID speech. This is the first demonstration that adult listeners in an indigenous, nonindustrialized, and nonliterate culture can accurately infer intentions from both ID speech and AD speech in a language they do not speak.
Lai, Ying-Hui; Tsao, Yu; Lu, Xugang; Chen, Fei; Su, Yu-Ting; Chen, Kuang-Chao; Chen, Yu-Hsuan; Chen, Li-Ching; Po-Hung Li, Lieber; Lee, Chin-Hui
2018-01-20
We investigate the clinical effectiveness of a novel deep learning-based noise reduction (NR) approach under noisy conditions with challenging noise types at low signal-to-noise ratio (SNR) levels for Mandarin-speaking cochlear implant (CI) recipients. The deep learning-based NR approach used in this study consists of two modules: a noise classifier (NC) and a deep denoising autoencoder (DDAE), and is thus termed NC + DDAE. In a series of comprehensive experiments, we conduct qualitative and quantitative analyses on the NC module and the overall NC + DDAE approach. Moreover, we evaluate the speech recognition performance of the NC + DDAE NR and classical single-microphone NR approaches for Mandarin-speaking CI recipients under different noisy conditions. The testing set contains Mandarin sentences corrupted by two types of maskers, two-talker babble noise and construction jackhammer noise, at 0 and 5 dB SNR levels. Two conventional NR techniques and the proposed deep learning-based approach are used to process the noisy utterances. We qualitatively compare the NR approaches by the amplitude envelope and spectrogram plots of the processed utterances. Quantitative objective measures include (1) the normalized covariance measure, to test the intelligibility of the utterances processed by each of the NR approaches; and (2) speech recognition tests conducted by nine Mandarin-speaking CI recipients. These nine CI recipients used their own clinical speech processors during testing. The experimental results of the objective evaluation and the listening tests indicate that under challenging listening conditions, the proposed NC + DDAE NR approach yields higher intelligibility scores than the two compared classical NR techniques, under both matched and mismatched training-testing conditions. When compared to the two well-known conventional NR techniques under challenging listening conditions, the proposed NC + DDAE NR approach has superior noise suppression capabilities and introduces less distortion of the key speech envelope information, thus improving speech recognition more effectively for Mandarin-speaking CI recipients. The results suggest that the proposed deep learning-based NR approach can potentially be integrated into existing CI signal processors to overcome the degradation of speech perception caused by noise.
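The architectures and features of the NC and DDAE modules are not detailed in this record; the sketch below illustrates the two-module idea under simplifying assumptions: a nearest-centroid noise classifier routes a noisy utterance's spectral frames to a noise-specific denoising autoencoder (here a small PyTorch MLP) trained to map noisy frames to clean ones. The feature dimension, architectures, and data are placeholders, not the published system.

```python
import torch
import torch.nn as nn

N_FREQ = 129  # e.g., log-magnitude STFT bins (assumed)

class DDAE(nn.Module):
    """Small denoising autoencoder mapping noisy frames to clean frames."""
    def __init__(self, dim=N_FREQ, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, dim),
        )
    def forward(self, x):
        return self.net(x)

def train_ddae(model, noisy, clean, epochs=5, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(noisy), clean)
        loss.backward()
        opt.step()
    return model

def classify_noise(noisy_frames, centroids):
    """Nearest-centroid noise classifier on the mean noisy spectrum."""
    profile = noisy_frames.mean(dim=0)
    dists = torch.stack([(profile - c).pow(2).sum() for c in centroids])
    return int(torch.argmin(dists))

# Hypothetical data: one DDAE per noise type (e.g., babble vs. jackhammer).
torch.manual_seed(0)
clean = torch.randn(512, N_FREQ)
noise_types = [torch.randn(N_FREQ) * 0.5, torch.randn(N_FREQ) * 1.5]
centroids, ddaes = [], []
for noise in noise_types:
    noisy = clean + noise
    centroids.append(noisy.mean(dim=0))
    ddaes.append(train_ddae(DDAE(), noisy, clean))

# Enhancement: route a noisy utterance to the DDAE matching its noise class.
test_noisy = clean[:100] + noise_types[1]
k = classify_noise(test_noisy, centroids)
with torch.no_grad():
    enhanced = ddaes[k](test_noisy)
print("selected noise class:", k, "enhanced shape:", tuple(enhanced.shape))
```

The design choice the abstract highlights, selecting a noise-specific enhancement model rather than using one generic model, is what the classifier-then-autoencoder routing above is meant to convey.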
Neural dynamics of speech act comprehension: an MEG study of naming and requesting.
Egorova, Natalia; Pulvermüller, Friedemann; Shtyrov, Yury
2014-05-01
The neurobiological basis and temporal dynamics of communicative language processing pose important yet unresolved questions. It has previously been suggested that comprehension of the communicative function of an utterance, i.e., the so-called speech act, is supported by an ensemble of neural networks, comprising lexico-semantic, action and mirror neuron as well as theory of mind circuits, all activated in concert. It has also been demonstrated that recognition of the speech act type occurs extremely rapidly. These findings, however, were obtained in experiments with insufficient spatio-temporal resolution, thus possibly concealing important facets of the neural dynamics of the speech act comprehension process. Here, we used magnetoencephalography to investigate the comprehension of Naming and Request actions performed with utterances controlled for physical features, psycholinguistic properties and the probability of occurrence in variable contexts. The results show that different communicative actions are underpinned by a dynamic neural network, which differentiates between speech act types very early after the speech act onset. Within 50-90 ms, Requests engaged mirror-neuron action-comprehension systems in sensorimotor cortex, possibly for processing action knowledge and intentions. Still within the first 200 ms of stimulus onset (100-150 ms), Naming activated brain areas involved in referential semantic retrieval. Subsequently (200-300 ms), theory of mind and mentalising circuits were activated in medial prefrontal and temporo-parietal areas, possibly indexing processing of intentions and assumptions of both communication partners. This cascade of stages of processing information about actions and intentions, referential semantics, and theory of mind may underlie dynamic and interactive speech act comprehension.
Hierarchical organization in the temporal structure of infant-directed speech and song.
Falk, Simone; Kello, Christopher T
2017-06-01
Caregivers alter the temporal structure of their utterances when talking and singing to infants compared with adult communication. The present study tested whether temporal variability in infant-directed registers serves to emphasize the hierarchical temporal structure of speech. Fifteen German-speaking mothers sang a play song and told a story to their 6-month-old infants, or to an adult. Recordings were analyzed using a recently developed method that determines the degree of nested clustering of temporal events in speech. Events were defined as peaks in the amplitude envelope, and clusters of various sizes related to periods of acoustic speech energy at varying timescales. Infant-directed speech and song clearly showed greater event clustering compared with adult-directed registers, at multiple timescales of hundreds of milliseconds to tens of seconds. We discuss the relation of this newly discovered acoustic property to temporal variability in linguistic units and its potential implications for parent-infant communication and infants learning the hierarchical structures of speech and language.
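One standard way to quantify nested clustering of point events across timescales is an Allan factor analysis of event times; the sketch below assumes that framing (the exact method used in the study above may differ in detail). Event times here are synthetic placeholders standing in for amplitude-envelope peaks.

```python
import numpy as np

def allan_factor(event_times, window_s):
    """Allan factor at one timescale: variance of successive window-count
    differences divided by twice the mean count, for windows of window_s."""
    t_max = event_times.max()
    n_windows = int(t_max // window_s)
    edges = np.arange(n_windows + 1) * window_s
    counts, _ = np.histogram(event_times, bins=edges)
    diffs = np.diff(counts)
    return float(np.mean(diffs ** 2) / (2 * np.mean(counts)))

# Hypothetical event times (s), e.g., peaks picked from the amplitude envelope;
# clustered (bursty) event trains give Allan factors that grow with timescale.
rng = np.random.default_rng(7)
bursty = np.sort(np.concatenate([c + rng.exponential(0.05, 20)
                                 for c in rng.uniform(0, 60, 30)]))
for w in (0.1, 1.0, 5.0):   # timescales from ~100 ms to seconds
    print(w, round(allan_factor(bursty, w), 2))
```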
Sound representation in higher language areas during language generation
Magrassi, Lorenzo; Aromataris, Giuseppe; Cabrini, Alessandro; Annovazzi-Lodi, Valerio; Moro, Andrea
2015-01-01
How language is encoded by neural activity in the higher-level language areas of humans is still largely unknown. We investigated whether the electrophysiological activity of Broca’s area correlates with the sound of the utterances produced. During speech perception, the electric cortical activity of the auditory areas correlates with the sound envelope of the utterances. In our experiment, we compared the electrocorticogram recorded during awake neurosurgical operations in Broca’s area and in the dominant temporal lobe with the sound envelope of single words versus sentences read aloud or mentally by the patients. Our results indicate that the electrocorticogram correlates with the sound envelope of the utterances, starting before any sound is produced and even in the absence of speech, when the patient is reading mentally. No correlations were found when the electrocorticogram was recorded in the superior parietal gyrus, an area not directly involved in language generation, or in Broca’s area when the participants were executing a repetitive motor task, which did not include any linguistic content, with their dominant hand. The distribution of suprathreshold correlations across frequencies of cortical activity varied depending on whether the sound envelope was derived from words or sentences. Our results suggest that the activity of language areas is organized by sound when language is generated, before any utterance is produced or heard.
Emotions in freely varying and mono-pitched vowels, acoustic and EGG analyses.
Waaramaa, Teija; Palo, Pertti; Kankare, Elina
2015-12-01
Vocal emotions are expressed either by speech or singing. The difference is that in singing the pitch is predetermined while in speech it may vary freely. It was of interest to study whether there were voice quality differences between freely varying and mono-pitched vowels expressed by professional actors. Given their profession, actors have to be able to express emotions both by speech and singing. Electroglottogram and acoustic analyses of emotional utterances embedded in expressions of freely varying vowels [a:], [i:], [u:] (96 samples) and mono-pitched protracted vowels (96 samples) were studied. Contact quotient (CQEGG) was calculated using 35%, 55%, and 80% threshold levels. Three different threshold levels were used in order to evaluate their effects on emotions. Genders were studied separately. The results suggested significant gender differences for CQEGG 80% threshold level. SPL, CQEGG, and F4 were used to convey emotions, but to a lesser degree, when F0 was predetermined. Moreover, females showed fewer significant variations than males. Both genders used more hypofunctional phonation type in mono-pitched utterances than in the expressions with freely varying pitch. The present material warrants further study of the interplay between CQEGG threshold levels and formant frequencies, and listening tests to investigate the perceptual value of the mono-pitched vowels in the communication of emotions.
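The contact quotient itself has a simple definition: within one glottal cycle, CQEGG at a given threshold level is the proportion of the cycle during which the EGG amplitude stays above a threshold placed at that fraction of the cycle's amplitude range. A minimal sketch on a synthetic cycle (real analyses would first segment cycles from the recorded EGG signal):

```python
import numpy as np

def contact_quotient(egg_cycle, level=0.35):
    """CQ(EGG) for one glottal cycle at a relative threshold level.

    The threshold is placed at min + level * (max - min) of the cycle;
    CQ is the fraction of samples above it (the contact phase)."""
    lo, hi = egg_cycle.min(), egg_cycle.max()
    threshold = lo + level * (hi - lo)
    return float(np.mean(egg_cycle > threshold))

# Synthetic EGG-like cycle (a skewed pulse) standing in for real data.
fs, f0 = 44100, 150.0
n = int(fs / f0)
t = np.linspace(0.0, 1.0, n, endpoint=False)
cycle = np.maximum(0.0, np.sin(np.pi * t) ** 3)   # crude pulse shape

for level in (0.35, 0.55, 0.80):                  # the three thresholds studied
    print(level, contact_quotient(cycle, level))
```

As the threshold rises, fewer samples count as "contact", which is why the study compares results across the three levels.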
Zourmand, Alireza; Mirhassani, Seyed Mostafa; Ting, Hua-Nong; Bux, Shaik Ismail; Ng, Kwan Hoong; Bilgen, Mehmet; Jalaludin, Mohd Amin
2014-07-25
The phonetic properties of six Malay vowels are investigated using magnetic resonance imaging (MRI) to visualize the vocal tract in order to obtain dynamic articulatory parameters during speech production. To resolve image blurring due to tongue movement during the scanning process, a method based on active contour extraction is used to track tongue contours. The proposed method efficiently tracks tongue contours despite the partial blurring of MRI images. Consequently, the articulatory parameters are effectively measured as tongue movement is observed, and the specific shape of the tongue and its position are determined for all six uttered Malay vowels. Speech rehabilitation procedures demand some kind of visually perceivable prototype of speech articulation. To investigate the validity of the measured articulatory parameters based on the acoustic theory of speech production, an acoustic analysis of the vowels uttered by the subjects was performed. As the acoustic and articulatory parameters of the uttered speech were examined, a correlation between formant frequencies and articulatory parameters was observed. The experiments reported a positive correlation between the constriction location of the tongue body and the first formant frequency, as well as a negative correlation between the constriction location of the tongue tip and the second formant frequency. The results demonstrate that the proposed method is an effective tool for the dynamic study of speech production.
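The paper's blur-handling refinements are not reproduced here; as a rough sketch of the general approach, the snippet below runs scikit-image's active_contour (a standard snake) on a smoothed frame and re-initializes each frame's snake from the previous frame's result. The synthetic "frame" is only a placeholder for midsagittal MRI data, and the snake parameters are illustrative.

```python
import numpy as np
from skimage.filters import gaussian
from skimage.segmentation import active_contour

def synthetic_frame(shape=(128, 128), center=(70, 64), radius=30):
    """Placeholder for a midsagittal MRI frame: a bright disk on noise."""
    rr, cc = np.mgrid[:shape[0], :shape[1]]
    disk = ((rr - center[0]) ** 2 + (cc - center[1]) ** 2) < radius ** 2
    return disk.astype(float) + 0.1 * np.random.default_rng(3).standard_normal(shape)

def init_snake(center=(70, 64), radius=40, n_points=120):
    """Initial contour: a circle around the expected tongue region."""
    theta = np.linspace(0, 2 * np.pi, n_points)
    return np.column_stack([center[0] + radius * np.sin(theta),
                            center[1] + radius * np.cos(theta)])

frames = [synthetic_frame(center=(70, 64 + k)) for k in range(3)]  # "moving" target
snake = init_snake()
for i, frame in enumerate(frames):
    smoothed = gaussian(frame, sigma=2.0, preserve_range=True)
    snake = active_contour(smoothed, snake, alpha=0.015, beta=10.0, gamma=0.001)
    print(f"frame {i}: contour centroid = {snake.mean(axis=0).round(1)}")
```

Carrying the converged contour forward frame by frame is what makes the tracking dynamic rather than a per-frame segmentation.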
Visual Grouping in Accordance With Utterance Planning Facilitates Speech Production.
Zhao, Liming; Paterson, Kevin B; Bai, Xuejun
2018-01-01
Research on language production has focused on the process of utterance planning and involved studying the synchronization between visual gaze and the production of sentences that refer to objects in the immediate visual environment. However, it remains unclear how the visual grouping of these objects might influence this process. To shed light on this issue, the present research examined the effects of the visual grouping of objects in a visual display on utterance planning in two experiments. Participants produced utterances of the form "The snail and the necklace are above/below/on the left/right side of the toothbrush" for displays containing these referents (e.g., a snail, a necklace and a toothbrush). These objects were grouped using classic Gestalt principles of color similarity (Experiment 1) and common region (Experiment 2) so that the induced perceptual grouping was congruent or incongruent with the required phrasal organization. The results showed that speech onset latencies were shorter in congruent than incongruent conditions. The findings therefore reveal that the congruency between the visual grouping of referents and the required phrasal organization can influence speech production. Such findings suggest that, when language is produced in a visual context, speakers make use of both visual and linguistic cues to plan utterances.
Effect of speech-intrinsic variations on human and automatic recognition of spoken phonemes.
Meyer, Bernd T; Brand, Thomas; Kollmeier, Birger
2011-01-01
The aim of this study is to quantify the gap between the recognition performance of human listeners and an automatic speech recognition (ASR) system with special focus on intrinsic variations of speech, such as speaking rate and effort, altered pitch, and the presence of dialect and accent. Second, it is investigated if the most common ASR features contain all information required to recognize speech in noisy environments by using resynthesized ASR features in listening experiments. For the phoneme recognition task, the ASR system achieved the human performance level only when the signal-to-noise ratio (SNR) was increased by 15 dB, which is an estimate for the human-machine gap in terms of the SNR. The major part of this gap is attributed to the feature extraction stage, since human listeners achieve comparable recognition scores when the SNR difference between unaltered and resynthesized utterances is 10 dB. Intrinsic variabilities result in strong increases of error rates, both in human speech recognition (HSR) and ASR (with a relative increase of up to 120%). An analysis of phoneme duration and recognition rates indicates that human listeners are better able to identify temporal cues than the machine at low SNRs, which suggests incorporating information about the temporal dynamics of speech into ASR systems.
Alternative Speech Communication System for Persons with Severe Speech Disorders
NASA Astrophysics Data System (ADS)
Selouani, Sid-Ahmed; Sidi Yakoub, Mohammed; O'Shaughnessy, Douglas
2009-12-01
Assistive speech-enabled systems are proposed to help both French- and English-speaking persons with various speech disorders. The proposed assistive systems use automatic speech recognition (ASR) and speech synthesis in order to enhance the quality of communication. These systems aim at improving the intelligibility of pathologic speech, making it as natural as possible and close to the original voice of the speaker. The resynthesized utterances use new basic units, a new concatenating algorithm and a grafting technique to correct the poorly pronounced phonemes. The ASR responses are uttered by the new speech synthesis system in order to convey an intelligible message to listeners. Experiments involving four American speakers with severe dysarthria and two Acadian French speakers with sound substitution disorders (SSDs) are carried out to demonstrate the efficiency of the proposed methods. An improvement in the Perceptual Evaluation of Speech Quality (PESQ) value of 5% and of more than 20% is achieved by the speech synthesis systems that deal with SSD and dysarthria, respectively.
Bone, Daniel; Li, Ming; Black, Matthew P.; Narayanan, Shrikanth S.
2013-01-01
Segmental and suprasegmental speech signal modulations offer information about paralinguistic content such as affect, age and gender, pathology, and speaker state. Speaker state encompasses medium-term, temporary physiological phenomena influenced by internal or external biochemical actions (e.g., sleepiness, alcohol intoxication). Perceptual and computational research indicates that detecting speaker state from speech is a challenging task. In this paper, we present a system constructed with multiple representations of prosodic and spectral features that provided the best result at the Intoxication Subchallenge of Interspeech 2011 on the Alcohol Language Corpus. We discuss the details of each classifier and show that fusion improves performance. We additionally address the question of how best to construct a speaker state detection system in terms of robust and practical marginalization of associated variability such as through modeling speakers, utterance type, gender, and utterance length. As is the case in human perception, speaker normalization provides significant improvements to our system. We show that a held-out set of baseline (sober) data can be used to achieve comparable gains to other speaker normalization techniques. Our fused frame-level statistic-functional systems, fused GMM systems, and final combined system achieve unweighted average recalls (UARs) of 69.7%, 65.1%, and 68.8%, respectively, on the test set. More consistent numbers compared to development set results occur with matched-prompt training, where the UARs are 70.4%, 66.2%, and 71.4%, respectively. The combined system improves over the Challenge baseline by 5.5% absolute (8.4% relative), also improving upon our previously best result.
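Unweighted average recall (UAR), the metric reported above, is simply the mean of per-class recalls; scikit-learn's macro-averaged recall computes the same quantity. A minimal illustration with placeholder labels:

```python
from sklearn.metrics import recall_score

# UAR = mean of per-class recalls, i.e., recall_score with macro averaging.
# The labels below are placeholders, not Challenge data.
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]   # e.g., 0 = sober, 1 = intoxicated
y_pred = [0, 0, 1, 0, 1, 1, 0, 1, 1, 1]
print("UAR:", recall_score(y_true, y_pred, average="macro"))
```

Because UAR weights both classes equally, it is preferred over plain accuracy when, as in the Alcohol Language Corpus, the classes are imbalanced.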
NASA Astrophysics Data System (ADS)
Yellen, H. W.
1983-03-01
Literature pertaining to Voice Recognition abounds with information relevant to the assessment of transitory speech recognition devices. In the past, engineering requirements have dictated the path this technology followed. But other factors do exist that influence recognition accuracy. This thesis explores the impact of Human Factors on the successful recognition of speech, principally addressing the differences or variability among users. A Threshold Technology T-600 was used with a 100-utterance vocabulary to test 44 subjects. A statistical analysis was conducted on 5 generic categories of Human Factors: Occupational, Operational, Psychological, Physiological and Personal. How the equipment is trained and the experience level of the speaker were found to be key characteristics influencing recognition accuracy. To a lesser extent, computer experience, time of week, accent, vital capacity and rate of air flow, speaker cooperativeness, and anxiety were found to affect overall error rates.
Children's perception of their synthetically corrected speech production.
Strömbergsson, Sofia; Wengelin, Asa; House, David
2014-06-01
We explore children's perception of their own speech - in its online form, in its recorded form, and in synthetically modified forms. Children with phonological disorder (PD) and children with typical speech and language development (TD) performed tasks of evaluating accuracy of the different types of speech stimuli, either immediately after having produced the utterance or after a delay. In addition, they performed a task designed to assess their ability to detect synthetic modification. Both groups showed high performance in tasks involving evaluation of other children's speech, whereas in tasks of evaluating one's own speech, the children with PD were less accurate than their TD peers. The children with PD were less sensitive to misproductions in immediate conjunction with their production of an utterance, and more accurate after a delay. Within-category modification often passed undetected, indicating a satisfactory quality of the generated speech. Potential clinical benefits of using corrective re-synthesis are discussed.
ERIC Educational Resources Information Center
Theodore, Rachel M.; Demuth, Katherine; Shattuck-Hufnagel, Stefanie
2015-01-01
Purpose: Prosodic and articulatory factors influence children's production of inflectional morphemes. For example, plural -"s" is produced more reliably in utterance-final compared to utterance-medial position (i.e., the positional effect), which has been attributed to the increased planning time in utterance-final position. In previous…
Measuring Speech Comprehensibility in Students with Down Syndrome
Woynaroski, Tiffany; Camarata, Stephen
2016-01-01
Purpose: There is an ongoing need to develop assessments of spontaneous speech that focus on whether the child's utterances are comprehensible to listeners. This study sought to identify the attributes of a stable ratings-based measure of speech comprehensibility, which enabled examining the criterion-related validity of an orthography-based measure of the comprehensibility of conversational speech in students with Down syndrome. Method: Participants were 10 elementary school students with Down syndrome and 4 unfamiliar adult raters. Averaged across-observer Likert ratings of speech comprehensibility were called a ratings-based measure of speech comprehensibility. The proportion of utterance attempts fully glossed constituted an orthography-based measure of speech comprehensibility. Results: Averaging across 4 raters on four 5-min segments produced a reliable (G = .83) ratings-based measure of speech comprehensibility. The ratings-based measure was strongly (r > .80) correlated with the orthography-based measure for both the same and different conversational samples. Conclusion: Reliable and valid measures of speech comprehensibility are achievable with the resources available to many researchers and some clinicians.
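A minimal sketch of the two measures and their association, with placeholder data (the generalizability analysis behind the reported G coefficient is not reproduced here):

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(8)
n_students, n_raters, n_segments = 10, 4, 4

# Ratings-based measure: mean of Likert ratings (1-5) across raters and segments.
ratings = rng.integers(1, 6, size=(n_students, n_raters, n_segments))
ratings_based = ratings.mean(axis=(1, 2))

# Orthography-based measure: proportion of utterance attempts fully glossed.
attempts = rng.integers(40, 80, size=n_students)
glossed = (attempts * rng.uniform(0.3, 0.9, size=n_students)).astype(int)
orthography_based = glossed / attempts

r, p = pearsonr(ratings_based, orthography_based)
print(f"r = {r:.2f}, p = {p:.3f}")   # placeholder data, so the value here is arbitrary
```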
Lancia, Leonardo; Fuchs, Susanne; Tiede, Mark
2014-06-01
The aim of this article was to introduce an important tool, cross-recurrence analysis, to speech production applications by showing how it can be adapted to evaluate the similarity of multivariate patterns of articulatory motion. The method differs from classical applications of cross-recurrence analysis because no phase space reconstruction is conducted, and a cleaning algorithm removes the artifacts from the recurrence plot. The main features of the proposed approach are robustness to nonstationarity and efficient separation of amplitude variability from temporal variability. The authors tested these claims by applying their method to synthetic stimuli whose variability had been carefully controlled. The proposed method was also demonstrated in a practical application: It was used to investigate the role of biomechanical constraints in articulatory reorganization as a consequence of speeded repetition of CVCV utterances containing a labial and a coronal consonant. Overall, the proposed approach provided more reliable results than other methods, particularly in the presence of high variability. The proposed method is a useful and appropriate tool for quantifying similarity and dissimilarity in patterns of speech articulator movement, especially in such research areas as speech errors and pathologies, where unpredictable divergent behavior is expected.
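The article's artifact-cleaning algorithm is specific to that work and not reproduced here; the sketch below shows only the core construction under the authors' no-embedding setting: thresholding pairwise distances between two multivariate articulatory trajectories (placeholder signals standing in for recorded articulator channels) to obtain a binary cross-recurrence plot.

```python
import numpy as np

def cross_recurrence_plot(x, y, radius_pct=10.0):
    """Binary cross-recurrence matrix between two multivariate trajectories.

    x: (n, d) array, y: (m, d) array. A point (i, j) recurs when the Euclidean
    distance between x[i] and y[j] falls below a radius chosen as a percentile
    of all pairwise distances (no phase-space embedding is performed)."""
    dists = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)
    radius = np.percentile(dists, radius_pct)
    return (dists <= radius).astype(int)

def recurrence_rate(crp):
    return float(crp.mean())

# Hypothetical articulatory trajectories (e.g., lip aperture and tongue-tip
# height over time) for two repetitions of a CVCV utterance.
rng = np.random.default_rng(4)
t1, t2 = np.linspace(0, 2 * np.pi, 200), np.linspace(0, 2 * np.pi, 180)
rep1 = np.column_stack([np.sin(2 * t1), np.cos(3 * t1)]) + 0.05 * rng.standard_normal((200, 2))
rep2 = np.column_stack([np.sin(2 * t2), np.cos(3 * t2)]) + 0.05 * rng.standard_normal((180, 2))

crp = cross_recurrence_plot(rep1, rep2)
print("CRP shape:", crp.shape, "recurrence rate:", round(recurrence_rate(crp), 3))
```

Diagonal structures in such a plot reflect stretches where the two repetitions trace similar paths, which is how the method separates amplitude variability from temporal variability.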
Effects of ethanol intoxication on speech suprasegmentals
NASA Astrophysics Data System (ADS)
Hollien, Harry; Dejong, Gea; Martin, Camilo A.; Schwartz, Reva; Liljegren, Kristen
2001-12-01
The effects of ingesting ethanol have been shown to be somewhat variable in humans. To date, there appear to be but few universals. Yet the question often arises: is it possible to determine if a person is intoxicated by observing them in some manner? A closely related question is: can speech be used for this purpose and, if so, can the degree of intoxication be determined? One of the many issues associated with these questions involves the relationships between a person's paralinguistic characteristics and the presence and level of inebriation. To this end, young, healthy speakers of both sexes were carefully selected and sorted into roughly equal groups of light, moderate, and heavy drinkers. They were asked to produce four types of utterances during a learning phase, when sober, and at four strictly controlled levels of intoxication (three ascending and one descending). The primary motor speech measures employed were speaking fundamental frequency, speech intensity, speaking rate and nonfluencies. Several statistically significant changes were found with increasing intoxication; the primary ones included rises in F0, task duration, and nonfluencies. Minor gender differences were found, but they lacked statistical significance. So did the small differences among the drinking category subgroups and the subject groupings related to levels of perceived intoxication. Finally, although it may be concluded that certain changes in speech suprasegmentals will occur as a function of increasing intoxication, these patterns cannot be viewed as universal, since a few subjects (about 20%) exhibited no (or negative) changes.
Varnet, Léo; Knoblauch, Kenneth; Serniclaes, Willy; Meunier, Fanny; Hoen, Michel
2015-01-01
Although there is a large consensus regarding the involvement of specific acoustic cues in speech perception, the precise mechanisms underlying the transformation from continuous acoustical properties into discrete perceptual units remain undetermined. This gap in knowledge is partially due to the lack of a turnkey solution for isolating critical speech cues from natural stimuli. In this paper, we describe a psychoacoustic imaging method known as the Auditory Classification Image technique that allows experimenters to estimate the relative importance of time-frequency regions in categorizing natural speech utterances in noise. Importantly, this technique enables the testing of hypotheses on the listening strategies of participants at the group level. We exemplify this approach by identifying the acoustic cues involved in da/ga categorization with two phonetic contexts, Al- or Ar-. The application of Auditory Classification Images to our group of 16 participants revealed significant critical regions on the second and third formant onsets, as predicted by the literature, as well as an unexpected temporal cue on the first formant. Finally, through a cluster-based nonparametric test, we demonstrate that this method is sufficiently sensitive to detect fine modifications of the classification strategies between different utterances of the same phoneme.
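The published technique estimates the classification image with a penalized GLM; as a simplified stand-in for the idea, the sketch below fits an L2-regularized logistic regression of simulated trial responses on the noise time-frequency bins added to each stimulus and reads the weight map as the classification image. All data here are synthetic placeholders, and the regularization choice is an assumption.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
n_trials, n_time, n_freq = 2000, 20, 15

# Noise fields added to the speech tokens on each trial (flattened T-F bins).
noise = rng.standard_normal((n_trials, n_time * n_freq))

# Simulated listener: responses depend on the noise energy in one "cue" region
# (e.g., around a formant onset), plus internal noise.
template = np.zeros((n_time, n_freq))
template[5:8, 9:12] = 1.0
decision = noise @ template.ravel() + 0.5 * rng.standard_normal(n_trials)
responses = (decision > 0).astype(int)          # 1 = "ga", 0 = "da" (arbitrary)

# Regularized GLM: the weight map approximates the classification image.
model = LogisticRegression(penalty="l2", C=0.1, max_iter=1000)
model.fit(noise, responses)
classification_image = model.coef_.reshape(n_time, n_freq)

peak = np.unravel_index(np.abs(classification_image).argmax(), classification_image.shape)
print("strongest time-frequency weight at bin:", peak)   # lands in the simulated cue region
```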
Long-term temporal tracking of speech rate affects spoken-word recognition.
Baese-Berk, Melissa M; Heffner, Christopher C; Dilley, Laura C; Pitt, Mark A; Morrill, Tuuli H; McAuley, J Devin
2014-08-01
Humans unconsciously track a wide array of distributional characteristics in their sensory environment. Recent research in spoken-language processing has demonstrated that the speech rate surrounding a target region within an utterance influences which words, and how many words, listeners hear later in that utterance. On the basis of hypotheses that listeners track timing information in speech over long timescales, we investigated the possibility that the perception of words is sensitive to speech rate over such a timescale (e.g., an extended conversation). Results demonstrated that listeners tracked variation in the overall pace of speech over an extended duration (analogous to that of a conversation that listeners might have outside the lab) and that this global speech rate influenced which words listeners reported hearing. The effects of speech rate became stronger over time. Our findings are consistent with the hypothesis that neural entrainment by speech occurs on multiple timescales, some lasting more than an hour.
Emotion to emotion speech conversion in phoneme level
NASA Astrophysics Data System (ADS)
Bulut, Murtaza; Yildirim, Serdar; Busso, Carlos; Lee, Chul Min; Kazemzadeh, Ebrahim; Lee, Sungbok; Narayanan, Shrikanth
2004-10-01
The ability to synthesize emotional speech can make human-machine interaction more natural in spoken dialogue management. This study investigates the effectiveness of prosodic and spectral modification at the phoneme level on emotion-to-emotion speech conversion. The prosody modification is performed with the TD-PSOLA algorithm (Moulines and Charpentier, 1990). We also transform the spectral envelopes of source phonemes to match those of target phonemes using an LPC-based spectral transformation approach (Kain, 2001). Prosodic speech parameters (F0, duration, and energy) for target phonemes are estimated from the statistics obtained from the analysis of an emotional speech database of happy, angry, sad, and neutral utterances collected from actors. Listening experiments conducted with native American English speakers indicate that the modification of prosody only or spectrum only is not sufficient to elicit targeted emotions. The simultaneous modification of both prosody and spectrum results in higher acceptance rates of target emotions, suggesting that modeling spectral patterns that reflect the underlying speech articulation is as important as modeling speech prosody for synthesizing emotional speech of good quality. We are investigating suprasegmental-level modifications for further improvement in speech quality and expressiveness.
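As a rough illustration of the prosody-modification step, the sketch below imposes assumed target-emotion ratios for F0, duration, and energy on a recorded segment. The study itself used TD-PSOLA and LPC-based spectral conversion; librosa's generic pitch-shift and time-stretch, the file names, and the ratio values here are all stand-ins.

```python
# Rough stand-in for the prosody-modification step, assuming target-emotion
# statistics (F0 ratio, duration ratio, energy ratio) have been estimated
# from an emotional-speech database. TD-PSOLA and LPC spectral conversion
# (as used in the study) are replaced by generic librosa operations.
import numpy as np
import librosa
import soundfile as sf

y, sr = librosa.load("neutral_segment.wav", sr=None)   # hypothetical input

f0_ratio = 1.25        # e.g., ~25% higher F0 for the target emotion (assumed)
duration_ratio = 0.85  # target/source duration (assumed)
energy_ratio = 1.4     # target/source RMS energy (assumed)

# 1) shift F0 by the ratio (in semitones), 2) stretch duration, 3) scale energy
n_semitones = 12 * np.log2(f0_ratio)
y_mod = librosa.effects.pitch_shift(y, sr=sr, n_steps=n_semitones)
y_mod = librosa.effects.time_stretch(y_mod, rate=1.0 / duration_ratio)
y_mod = y_mod * energy_ratio

sf.write("converted_segment.wav", y_mod, sr)
```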
Clustered functional MRI of overt speech production.
Sörös, Peter; Sokoloff, Lisa Guttman; Bose, Arpita; McIntosh, Anthony R; Graham, Simon J; Stuss, Donald T
2006-08-01
To investigate the neural network of overt speech production, event-related fMRI was performed in 9 young healthy adult volunteers. A clustered image acquisition technique was chosen to minimize speech-related movement artifacts. Functional images were acquired during the production of oral movements and of speech of increasing complexity (isolated vowel as well as monosyllabic and trisyllabic utterances). This imaging technique and behavioral task enabled depiction of the articulo-phonologic network of speech production from the supplementary motor area at the cranial end to the red nucleus at the caudal end. Speaking a single vowel and performing simple oral movements involved very similar activation of the cortical and subcortical motor systems. More complex, polysyllabic utterances were associated with additional activation in the bilateral cerebellum, reflecting increased demand on speech motor control, and additional activation in the bilateral temporal cortex, reflecting the stronger involvement of phonologic processing.
A characterization of verb use in Turkish agrammatic narrative speech.
Arslan, Seçkin; Bamyacı, Elif; Bastiaanse, Roelien
2016-01-01
This study investigates the characteristics of narrative-speech production and the use of verbs in Turkish agrammatic speakers (n = 10) compared to non-brain-damaged controls (n = 10). To elicit narrative-speech samples, personal interviews and storytelling tasks were conducted. Turkish has a large and regular verb inflection paradigm where verbs are inflected for evidentiality (i.e. direct versus indirect evidence available to the speaker). Particularly, we explored the general characteristics of the speech samples (e.g. utterance length) and the uses of lexical, finite and non-finite verbs and direct and indirect evidentials. The results show that speech rate is slow, verbs per utterance are lower than normal and the verb diversity is reduced in the agrammatic speakers. Verb inflection is relatively intact; however, a trade-off pattern between inflection for direct evidentials and verb diversity is found. The implications of the data are discussed in connection with narrative-speech production studies on other languages.
Open Microphone Speech Understanding: Correct Discrimination Of In Domain Speech
NASA Technical Reports Server (NTRS)
Hieronymus, James; Aist, Greg; Dowding, John
2006-01-01
An ideal spoken dialogue system listens continually and determines which utterances were spoken to it, understands them and responds appropriately while ignoring the rest. This paper outlines a simple method for achieving this goal which involves trading a slightly higher false rejection rate of in-domain utterances for a higher correct rejection rate of out-of-domain (OOD) utterances. The system recognizes semantic entities specified by a unification grammar which is specialized by Explanation Based Learning (EBL), so that it only uses rules which are seen in the training data. The resulting grammar has probabilities assigned to each construct so that overgeneralizations are not a problem. The resulting system only recognizes utterances which reduce to a valid logical form which has meaning for the system and rejects the rest. A class N-gram grammar has been trained on the same training data. This system gives good recognition performance and offers good OOD discrimination when combined with the semantic analysis. The resulting systems were tested on a Space Station Robot Dialogue Speech Database and a subset of the OGI conversational speech database. Both systems run in real time on a PC laptop and the present performance allows continuous listening with an acceptably low false acceptance rate. This type of open microphone system has been used in the Clarissa procedure reading and navigation spoken dialogue system which is being tested on the International Space Station.
Child implant users' imitation of happy- and sad-sounding speech
Wang, David J.; Trehub, Sandra E.; Volkova, Anna; van Lieshout, Pascal
2013-01-01
Cochlear implants have enabled many congenitally or prelingually deaf children to acquire their native language and communicate successfully on the basis of electrical rather than acoustic input. Nevertheless, degraded spectral input provided by the device reduces the ability to perceive emotion in speech. We compared the vocal imitations of 5- to 7-year-old deaf children who were highly successful bilateral implant users with those of a control sample of children who had normal hearing. First, the children imitated several happy and sad sentences produced by a child model. When adults in Experiment 1 rated the similarity of imitated to model utterances, ratings were significantly higher for the hearing children. Both hearing and deaf children produced poorer imitations of happy than sad utterances because of difficulty matching the greater pitch modulation of the happy versions. When adults in Experiment 2 rated electronically filtered versions of the utterances, which obscured the verbal content, ratings of happy and sad utterances were significantly differentiated for deaf as well as hearing children. The ratings of deaf children, however, were significantly less differentiated. Although deaf children's utterances exhibited culturally typical pitch modulation, their pitch modulation was reduced relative to that of hearing children. One practical implication is that therapeutic interventions for deaf children could expand their focus on suprasegmental aspects of speech perception and production, especially intonation patterns. PMID:23801976
Arnulf, Isabelle; Uguccioni, Ginevra; Gay, Frederick; Baldayrou, Etienne; Golmard, Jean-Louis; Gayraud, Frederique; Devevey, Alain
2017-11-01
Speech is a complex function in humans, but the linguistic characteristics of sleep talking are unknown. We analyzed sleep-associated speech in adults, mostly (92%) during parasomnias. The utterances recorded during night-time video-polysomnography were analyzed for number of words, propositions and speech episodes, frequency, gaps and pauses (denoting turn-taking in the conversation), lemmatization, verbosity, negative/imperative/interrogative tone, first/second person, politeness, and abuse. Two hundred thirty-two subjects (aged 49.5 ± 20 years; 41% women; 129 with rapid eye movement [REM] sleep behavior disorder and 87 with sleepwalking/sleep terrors, 15 healthy subjects, and 1 patient with sleep apnea speaking in non-REM sleep) uttered 883 speech episodes, containing 59% nonverbal utterances (mumbles, shouts, whispers, and laughs) and 3349 understandable words. The most frequent word was "No": negations represented 21.4% of clauses (more in non-REM sleep). Interrogations were found in 26% of speech episodes (more in non-REM sleep), and subordinate clauses were found in 12.9% of speech episodes. As many as 9.7% of clauses contained profanities (more in non-REM sleep). Verbal abuse lasted longer in REM sleep and was mostly directed toward insulting or condemning someone, whereas swearing predominated in non-REM sleep. Men sleep-talked more than women and used a higher proportion of profanities. Apparent turn-taking in the conversation respected the usual language gaps. Sleep talking parallels awake talking for syntax, semantics, and turn-taking in conversation, suggesting that the sleeping brain can function at a high level. Language during sleep is mostly a familiar, tense conversation with inaudible others, suggestive of conflicts. © Sleep Research Society 2017.
Provine, Robert R.; Emmorey, Karen
2008-01-01
The placement of laughter in the speech of hearing individuals is not random but “punctuates” speech, occurring during pauses and at phrase boundaries where punctuation would be placed in a transcript of a conversation. For speakers, language is dominant in the competition for the vocal tract since laughter seldom interrupts spoken phrases. For users of American Sign Language, however, laughter and language do not compete in the same way for a single output channel. This study investigated whether laughter occurs simultaneously with signing, or punctuates signing, as it does speech, in 11 signed conversations (with two to five participants) that had at least one instance of audible, vocal laughter. Laughter occurred 2.7 times more often during pauses and at phrase boundaries than simultaneously with a signed utterance. Thus, the production of laughter involves higher order cognitive or linguistic processes rather than the low-level regulation of motor processes competing for a single vocal channel. In an examination of other variables, the social dynamics of deaf and hearing people were similar, with “speakers” (those signing) laughing more than their audiences and females laughing more than males. PMID:16891353
Provine, Robert R; Emmorey, Karen
2006-01-01
The placement of laughter in the speech of hearing individuals is not random but "punctuates" speech, occurring during pauses and at phrase boundaries where punctuation would be placed in a transcript of a conversation. For speakers, language is dominant in the competition for the vocal tract since laughter seldom interrupts spoken phrases. For users of American Sign Language, however, laughter and language do not compete in the same way for a single output channel. This study investigated whether laughter occurs simultaneously with signing, or punctuates signing, as it does speech, in 11 signed conversations (with two to five participants) that had at least one instance of audible, vocal laughter. Laughter occurred 2.7 times more often during pauses and at phrase boundaries than simultaneously with a signed utterance. Thus, the production of laughter involves higher order cognitive or linguistic processes rather than the low-level regulation of motor processes competing for a single vocal channel. In an examination of other variables, the social dynamics of deaf and hearing people were similar, with "speakers" (those signing) laughing more than their audiences and females laughing more than males.
ERIC Educational Resources Information Center
Walsh, Bridget; Smith, Anne
2011-01-01
Purpose: To investigate the effects of increased syntactic complexity and utterance length demands on speech production and comprehension in individuals with Parkinson's disease (PD) using behavioral and physiological measures. Method: Speech response latency, interarticulatory coordinative consistency, accuracy of speech production, and response…
NASA Astrophysics Data System (ADS)
Liberman, A. M.
1984-08-01
This report (1 January-30 June) is one of a regular series on the status and progress of studies on the nature of speech, instrumentation for its investigation, and practical applications. Manuscripts cover the following topics: Sources of variability in early speech development; Invariance: Functional or descriptive?; Brief comments on invariance in phonetic perception; Phonetic category boundaries are flexible; On categorizing aphasic speech errors; Universal and language particular aspects of vowel-to-vowel coarticulation; Functionally specific articulatory cooperation following jaw perturbation during speech: Evidence for coordinative structures; Formant integration and the perception of nasal vowel height; Relative power of cues: F0 shifts vs. voice timing; Laryngeal management at utterance-internal word boundary in American English; Closure duration and release burst amplitude cues to stop consonant manner and place of articulation; Effects of temporal stimulus properties on perception of the (sl)-(spl) distinction; The physics of controlled conditions: A reverie about locomotion; On the perception of intonation from sinusoidal sentences; Speech Perception; Speech Articulation; Motor Control; Speech Development.
Tilsen, Sam; Arvaniti, Amalia
2013-07-01
This study presents a method for analyzing speech rhythm using empirical mode decomposition of the speech amplitude envelope, which allows for extraction and quantification of syllabic- and supra-syllabic time-scale components of the envelope. The method of empirical mode decomposition of a vocalic energy amplitude envelope is illustrated in detail, and several types of rhythm metrics derived from this method are presented. Spontaneous speech extracted from the Buckeye Corpus is used to assess the effect of utterance length on metrics, and it is shown how metrics representing variability in the supra-syllabic time-scale components of the envelope can be used to identify stretches of speech with targeted rhythmic characteristics. Furthermore, the envelope-based metrics are used to characterize cross-linguistic differences in speech rhythm in the UC San Diego Speech Lab corpus of English, German, Greek, Italian, Korean, and Spanish speech elicited in read sentences, read passages, and spontaneous speech. The envelope-based metrics exhibit significant effects of language and elicitation method that argue for a nuanced view of cross-linguistic rhythm patterns.
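A minimal sketch of the envelope-plus-EMD pipeline is given below, assuming a mono recording; the filter settings, envelope rate, and the use of a Hilbert envelope (rather than the paper's vocalic energy envelope) are illustrative choices, not the study's.

```python
# Sketch of decomposing a speech amplitude envelope with empirical mode
# decomposition (EMD). File name and parameters are placeholders.
import numpy as np
import librosa
from scipy.signal import hilbert, butter, filtfilt
from PyEMD import EMD

y, sr = librosa.load("utterance.wav", sr=16000)      # hypothetical recording

# Amplitude envelope: magnitude of the analytic signal, low-pass filtered
# and downsampled to keep only slow (syllabic-scale and slower) modulations.
env = np.abs(hilbert(y))
b, a = butter(4, 20 / (sr / 2), btype="low")          # keep < 20 Hz modulations
env = filtfilt(b, a, env)[::sr // 100]                # ~100 Hz envelope rate

# Empirical mode decomposition into intrinsic mode functions (IMFs).
imfs = EMD().emd(env)

# Slower IMFs capture supra-syllabic modulation; their relative variability
# can serve as a simple rhythm metric.
for i, imf in enumerate(imfs):
    print(f"IMF {i}: std = {imf.std():.4f}")
```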
Theodore, Rachel M; Demuth, Katherine; Shattuck-Hufnagel, Stefanie
2015-06-01
Prosodic and articulatory factors influence children's production of inflectional morphemes. For example, plural -s is produced more reliably in utterance-final compared to utterance-medial position (i.e., the positional effect), which has been attributed to the increased planning time in utterance-final position. In previous investigations of plural -s, utterance-medial plurals were followed by a stop consonant (e.g., dogs bark), inducing high articulatory complexity. We examined whether the positional effect would be observed if the utterance-medial context were simplified to a following vowel. An elicited imitation task was used to collect productions of plural nouns from 2-year-old children. Nouns were elicited utterance-medially and utterance-finally, with the medial plural followed by either a stressed or an unstressed vowel. Acoustic analysis was used to identify evidence of morpheme production. The positional effect was absent when the morpheme was followed by a vowel (e.g., dogs eat). However, it returned when the vowel-initial word contained 2 syllables (e.g., dogs arrive), suggesting that the increased processing load in the latter condition negated the facilitative effect of the easy articulatory context. Children's productions of grammatical morphemes reflect a rich interaction between emerging levels of linguistic competence, raising considerations for diagnosis and rehabilitation of language disorders.
Kinematic Analysis of Speech Sound Sequencing Errors Induced by Delayed Auditory Feedback.
Cler, Gabriel J; Lee, Jackson C; Mittelman, Talia; Stepp, Cara E; Bohland, Jason W
2017-06-22
Delayed auditory feedback (DAF) causes speakers to become disfluent and make phonological errors. Methods for assessing the kinematics of speech errors are lacking, with most DAF studies relying on auditory perceptual analyses, which may be problematic, as errors judged to be categorical may actually represent blends of sounds or articulatory errors. Eight typical speakers produced nonsense syllable sequences under normal and DAF (200 ms). Lip and tongue kinematics were captured with electromagnetic articulography. Time-locked acoustic recordings were transcribed, and the kinematics of utterances with and without perceived errors were analyzed with existing and novel quantitative methods. New multivariate measures showed that for 5 participants, kinematic variability for productions perceived to be error free was significantly increased under delay; these results were validated by using the spatiotemporal index measure. Analysis of error trials revealed both typical productions of a nontarget syllable and productions with articulatory kinematics that incorporated aspects of both the target and the perceived utterance. This study is among the first to characterize articulatory changes under DAF and provides evidence for different classes of speech errors, which may not be perceptually salient. New methods were developed that may aid visualization and analysis of large kinematic data sets. https://doi.org/10.23641/asha.5103067.
Kinematic Analysis of Speech Sound Sequencing Errors Induced by Delayed Auditory Feedback
Lee, Jackson C.; Mittelman, Talia; Stepp, Cara E.; Bohland, Jason W.
2017-01-01
Purpose Delayed auditory feedback (DAF) causes speakers to become disfluent and make phonological errors. Methods for assessing the kinematics of speech errors are lacking, with most DAF studies relying on auditory perceptual analyses, which may be problematic, as errors judged to be categorical may actually represent blends of sounds or articulatory errors. Method Eight typical speakers produced nonsense syllable sequences under normal and DAF (200 ms). Lip and tongue kinematics were captured with electromagnetic articulography. Time-locked acoustic recordings were transcribed, and the kinematics of utterances with and without perceived errors were analyzed with existing and novel quantitative methods. Results New multivariate measures showed that for 5 participants, kinematic variability for productions perceived to be error free was significantly increased under delay; these results were validated by using the spatiotemporal index measure. Analysis of error trials revealed both typical productions of a nontarget syllable and productions with articulatory kinematics that incorporated aspects of both the target and the perceived utterance. Conclusions This study is among the first to characterize articulatory changes under DAF and provides evidence for different classes of speech errors, which may not be perceptually salient. New methods were developed that may aid visualization and analysis of large kinematic data sets. Supplemental Material https://doi.org/10.23641/asha.5103067 PMID:28655038
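The spatiotemporal index (STI) referred to above is commonly computed by amplitude-normalizing (z-scoring) and linearly time-normalizing repeated productions of the same utterance, then summing the standard deviations across repetitions at 2% intervals. The sketch below implements that textbook definition on synthetic trajectories; it is not the authors' analysis code.

```python
# Minimal sketch of the spatiotemporal index (STI) on synthetic kinematic data.
import numpy as np

def spatiotemporal_index(trajectories, n_points=50):
    """trajectories: list of 1-D arrays (e.g., lower-lip displacement),
    one per repetition of the same utterance."""
    normalized = []
    for traj in trajectories:
        z = (traj - traj.mean()) / traj.std()            # amplitude normalize
        t_old = np.linspace(0.0, 1.0, len(z))
        t_new = np.linspace(0.0, 1.0, n_points)          # time normalize (2% steps)
        normalized.append(np.interp(t_new, t_old, z))
    normalized = np.vstack(normalized)
    return normalized.std(axis=0, ddof=1).sum()          # sum of the 50 SDs

# Example with synthetic repetitions of one syllable sequence.
rng = np.random.default_rng(1)
reps = []
for _ in range(10):
    n = rng.integers(180, 220)                           # variable utterance length
    reps.append(np.sin(np.linspace(0, 4 * np.pi, n))
                + rng.normal(scale=0.1, size=n))
print(f"STI = {spatiotemporal_index(reps):.2f}")
```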
Listeners feel the beat: entrainment to English and French speech rhythms.
Lidji, Pascale; Palmer, Caroline; Peretz, Isabelle; Morningstar, Michele
2011-12-01
Can listeners entrain to speech rhythms? Monolingual speakers of English and French and balanced English-French bilinguals tapped along with the beat they perceived in sentences spoken in a stress-timed language, English, and a syllable-timed language, French. All groups of participants tapped more regularly to English than to French utterances. Tapping performance was also influenced by the participants' native language: English-speaking participants and bilinguals tapped more regularly and at higher metrical levels than did French-speaking participants, suggesting that long-term linguistic experience with a stress-timed language can differentiate speakers' entrainment to speech rhythm.
Syllable-related breathing in infants in the second year of life.
Parham, Douglas F; Buder, Eugene H; Oller, D Kimbrough; Boliek, Carol A
2011-08-01
This study explored whether breathing behaviors of infants within the 2nd year of life differ between tidal breathing and breathing supporting single unarticulated syllables and canonical/articulated syllables. Vocalizations and breathing kinematics of 9 infants between 53 and 90 weeks of age were recorded. A strict selection protocol was used to identify analyzable breath cycles. Syllables were categorized on the basis of consensus coding. Inspiratory and expiratory durations, excursions, and slopes were calculated for the 3 breath cycle types and were normalized using mean tidal breath measures. Tidal breathing cycles were significantly different from syllable-related cycles on all breathing measures. There were no significant differences between unarticulated syllable cycles and canonical syllable cycles, even after controlling for utterance duration and sound pressure level. Infants in the 2nd year of life exhibit clear differences between tidal breathing and speech-related breathing, but categorically distinct breath support for syllable types with varying articulatory demands was not evident in the present findings. Speech development introduces increasingly complex utterances, so older infants may produce detectable articulation-related adaptations of breathing kinematics. For younger infants, breath support may vary systematically among utterance types, due more to phonatory variations than to articulatory demands.
Syllable-Related Breathing in Infants in the Second Year of Life
Parham, Douglas F.; Buder, Eugene H.; Oller, D. Kimbrough; Boliek, Carol A.
2010-01-01
Purpose This study explored whether breathing behaviors of infants within the second year of life differ between tidal breathing and breathing supporting single unarticulated syllables and canonical/articulated syllables. Method Vocalizations and breathing kinematics of nine infants between 53 and 90 weeks of age were recorded. A strict selection protocol was used to identify analyzable breath cycles. Syllables were categorized based on consensus coding. Inspiratory and expiratory durations, excursions, and slopes were calculated for the three breath cycle types and normalized using mean tidal breath measures. Results Tidal breathing cycles were significantly different from syllable-related cycles on all breathing measures. There were no significant differences between unarticulated syllable cycles and canonical syllable cycles, even after controlling for utterance duration and sound pressure level. Conclusions Infants in the second year of life exhibit clear differences between tidal breathing and speech-related breathing, but categorically distinct breath support for syllable types with varying articulatory demands was not evident in the current findings. Speech development introduces increasingly complex utterances, so older infants may produce detectable articulation-related adaptations of breathing kinematics. For younger infants, breath support may vary systematically among utterance types, due more to phonatory variations than to articulatory demands. PMID:21173390
ERIC Educational Resources Information Center
Le Normand, M. T.; Moreno-Torres, I.; Parisse, C.; Dellatolas, G.
2013-01-01
In the last 50 years, researchers have debated over the lexical or grammatical nature of children's early multiword utterances. Due to methodological limitations, the issue remains controversial. This corpus study explores the effect of grammatical, lexical, and pragmatic categories on mean length of utterances (MLU). A total of 312 speech samples…
Pennington, Lindsay; Lombardo, Eftychia; Steen, Nick; Miller, Nick
2018-01-01
The speech intelligibility of children with dysarthria and cerebral palsy has been observed to increase following therapy focusing on respiration and phonation. The aim was to determine whether the change in speech intelligibility following intervention is associated with change in acoustic measures of voice. We recorded 16 young people with cerebral palsy and dysarthria (nine girls; mean age 14 years, SD = 2; nine spastic type, two dyskinetic, four mixed; one Worster-Drought) producing speech in two conditions (single words, connected speech) twice before and twice after therapy focusing on respiration, phonation and rate. In both single-word and connected speech we measured vocal intensity (root mean square, RMS), period-to-period variability (Shimmer APQ, Jitter RAP and PPQ) and harmonics-to-noise ratio (HNR). In connected speech we also measured mean fundamental frequency, utterance duration in seconds and speech and articulation rate (syllables/s with and without pauses respectively). All acoustic measures were made using Praat. Intelligibility was calculated in previous research. In single words statistically significant but very small reductions were observed in period-to-period variability following therapy: Shimmer APQ -0.15 (95% CI = -0.21 to -0.09); Jitter RAP -0.08 (95% CI = -0.14 to -0.01); Jitter PPQ -0.08 (95% CI = -0.15 to -0.01). No changes in period-to-period perturbation across phrases in connected speech were detected. However, changes in connected speech were observed in phrase length, rate and intensity. Following therapy, mean utterance duration increased by 1.11 s (95% CI = 0.37-1.86) when measured with pauses and by 1.13 s (95% CI = 0.40-1.85) when measured without pauses. Articulation rate increased by 0.07 syllables/s (95% CI = 0.02-0.13); speech rate increased by 0.06 syllables/s (95% CI < 0.01-0.12); and intensity increased by 0.03 Pascals (95% CI = 0.02-0.04). There was a gradual reduction in mean fundamental frequency across all time points (-11.85 Hz, 95% CI = -19.84 to -3.86). Only increases in the intensity of single words (0.37 Pascals, 95% CI = 0.10-0.65) and reductions in fundamental frequency (-0.11 Hz, 95% CI = -0.21 to -0.02) in connected speech were associated with gains in intelligibility. The mean reductions in vocal function impairment observed following therapy were small and most are unlikely to be clinically significant. Changes in vocal control did not explain improved intelligibility. © 2017 Royal College of Speech and Language Therapists.
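The acoustic measures named above (RMS intensity, jitter RAP/PPQ, shimmer APQ, HNR, mean F0) can be approximated in Python through parselmouth, an interface to Praat. The sketch below uses generic analysis settings and a hypothetical file name, not the study's exact Praat procedures.

```python
# Sketch of extracting voice measures with parselmouth (Praat from Python).
# Pitch range and perturbation settings are generic defaults, not the study's.
import parselmouth
from parselmouth.praat import call

snd = parselmouth.Sound("connected_speech.wav")      # hypothetical recording

rms = call(snd, "Get root-mean-square", 0, 0)        # whole-file RMS amplitude

pitch = call(snd, "To Pitch", 0.0, 75, 600)
mean_f0 = call(pitch, "Get mean", 0, 0, "Hertz")

point_process = call(snd, "To PointProcess (periodic, cc)", 75, 600)
jitter_rap = call(point_process, "Get jitter (rap)", 0, 0, 0.0001, 0.02, 1.3)
jitter_ppq5 = call(point_process, "Get jitter (ppq5)", 0, 0, 0.0001, 0.02, 1.3)
shimmer_apq = call([snd, point_process], "Get shimmer (apq5)",
                   0, 0, 0.0001, 0.02, 1.3, 1.6)

harmonicity = call(snd, "To Harmonicity (cc)", 0.01, 75, 0.1, 1.0)
hnr = call(harmonicity, "Get mean", 0, 0)

print(dict(rms=rms, mean_f0=mean_f0, jitter_rap=jitter_rap,
           jitter_ppq5=jitter_ppq5, shimmer_apq=shimmer_apq, hnr=hnr))
```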
Utterance complexity and stuttering on function words in preschool-age children who stutter.
Richels, Corrin; Buhr, Anthony; Conture, Edward; Ntourou, Katerina
2010-09-01
The purpose of the present investigation was to examine the relation between utterance complexity and utterance position and the tendency to stutter on function words in preschool-age children who stutter (CWS). Two separate studies involving two different groups of participants (Study 1, n=30; Study 2, n=30) were conducted. Participants were preschool-age CWS between the ages of 3;0 and 5;11 who engaged in 15-20 min parent-child conversational interactions. From audio-video recordings of each interaction, every child utterance of each parent-child sample was transcribed. From these transcripts, for each participant, measures of language (e.g., length and complexity) and measures of stuttering (e.g., word type and utterance position) were obtained. Results of Study 1 indicated that children stuttered more frequently on function words, but that this tendency was not greater for complex than simple utterances. Results of Study 2, involving the assessment of utterance position and MLU quartile, indicated that stuttering was more likely to occur with increasing sentence length, and that stuttering tended to occur at the utterance-initial position, the position where function words were also more likely to occur. Findings were taken to suggest that, although word-level influences cannot be discounted, utterance-level influences contribute to the loci of stuttering in preschool-age children, and may help account for developmental changes in the loci of stuttering. The reader will learn about and be able to: (a) describe the influence of word type (function versus content words), and grammatical complexity, on disfluent speech; (b) compare the effect of stuttering frequency based on the position of the word in the utterance; (c) discuss the contribution of utterance position on the frequency of stuttering on function words; and (d) explain possible reasons why preschoolers stutter more frequently on function words than content words.
Intra-oral pressure-based voicing control of electrolaryngeal speech with intra-oral vibrator.
Takahashi, Hirokazu; Nakao, Masayuki; Kikuchi, Yataro; Kaga, Kimitaka
2008-07-01
In normal speech, coordinated activities of intrinsic laryngeal muscles suspend a glottal sound at utterance of voiceless consonants, automatically realizing a voicing control. In electrolaryngeal speech, however, the lack of voicing control is one of the causes of unclear voice, voiceless consonants tending to be misheard as the corresponding voiced consonants. In the present work, we developed an intra-oral vibrator with an intra-oral pressure sensor that detected utterance of voiceless phonemes during the intra-oral electrolaryngeal speech, and demonstrated that an intra-oral pressure-based voicing control could improve the intelligibility of the speech. The test voices were obtained from one electrolaryngeal speaker and one normal speaker. We first investigated on the speech analysis software how a voice onset time (VOT) and first formant (F1) transition of the test consonant-vowel syllables contributed to voiceless/voiced contrasts, and developed an adequate voicing control strategy. We then compared the intelligibility of consonant-vowel syllables among the intra-oral electrolaryngeal speech with and without online voicing control. The increase of intra-oral pressure, typically with a peak ranging from 10 to 50 gf/cm2, could reliably identify utterance of voiceless consonants. The speech analysis and intelligibility test then demonstrated that a short VOT caused the misidentification of the voiced consonants due to a clear F1 transition. Finally, taking these results together, the online voicing control, which suspended the prosthetic tone while the intra-oral pressure exceeded 2.5 gf/cm2 and during the 35 milliseconds that followed, proved efficient to improve the voiceless/voiced contrast.
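The voicing-control rule described above reduces to a simple gating decision: suspend the prosthetic tone whenever intra-oral pressure exceeds 2.5 gf/cm² and for the following 35 ms. A minimal sketch of that logic, with an assumed pressure sampling rate and a synthetic pressure trace, is given below.

```python
# Minimal sketch of the intra-oral pressure-based voicing-control rule:
# suspend the electrolarynx tone while pressure exceeds the threshold and
# for 35 ms afterwards. Sampling rate and pressure trace are placeholders.
import numpy as np

def voicing_gate(pressure, fs_hz, threshold=2.5, hold_ms=35):
    """Return a boolean array: True where the prosthetic tone is ON."""
    hold_samples = int(round(hold_ms * fs_hz / 1000.0))
    gate = np.ones(len(pressure), dtype=bool)
    suspend_until = -1
    for i, p in enumerate(pressure):
        if p > threshold:
            suspend_until = i + hold_samples
        if i <= suspend_until:
            gate[i] = False          # suspend tone during/after the pressure peak
    return gate

# Example: a synthetic pressure trace with one voiceless-consonant burst.
fs = 1000                            # 1 kHz pressure sampling (assumed)
pressure = np.zeros(300)
pressure[100:120] = 30.0             # burst typical of a voiceless stop (gf/cm^2)
gate = voicing_gate(pressure, fs)
print("tone suspended for", (~gate).sum(), "ms")
```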
Rhythmic patterning in Malaysian and Singapore English.
Tan, Rachel Siew Kuang; Low, Ee-Ling
2014-06-01
Previous work on the rhythm of Malaysian English has been based on impressionistic observations. This paper utilizes acoustic analysis to measure the rhythmic patterns of Malaysian English. Recordings of the read speech and spontaneous speech of 10 Malaysian English speakers were analyzed and compared with recordings of an equivalent sample of Singaporean English speakers. Analysis was done using two rhythmic indexes, the PVI and VarcoV. It was found that although the rhythm of read speech of the Singaporean speakers was syllable-based as described by previous studies, the rhythm of the Malaysian speakers was even more syllable-based. Analysis of syllables in specific utterances showed that Malaysian speakers did not reduce vowels as much as Singaporean speakers did. Results of the spontaneous speech confirmed the findings for the read speech; that is, the same rhythmic patterning was found which normally triggers vowel reductions.
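The two rhythm indexes mentioned, the (normalized) PVI and VarcoV, are simple functions of successive vowel-interval durations. A small sketch with made-up durations is shown below.

```python
# Sketch of the two rhythm indexes used above, computed from a list of
# vowel-interval durations (in ms); the duration values are made up.
import numpy as np

def npvi(durations):
    """Normalized Pairwise Variability Index over successive intervals."""
    d = np.asarray(durations, dtype=float)
    pairs = np.abs(d[1:] - d[:-1]) / ((d[1:] + d[:-1]) / 2.0)
    return 100.0 * pairs.mean()

def varco_v(durations):
    """VarcoV: standard deviation of vowel durations normalized by the mean."""
    d = np.asarray(durations, dtype=float)
    return 100.0 * d.std(ddof=1) / d.mean()

vowel_durations_ms = [82, 95, 60, 140, 75, 110, 68]   # hypothetical utterance
print(f"nPVI   = {npvi(vowel_durations_ms):.1f}")
print(f"VarcoV = {varco_v(vowel_durations_ms):.1f}")
```

Lower values on either index indicate more even (more syllable-based) vowel durations, which is the direction reported here for the Malaysian speakers.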
Abnormal laughter-like vocalisations replacing speech in primary progressive aphasia
Rohrer, Jonathan D.; Warren, Jason D.; Rossor, Martin N.
2009-01-01
We describe ten patients with a clinical diagnosis of primary progressive aphasia (PPA) (pathologically confirmed in three cases) who developed abnormal laughter-like vocalisations in the context of progressive speech output impairment leading to mutism. Failure of speech output was accompanied by increasing frequency of the abnormal vocalisations until ultimately they constituted the patient's only extended utterance. The laughter-like vocalisations did not show contextual sensitivity but occurred as an automatic vocal output that replaced speech. Acoustic analysis of the vocalisations in two patients revealed abnormal motor features including variable note duration and inter-note interval, loss of temporal symmetry of laugh notes and loss of the normal decrescendo. Abnormal laughter-like vocalisations may be a hallmark of a subgroup in the PPA spectrum with impaired control and production of nonverbal vocal behaviour due to disruption of fronto-temporal networks mediating vocalisation. PMID:19435636
Abnormal laughter-like vocalisations replacing speech in primary progressive aphasia.
Rohrer, Jonathan D; Warren, Jason D; Rossor, Martin N
2009-09-15
We describe ten patients with a clinical diagnosis of primary progressive aphasia (PPA) (pathologically confirmed in three cases) who developed abnormal laughter-like vocalisations in the context of progressive speech output impairment leading to mutism. Failure of speech output was accompanied by increasing frequency of the abnormal vocalisations until ultimately they constituted the patient's only extended utterance. The laughter-like vocalisations did not show contextual sensitivity but occurred as an automatic vocal output that replaced speech. Acoustic analysis of the vocalisations in two patients revealed abnormal motor features including variable note duration and inter-note interval, loss of temporal symmetry of laugh notes and loss of the normal decrescendo. Abnormal laughter-like vocalisations may be a hallmark of a subgroup in the PPA spectrum with impaired control and production of nonverbal vocal behaviour due to disruption of fronto-temporal networks mediating vocalisation.
Soller, R. William; Chan, Philip; Higa, Amy
2012-01-01
Background Language barriers are significant hurdles for chronic disease patients in achieving self-management goals of therapy, particularly in settings where practitioners have limited nonprimary language skills, and in-person translators may not always be available. S-MINDS© (Speaking Multilingual Interactive Natural Dialog System), a concept-based speech translation approach developed by Fluential Inc., can be applied to bridge the technologic gaps that limit the complexity and length of utterances that can be recognized and translated by devices and has the potential to broaden access to translation services in the clinical settings. Methods The prototype translation system was evaluated prospectively for accuracy and patient satisfaction in underserved Spanish-speaking patients with diabetes and limited English proficiency and was compared with other commercial systems for robustness against degradation of translation due to ambient noise and speech patterns. Results Accuracy related to translating the English–Spanish–English communication string from practitioner to device to patient to device to practitioner was high (97–100%). Patient satisfaction was high (means of 4.7–4.9 over four domains on a 5-point Likert scale). The device outperformed three other commercial speech translation systems in terms of accuracy during fast speech utterances, under quiet and noisy fluent speech conditions, and when challenged with various speech disfluencies (i.e., fillers, false starts, stutters, repairs, and long pauses). Conclusions A concept-based English–Spanish speech translation system has been successfully developed in prototype form that can accept long utterances (up to 20 words) with limited to no degradation in accuracy. The functionality of the system is superior to leading commercial speech translation systems. PMID:22920821
Entrainment of Prosody in the Interaction of Mothers with Their Young Children
ERIC Educational Resources Information Center
Ko, Eon-Suk; Seidl, Amanda; Cristia, Alejandrina; Reimchen, Melissa; Soderstrom, Melanie
2016-01-01
Caregiver speech is not a static collection of utterances, but occurs in "conversational exchanges," in which caregiver and child dynamically influence each other's speech. We investigate (a) whether children and caregivers modulate the prosody of their speech as a function of their interlocutor's speech, and (b) the influence of the…
Brain basis of communicative actions in language
Egorova, Natalia; Shtyrov, Yury; Pulvermüller, Friedemann
2016-01-01
Although language is a key tool for communication in social interaction, most studies in the neuroscience of language have focused on language structures such as words and sentences. Here, the neural correlates of speech acts, that is, the actions performed by using language, were investigated with functional magnetic resonance imaging (fMRI). Participants were shown videos, in which the same critical utterances were used in different communicative contexts, to Name objects, or to Request them from communication partners. Understanding of critical utterances as Requests was accompanied by activation in bilateral premotor, left inferior frontal and temporo-parietal cortical areas known to support action-related and social interactive knowledge. Naming, however, activated the left angular gyrus implicated in linking information about word forms and related reference objects mentioned in critical utterances. These findings show that understanding of utterances as different communicative actions is reflected in distinct brain activation patterns, and thus suggest different neural substrates for different speech act types. PMID:26505303
Brain basis of communicative actions in language.
Egorova, Natalia; Shtyrov, Yury; Pulvermüller, Friedemann
2016-01-15
Although language is a key tool for communication in social interaction, most studies in the neuroscience of language have focused on language structures such as words and sentences. Here, the neural correlates of speech acts, that is, the actions performed by using language, were investigated with functional magnetic resonance imaging (fMRI). Participants were shown videos, in which the same critical utterances were used in different communicative contexts, to Name objects, or to Request them from communication partners. Understanding of critical utterances as Requests was accompanied by activation in bilateral premotor, left inferior frontal and temporo-parietal cortical areas known to support action-related and social interactive knowledge. Naming, however, activated the left angular gyrus implicated in linking information about word forms and related reference objects mentioned in critical utterances. These findings show that understanding of utterances as different communicative actions is reflected in distinct brain activation patterns, and thus suggest different neural substrates for different speech act types. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
ERIC Educational Resources Information Center
Emelyanova, Irina A.; Borisova, Elena A.; Shapovalova, Olga E.; Karynbaeva, Olga V.; Vorotilkina, Irina M.
2018-01-01
The relevance of the research stems from the need to create pedagogical conditions for the correction and development of speech in children with general speech underdevelopment. Such children characteristically have difficulty generating coherent utterances, which prevents them from forming sufficient speech readiness for schooling as well…
Wu, Ying Choon; Coulson, Seana
2015-11-01
To understand a speaker's gestures, people may draw on kinesthetic working memory (KWM), a system for temporarily remembering body movements. The present study explored whether sensitivity to gesture meaning was related to differences in KWM capacity. KWM was evaluated through sequences of novel movements that participants viewed and reproduced with their own bodies. Gesture sensitivity was assessed through a priming paradigm. Participants judged whether multimodal utterances containing congruent, incongruent, or no gestures were related to subsequent picture probes depicting the referents of those utterances. Individuals with low KWM were primarily inhibited by incongruent speech-gesture primes, whereas those with high KWM showed facilitation; that is, they were able to identify picture probes more quickly when preceded by congruent speech and gestures than by speech alone. Group differences were most apparent for discourse with weakly congruent speech and gestures. Overall, speech-gesture congruency effects were positively correlated with KWM abilities, which may help listeners match spatial properties of gestures to concepts evoked by speech. © The Author(s) 2015.
NASA Astrophysics Data System (ADS)
Anagnostopoulos, Christos Nikolaos; Vovoli, Eftichia
An emotion recognition framework based on sound processing could improve services in human-computer interaction. Various quantitative speech features obtained from sound processing of acted speech were tested to determine whether they are sufficient to discriminate among seven emotions. Multilayered perceptrons were trained to classify gender and emotions on the basis of a 24-input vector, which provides information about the prosody of the speaker over the entire sentence using statistics of sound features. Several experiments were performed and the results were presented analytically. Emotion recognition was successful when speakers and utterances were “known” to the classifier. However, severe misclassifications occurred in the utterance-independent framework. At a minimum, the proposed feature vector achieved promising results for utterance-independent recognition of high- and low-arousal emotions.
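A minimal sketch of the classification setup described above, a multilayer perceptron mapping a 24-dimensional utterance-level prosodic feature vector to one of seven emotion classes, is shown below with random placeholder features and labels (so the accuracy printed is near chance).

```python
# Minimal sketch of an MLP over a 24-dimensional prosodic feature vector
# (e.g., F0, energy, and duration statistics per utterance). Features and
# emotion labels here are random placeholders.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(700, 24))          # 24 prosodic statistics per utterance
y = rng.integers(0, 7, size=700)        # seven emotion classes (placeholder labels)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

clf = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0)
clf.fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))   # ~chance on random data
```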
Processing of affective speech prosody is impaired in Asperger syndrome.
Korpilahti, Pirjo; Jansson-Verkasalo, Eira; Mattila, Marja-Leena; Kuusikko, Sanna; Suominen, Kalervo; Rytky, Seppo; Pauls, David L; Moilanen, Irma
2007-09-01
Many people with the diagnosis of Asperger syndrome (AS) show poorly developed skills in understanding emotional messages. The present study addressed discrimination of speech prosody in children with AS at the neurophysiological level. Detection of affective prosody was investigated in one-word utterances as indexed by the N1 and the mismatch negativity (MMN) of auditory event-related potentials (ERPs). Data from fourteen boys with AS were compared with those for thirteen typically developed boys. These results suggest atypical neural responses to affective prosody in children with AS and their fathers, especially over the right hemisphere (RH), and that this impairment can already be seen in low-level information processing.
Wallesch, C W; Brunner, R J; Seemüller, E
1983-12-01
Repetitive phenomena in spontaneous speech were investigated in 30 patients with chronic infarctions of the left hemisphere which included Broca's and/or Wernicke's area and/or the basal ganglia. Perseverations, stereotypies, and echolalias occurred with all types of brain lesions; automatisms and recurring utterances occurred only in those patients whose infarctions involved Wernicke's area and the basal ganglia. These patients also showed more echolalic responses. The results are discussed in view of the role of the basal ganglia as motor program generators.
2013-01-01
Background Individuals suffering from vision loss of a peripheral origin may learn to understand spoken language at a rate of up to about 22 syllables (syl) per second - exceeding by far the maximum performance level of normal-sighted listeners (ca. 8 syl/s). To further elucidate the brain mechanisms underlying this extraordinary skill, functional magnetic resonance imaging (fMRI) was performed in blind subjects of varying ultra-fast speech comprehension capabilities and sighted individuals while listening to sentence utterances of a moderately fast (8 syl/s) or ultra-fast (16 syl/s) syllabic rate. Results Besides left inferior frontal gyrus (IFG), bilateral posterior superior temporal sulcus (pSTS) and left supplementary motor area (SMA), blind people highly proficient in ultra-fast speech perception showed significant hemodynamic activation of right-hemispheric primary visual cortex (V1), contralateral fusiform gyrus (FG), and bilateral pulvinar (Pv). Conclusions Presumably, FG supports the left-hemispheric perisylvian “language network”, i.e., IFG and superior temporal lobe, during the (segmental) sequencing of verbal utterances whereas the collaboration of bilateral pulvinar, right auditory cortex, and ipsilateral V1 implements a signal-driven timing mechanism related to syllabic (suprasegmental) modulation of the speech signal. These data structures, conveyed via left SMA to the perisylvian “language zones”, might facilitate – under time-critical conditions – the consolidation of linguistic information at the level of verbal working memory. PMID:23879896
Audio steganography by amplitude or phase modification
NASA Astrophysics Data System (ADS)
Gopalan, Kaliappan; Wenndt, Stanley J.; Adams, Scott F.; Haddad, Darren M.
2003-06-01
This paper presents the results of embedding short covert message utterances on a host, or cover, utterance by modifying the phase or amplitude of perceptually masked or significant regions of the host. In the first method, the absolute phase at selected, perceptually masked frequency indices was changed to fixed, covert data-dependent values. Embedded bits were retrieved at the receiver from the phase at the selected frequency indices. Tests on embedding a GSM-coded covert utterance on clean and noisy host utterances showed no noticeable difference in the stego compared to the hosts in speech quality or spectrogram. A bit error rate of 2 out of 2800 was observed for a clean host utterance while no error occurred for a noisy host. In the second method, the absolute phase of 10 or fewer perceptually significant points in the host was set in accordance with covert data. This resulted in a stego with successful data retrieval and a slightly noticeable degradation in speech quality. Modifying the amplitude of perceptually significant points caused perceptible differences in the stego even with small changes of amplitude made at five points per frame. Finally, the stego obtained by altering the amplitude at perceptually masked points showed barely noticeable differences and excellent data recovery.
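The first embedding method can be sketched as frame-wise phase coding: per frame, the phase of a chosen frequency bin is set to 0 or π according to one covert bit, and the bit is read back from the sign of that bin at the receiver. The bin index, frame size, and host signal below are arbitrary placeholders; the paper selects perceptually masked or significant bins instead.

```python
# Minimal sketch of frame-wise phase-coding steganography: one bit per frame,
# encoded as phase 0 or pi at a fixed bin. Bin choice and host are placeholders.
import numpy as np

FRAME = 1024
BIN = 60                       # illustrative embedding bin

def embed(host, bits):
    out = host.astype(float).copy()
    for k, bit in enumerate(bits):
        frame = out[k * FRAME:(k + 1) * FRAME]
        spec = np.fft.rfft(frame)
        mag = np.abs(spec[BIN])
        spec[BIN] = mag * np.exp(1j * (0.0 if bit == 0 else np.pi))
        out[k * FRAME:(k + 1) * FRAME] = np.fft.irfft(spec, n=FRAME)
    return out

def extract(stego, n_bits):
    bits = []
    for k in range(n_bits):
        spec = np.fft.rfft(stego[k * FRAME:(k + 1) * FRAME])
        bits.append(0 if spec[BIN].real >= 0 else 1)   # phase 0 -> bit 0
    return bits

rng = np.random.default_rng(0)
host = rng.normal(scale=0.1, size=FRAME * 16)          # placeholder host signal
payload = [int(b) for b in rng.integers(0, 2, size=16)]
stego = embed(host, payload)
assert extract(stego, len(payload)) == payload
```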
Deep bottleneck features for spoken language identification.
Jiang, Bing; Song, Yan; Wei, Si; Liu, Jun-Hua; McLoughlin, Ian Vince; Dai, Li-Rong
2014-01-01
A key problem in spoken language identification (LID) is to design effective representations which are specific to language information. For example, in recent years, representations based on both phonotactic and acoustic features have proven their effectiveness for LID. Although advances in machine learning have led to significant improvements, LID performance is still lacking, especially for short duration speech utterances. With the hypothesis that language information is weak and represented only latently in speech, and is largely dependent on the statistical properties of the speech content, existing representations may be insufficient. Furthermore they may be susceptible to the variations caused by different speakers, specific content of the speech segments, and background noise. To address this, we propose using Deep Bottleneck Features (DBF) for spoken LID, motivated by the success of Deep Neural Networks (DNN) in speech recognition. We show that DBFs can form a low-dimensional compact representation of the original inputs with a powerful descriptive and discriminative capability. To evaluate the effectiveness of this, we design two acoustic models, termed DBF-TV and parallel DBF-TV (PDBF-TV), using a DBF based i-vector representation for each speech utterance. Results on NIST language recognition evaluation 2009 (LRE09) show significant improvements over state-of-the-art systems. By fusing the output of phonotactic and acoustic approaches, we achieve an EER of 1.08%, 1.89% and 7.01% for 30 s, 10 s and 3 s test utterances respectively. Furthermore, various DBF configurations have been extensively evaluated, and an optimal system proposed.
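A deep bottleneck feature extractor of the kind described above can be sketched as a frame-level network with one narrow hidden layer whose activations are taken as features. The dimensions, targets, and the frame-averaging used to form an utterance representation below are placeholders; the paper trains on phonetic targets and models the DBFs with i-vectors rather than simple averaging.

```python
# Sketch of a deep bottleneck feature (DBF) extractor, assuming frame-level
# acoustic features and phonetic training targets exist. Dimensions and the
# utterance-level averaging are placeholders (the paper uses i-vectors).
import torch
import torch.nn as nn

class BottleneckDNN(nn.Module):
    def __init__(self, in_dim=39, hidden=1024, bottleneck=40, n_targets=3000):
        super().__init__()
        self.front = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.Sigmoid(),
            nn.Linear(hidden, hidden), nn.Sigmoid(),
            nn.Linear(hidden, bottleneck),          # narrow bottleneck layer
        )
        self.back = nn.Sequential(
            nn.Sigmoid(),
            nn.Linear(bottleneck, hidden), nn.Sigmoid(),
            nn.Linear(hidden, n_targets),           # phonetic targets for training
        )

    def forward(self, x):
        return self.back(self.front(x))

    def extract_dbf(self, x):
        with torch.no_grad():
            return self.front(x)                    # frame-level DBFs

model = BottleneckDNN()
frames = torch.randn(500, 39)                       # one utterance, 500 frames
dbf = model.extract_dbf(frames)                     # (500, 40) bottleneck features
utt_embedding = dbf.mean(dim=0)                     # crude utterance representation
print(utt_embedding.shape)
```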
Why Should Speech Rate (Tempo) Be Integrated into Pronunciation Teaching Curriculum
ERIC Educational Resources Information Center
Yurtbasi, Meti
2015-01-01
The pace of speech, i.e. tempo, can be varied according to our mood of the moment. Fast speech can convey urgency, whereas slower speech can be used for emphasis. In public speaking, orators produce powerful effects by varying the loudness and pace of their speech. The juxtaposition of very loud and very quiet utterances is a device often used by those trying…
Measuring Speech Comprehensibility in Students with Down Syndrome
ERIC Educational Resources Information Center
Yoder, Paul J.; Woynaroski, Tiffany; Camarata, Stephen
2016-01-01
Purpose: There is an ongoing need to develop assessments of spontaneous speech that focus on whether the child's utterances are comprehensible to listeners. This study sought to identify the attributes of a stable ratings-based measure of speech comprehensibility, which enabled examining the criterion-related validity of an orthography-based…
Voice Modulations in German Ironic Speech
ERIC Educational Resources Information Center
Scharrer, Lisa; Christmann, Ursula; Knoll, Monja
2011-01-01
Previous research has shown that in different languages ironic speech is acoustically modulated compared to literal speech, and these modulations are assumed to aid the listener in the comprehension process by acting as cues that mark utterances as ironic. The present study was conducted to identify paraverbal features of German "ironic…
Johnson, Elizabeth K.; Seidl, Amanda; Tyler, Michael D.
2014-01-01
Past research has shown that English learners begin segmenting words from speech by 7.5 months of age. However, more recent research has begun to show that, in some situations, infants may exhibit rudimentary segmentation capabilities at an earlier age. Here, we report on four perceptual experiments and a corpus analysis further investigating the initial emergence of segmentation capabilities. In Experiments 1 and 2, 6-month-olds were familiarized with passages containing target words located either utterance medially or at utterance edges. Only those infants familiarized with passages containing target words aligned with utterance edges exhibited evidence of segmentation. In Experiments 3 and 4, 6-month-olds recognized familiarized words when they were presented in a new acoustically distinct voice (male rather than female), but not when they were presented in a phonologically altered manner (missing the initial segment). Finally, we report corpus analyses examining how often different word types occur at utterance boundaries in different registers. Our findings suggest that edge-aligned words likely play a key role in infants’ early segmentation attempts, and also converge with recent reports suggesting that 6-month-olds have already started building a rudimentary lexicon. PMID:24421892
Speech-recognition interfaces for music information retrieval
NASA Astrophysics Data System (ADS)
Goto, Masataka
2005-09-01
This paper describes two hands-free music information retrieval (MIR) systems that enable a user to retrieve and play back a musical piece by saying its title or the artist's name. Although various interfaces for MIR have been proposed, speech-recognition interfaces suitable for retrieving musical pieces have not been studied. Our MIR-based jukebox systems employ two different speech-recognition interfaces for MIR, speech completion and speech spotter, which exploit intentionally controlled nonverbal speech information in original ways. The first is a music retrieval system with the speech-completion interface that is suitable for music stores and car-driving situations. When a user only remembers part of the name of a musical piece or an artist and utters only a remembered fragment, the system helps the user recall and enter the name by completing the fragment. The second is a background-music playback system with the speech-spotter interface that can enrich human-human conversation. When a user is talking to another person, the system allows the user to enter voice commands for music playback control by spotting a special voice-command utterance in face-to-face or telephone conversations. Experimental results from use of these systems have demonstrated the effectiveness of the speech-completion and speech-spotter interfaces. (Video clips: http://staff.aist.go.jp/m.goto/MIR/speech-if.html)
Understanding speaker attitudes from prosody by adults with Parkinson's disease.
Monetta, Laura; Cheang, Henry S; Pell, Marc D
2008-09-01
The ability to interpret vocal (prosodic) cues during social interactions can be disrupted by Parkinson's disease, with notable effects on how emotions are understood from speech. This study investigated whether PD patients who have emotional prosody deficits exhibit further difficulties decoding the attitude of a speaker from prosody. Vocally inflected but semantically nonsensical 'pseudo-utterances' were presented to listener groups with and without PD in two separate rating tasks. Task 1 required participants to rate how confident a speaker sounded from their voice and Task 2 required listeners to rate how polite the speaker sounded for a comparable set of pseudo-utterances. The results showed that PD patients were significantly less able than healthy control (HC) participants to use prosodic cues to differentiate intended levels of speaker confidence in speech, although the patients could accurately detect the polite/impolite attitude of the speaker from prosody in most cases. Our data suggest that many PD patients fail to use vocal cues to effectively infer a speaker's emotions as well as certain attitudes in speech such as confidence, consistent with the idea that the basal ganglia play a role in the meaningful processing of prosodic sequences in spoken language (Pell & Leonard, 2003).
NASA Astrophysics Data System (ADS)
Bell, Alan; Jurafsky, Daniel; Fosler-Lussier, Eric; Girand, Cynthia; Gregory, Michelle; Gildea, Daniel
2003-02-01
Function words, especially frequently occurring ones such as (the, that, and, and of), vary widely in pronunciation. Understanding this variation is essential both for cognitive modeling of lexical production and for computer speech recognition and synthesis. This study investigates which factors affect the forms of function words, especially whether they have a fuller pronunciation (e.g., [ði], [ðæt], [ænd], [ʌv]) or a more reduced or lenited pronunciation (e.g., [ðə], [ðɨt], [n], [ə]). It is based on over 8000 occurrences of the ten most frequent English function words in a 4-h sample from conversations from the Switchboard corpus. Ordinary linear and logistic regression models were used to examine variation in the length of the words, in the form of their vowel (basic, full, or reduced), and whether final obstruents were present or not. For all these measures, after controlling for segmental context, rate of speech, and other important factors, there are strong independent effects that make high-frequency monosyllabic function words more likely to be longer or have a fuller form (1) when neighboring disfluencies (such as filled pauses uh and um) indicate that the speaker was encountering problems in planning the utterance; (2) when the word is unexpected, i.e., less predictable in context; (3) when the word is either utterance initial or utterance final. Looking at the phenomenon in a different way, frequent function words are more likely to be shorter and to have less-full forms in fluent speech, in predictable positions or multiword collocations, and utterance internally. Also considered are other factors such as sex (women are more likely to use fuller forms, even after controlling for rate of speech, for example), and some of the differences among the ten function words in their response to the factors.
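The modeling reported here amounts to regressing, for each token of a function word, a form measure (e.g., full versus reduced vowel) on contextual predictors. The sketch below fits such a logistic regression on synthetic data whose effect directions mirror those reported; the variable names and values are made up.

```python
# Sketch of a logistic model of full vs. reduced vowel form for function-word
# tokens, with predictors for predictability, speech rate, adjacent disfluency,
# and utterance position. The data frame is synthetic; the study fit such
# models to ~8000 Switchboard tokens with additional segmental controls.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "log_predictability": rng.normal(-2.0, 1.0, n),   # log P(word | context)
    "speech_rate": rng.normal(5.0, 1.0, n),           # syllables per second
    "near_disfluency": rng.integers(0, 2, n),         # adjacent uh/um etc.
    "utt_edge": rng.integers(0, 2, n),                # utterance initial/final
})
# Synthetic outcome consistent with the reported direction of effects.
drive = (-0.6 * df.log_predictability - 0.4 * df.speech_rate
         + 0.8 * df.near_disfluency + 0.7 * df.utt_edge)
df["full_vowel"] = ((drive + rng.logistic(size=n)) > 0).astype(int)

fit = smf.logit("full_vowel ~ log_predictability + speech_rate + "
                "near_disfluency + utt_edge", data=df).fit(disp=0)
print(fit.params)
```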
Constructing Adequate Non-Speech Analogues: What Is Special about Speech Anyway?
ERIC Educational Resources Information Center
Rosen, Stuart; Iverson, Paul
2007-01-01
Vouloumanos and Werker (2007) claim that human neonates have a (possibly innate) bias to listen to speech based on a preference for natural speech utterances over sine-wave analogues. We argue that this bias more likely arises from the strikingly different saliency of voice melody in the two kinds of sounds, a bias that has already been shown to…
Together They Stand: Interpreting Not-At-Issue Content.
Frazier, Lyn; Dillon, Brian; Clifton, Charles
2018-06-01
Potts unified the account of appositives, parentheticals, expressives, and honorifics as 'Not-At-Issue' (NAI) content, treating them as a natural class semantically in behaving like root (unembedded) structures, typically expressing speaker commitments, and being interpreted independently of At-Issue content. We propose that NAI content expresses a complete speech act distinct from the speech act of the containing utterance. The speech act hypothesis leads us to expect the semantic properties Potts established. We present experimental confirmation of two intuitive observations made by Potts: first, that speech act adverbs should be acceptable as NAI content, supporting the speech act hypothesis; and second, that when two speech acts are expressed as successive sentences, the comprehender assumes they are related by some discourse coherence relation, whereas an NAI speech act need not bear a restrictive discourse coherence relation to its containing utterance, though overall, sentences containing relevant content are rated more acceptable than those that do not contain relevant content. The speech act hypothesis accounts for these effects, and further accounts for why judgments of syntactic complexity or evaluations of whether or not a statement is true interact with the at-issue status of the material being judged or evaluated.
Gesture and speech during shared book reading with preschoolers with specific language impairment.
Lavelli, Manuela; Barachetti, Chiara; Florit, Elena
2015-11-01
This study examined (a) the relationship between gesture and speech produced by children with specific language impairment (SLI) and typically developing (TD) children, and their mothers, during shared book-reading, and (b) the potential effectiveness of gestures accompanying maternal speech on the conversational responsiveness of children. Fifteen preschoolers with expressive SLI were compared with fifteen age-matched and fifteen language-matched TD children. Child and maternal utterances were coded for modality, gesture type, gesture-speech informational relationship, and communicative function. Relative to TD peers, children with SLI used more bimodal utterances and gestures adding unique information to co-occurring speech. Some differences were mirrored in maternal communication. Sequential analysis revealed that only in the SLI group was maternal reading accompanied by gestures significantly followed by children's initiatives, and that when maternal non-informative repairs were accompanied by gestures, they were more likely to elicit adequate answers from children. These findings support the 'gesture advantage' hypothesis in children with SLI, and have implications for educational and clinical practice.
Effects of speaking task on intelligibility in Parkinson’s disease
TJADEN, KRIS; WILDING, GREG
2017-01-01
Intelligibility tests for dysarthria typically provide an estimate of overall severity for speech materials elicited through imitation or read from a printed script. The extent to which these types of tasks and procedures reflect intelligibility for extemporaneous speech is not well understood. The purpose of this study was to compare intelligibility estimates obtained for a reading passage and an extemporaneous monologue produced by 12 speakers with Parkinson's disease (PD). The relationship between structural characteristics of utterances and scaled intelligibility was explored within speakers. Speakers were audio-recorded while reading a paragraph and producing a monologue. Speech samples were separated into individual utterances for presentation to 70 listeners who judged intelligibility using orthographic transcription and direct magnitude estimation (DME). Results suggest that scaled estimates of intelligibility for reading show potential for indexing intelligibility of an extemporaneous monologue. Within-speaker variation in scaled intelligibility also was related to the number of words per speech run for extemporaneous speech. PMID:20887216
Speech target modulates speaking induced suppression in auditory cortex
Ventura, Maria I; Nagarajan, Srikantan S; Houde, John F
2009-01-01
Background Previous magnetoencephalography (MEG) studies have demonstrated speaking-induced suppression (SIS) in the auditory cortex during vocalization tasks wherein the M100 response to a subject's own speaking is reduced compared to the response when they hear playback of their speech. Results The present MEG study investigated the effects of utterance rapidity and complexity on SIS: The greatest difference between speak and listen M100 amplitudes (i.e., most SIS) was found in the simple speech task. As the utterances became more rapid and complex, SIS was significantly reduced (p = 0.0003). Conclusion These findings are highly consistent with our model of how auditory feedback is processed during speaking, where incoming feedback is compared with an efference-copy derived prediction of expected feedback. Thus, the results provide further insights about how speech motor output is controlled, as well as the computational role of auditory cortex in transforming auditory feedback. PMID:19523234
[Perception features of emotional intonation of short pseudowords].
Dmitrieva, E S; Gel'man, V Ia; Zaĭtseva, K A; Orlov, A M
2012-01-01
Reaction time and recognition accuracy of speech emotional intonations in short meaningless words that differed only in one phoneme were studied, with and without background noise, in 49 adults aged 20-79 years. The results were compared with the same parameters of emotional intonations in meaningful speech utterances under similar conditions. Perception of emotional intonations at different linguistic levels (phonological and lexico-semantic) was found to have both common features and certain peculiarities. Recognition characteristics of emotional intonations depending on gender and age of listeners appeared to be invariant with regard to linguistic levels of speech stimuli. Phonemic composition of pseudowords was found to influence the emotional perception, especially against the background noise. The most significant acoustic characteristic of the stimuli responsible for the perception of speech emotional prosody in short meaningless words under the two experimental conditions, i.e. with and without background noise, was the fundamental frequency variation.
Drijvers, Linda; Özyürek, Asli
2017-01-01
This study investigated whether and to what extent iconic co-speech gestures contribute to information from visible speech to enhance degraded speech comprehension at different levels of noise-vocoding. Previous studies of the contributions of these 2 visual articulators to speech comprehension have only been performed separately. Twenty participants watched videos of an actress uttering an action verb and completed a free-recall task. The videos were presented in 3 speech conditions (2-band noise-vocoding, 6-band noise-vocoding, clear), 3 multimodal conditions (speech + lips blurred, speech + visible speech, speech + visible speech + gesture), and 2 visual-only conditions (visible speech, visible speech + gesture). Accuracy levels were higher when both visual articulators were present compared with 1 or none. The enhancement effects of (a) visible speech, (b) gestural information on top of visible speech, and (c) both visible speech and iconic gestures were larger in 6-band than 2-band noise-vocoding or visual-only conditions. Gestural enhancement in 2-band noise-vocoding did not differ from gestural enhancement in visual-only conditions. When perceiving degraded speech in a visual context, listeners benefit more from having both visual articulators present compared with 1. This benefit was larger at 6-band than 2-band noise-vocoding, where listeners can benefit from both phonological cues from visible speech and semantic cues from iconic gestures to disambiguate speech.
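Noise-vocoding of the kind used for the degraded-speech conditions above can be sketched roughly as follows: the signal is split into logarithmically spaced frequency bands, each band's amplitude envelope is extracted and used to modulate band-limited noise, and the modulated bands are summed. The band edges, filter orders, and envelope cutoff below are illustrative choices, not the authors' stimulus parameters.

```python
# Minimal noise-vocoder sketch (2-band vs. 6-band degradation).
import numpy as np
from scipy.signal import butter, filtfilt

def noise_vocode(signal, sr, n_bands, f_lo=100.0, f_hi=7000.0, env_cut=30.0):
    edges = np.geomspace(f_lo, f_hi, n_bands + 1)   # log-spaced band edges
    rng = np.random.default_rng(0)
    noise = rng.standard_normal(len(signal))
    out = np.zeros_like(signal, dtype=float)
    b_env, a_env = butter(2, env_cut / (sr / 2), btype="low")
    for lo, hi in zip(edges[:-1], edges[1:]):
        b, a = butter(3, [lo / (sr / 2), hi / (sr / 2)], btype="band")
        band = filtfilt(b, a, signal)
        envelope = filtfilt(b_env, a_env, np.abs(band))  # smoothed amplitude envelope
        carrier = filtfilt(b, a, noise)                  # band-limited noise carrier
        out += np.clip(envelope, 0, None) * carrier
    return out

# Example with a synthetic amplitude-modulated tone standing in for speech.
sr = 16000
t = np.arange(0, 1.0, 1 / sr)
speech = np.sin(2 * np.pi * 150 * t) * (1 + 0.5 * np.sin(2 * np.pi * 4 * t))
vocoded_2band = noise_vocode(speech, sr, n_bands=2)
vocoded_6band = noise_vocode(speech, sr, n_bands=6)
```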
The Perception of "Sine-Wave Speech" by Adults with Developmental Dyslexia.
ERIC Educational Resources Information Center
Rosner, Burton S.; Talcott, Joel B.; Witton, Caroline; Hogg, James D.; Richardson, Alexandra J.; Hansen, Peter C.; Stein, John F.
2003-01-01
"Sine-wave speech" sentences contain only four frequency-modulated sine waves, lacking many acoustic cues present in natural speech. Adults with (n=19) and without (n=14) dyslexia were asked to reproduce orally sine-wave utterances in successive trials. Results suggest comprehension of sine-wave sentences is impaired in some adults with…
Neural Representations Used by Brain Regions Underlying Speech Production
ERIC Educational Resources Information Center
Segawa, Jennifer Anne
2013-01-01
Speech utterances are phoneme sequences but may not always be represented as such in the brain. For instance, electropalatography evidence indicates that as speaking rate increases, gestures within syllables are manipulated separately but those within consonant clusters act as one motor unit. Moreover, speech error data suggest that a syllable's…
Comparing Measures of Voice Quality from Sustained Phonation and Continuous Speech
ERIC Educational Resources Information Center
Gerratt, Bruce R.; Kreiman, Jody; Garellek, Marc
2016-01-01
Purpose: The question of what type of utterance--a sustained vowel or continuous speech--is best for voice quality analysis has been extensively studied but with equivocal results. This study examines whether previously reported differences derive from the articulatory and prosodic factors occurring in continuous speech versus sustained phonation.…
The Role of Speech Rhythm in Language Discrimination: Further Tests with a Non-Human Primate
ERIC Educational Resources Information Center
Tincoff, Ruth; Hauser, Marc; Tsao, Fritz; Spaepen, Geertrui; Ramus, Franck; Mehler, Jacques
2005-01-01
Human newborns discriminate languages from different rhythmic classes, fail to discriminate languages from the same rhythmic class, and fail to discriminate languages when the utterances are played backwards. Recent evidence showing that cotton-top tamarins discriminate Dutch from Japanese, but not when utterances are played backwards, is…
Learning Words from Labeling and Directive Speech
ERIC Educational Resources Information Center
Callanan, Maureen A.; Akhtar, Nameera; Sussman, Lisa
2014-01-01
Despite the common intuition that labeling may be the best way to teach a new word to a child, systematic testing is needed of the prediction that children learn words better from labeling utterances than from directive utterances. Two experiments compared toddlers' label learning in the context of hearing words used in directive versus labeling…
Teaching strategies in inclusive classrooms with deaf students.
Cawthon, S W
2001-01-01
The purpose of this study was to investigate teacher speech and educational philosophies in inclusive classrooms with deaf and hearing students. Data were collected from language transcripts, classroom observations, and teacher interviews. Total speech output, mean length of utterance, proportion of questions to statements, and proportion of open to closed questions were calculated for each teacher. Teachers directed fewer utterances, on average, to deaf than to hearing students but showed different language patterns on the remaining measures. Inclusive philosophies focused on an individualized approach to teaching, attention to deaf culture, advocacy, smaller class sizes, and an openness to diversity in the classroom. The interpreters' role in the classroom included translating teacher speech, voicing student sign language, mediating communication between deaf students and their peers, and monitoring overall classroom behavior.
ERIC Educational Resources Information Center
Saltuklaroglu, Tim; Kalinowski, Joseph; Robbins, Mary; Crawcour, Stephen; Bowers, Andrew
2009-01-01
Background: Stuttering is prone to strike during speech initiation more so than at any other point in an utterance. The use of altered auditory feedback (AAF) has been found to produce robust decreases in the stuttering frequency by creating an electronic rendition of choral speech (i.e., speaking in unison). However, AAF requires users to self-initiate…
Le Normand, M T; Moreno-Torres, I; Parisse, C; Dellatolas, G
2013-01-01
In the last 50 years, researchers have debated over the lexical or grammatical nature of children's early multiword utterances. Due to methodological limitations, the issue remains controversial. This corpus study explores the effect of grammatical, lexical, and pragmatic categories on mean length of utterances (MLU). A total of 312 speech samples from high- and low-socioeconomic status (SES) French-speaking children aged 2-4 years were annotated with a part-of-speech tagger. Multiple regression analyses show that grammatical categories, particularly the most frequent subcategories, were the best predictors of MLU both across age and SES groups. These findings support the view that early language learning is guided by grammatical rather than by lexical words. This corpus research design can be used for future cross-linguistic and cross-pathology studies. © 2012 The Authors. Child Development © 2012 Society for Research in Child Development, Inc.
The effect of speaking style on a locus equation characterization of stop place of articulation.
Sussman, H M; Dalston, E; Gumbert, S
1998-01-01
Locus equations were employed to assess the phonetic stability and distinctiveness of stop place categories in reduced speech. Twenty-two speakers produced stop consonant + vowel utterances in citation and spontaneous speech. Coarticulatory increases in hypoarticulated speech were documented only for /dV/ and /gV/ productions in front vowel contexts. Coarticulatory extents for /bV/ and /gV/ in back vowel contexts remained stable across style changes. Discriminant analyses showed equivalent levels of correct classification across speaking styles. CV reduction was quantified by use of Euclidean distances separating stop place categories. Despite sensitivity of locus equation parameters to articulatory differences encountered in informal speech, stop place categories still maintained a clear separability when plotted in a higher-order slope × y-intercept acoustic space.
Prosodic Contrasts in Ironic Speech
ERIC Educational Resources Information Center
Bryant, Gregory A.
2010-01-01
Prosodic features in spontaneous speech help disambiguate implied meaning not explicit in linguistic surface structure, but little research has examined how these signals manifest themselves in real conversations. Spontaneously produced verbal irony utterances generated between familiar speakers in conversational dyads were acoustically analyzed…
ERIC Educational Resources Information Center
Brown-Schmidt, Sarah; Konopka, Agnieszka E.
2008-01-01
During unscripted speech, speakers coordinate the formulation of pre-linguistic messages with the linguistic processes that implement those messages into speech. We examine the process of constructing a contextually appropriate message and interfacing that message with utterance planning in English ("the small butterfly") and Spanish ("la mariposa…
Speech-Act Theory as a New Way of Conceptualizing the "Student Experience"
ERIC Educational Resources Information Center
Fisher, Andrew
2010-01-01
This article has four aims. The first is to characterize the key features of speech-act theory, and, in particular, to show that there is a genuine distinction between the sound uttered when someone is speaking (locution), the effect the speech has (perlocution) and the very "act" of speaking (the illocution). Secondly, it aims to…
Infant Word Segmentation Revisited: Edge Alignment Facilitates Target Extraction
ERIC Educational Resources Information Center
Seidl, Amanda; Johnson, Elizabeth K.
2006-01-01
In a landmark study, Jusczyk and Aslin (1995 ) demonstrated that English-learning infants are able to segment words from continuous speech at 7.5 months of age. In the current study, we explored the possibility that infants segment words from the edges of utterances more readily than the middle of utterances. The same procedure was used as in…
Dilley, Laura C; Wieland, Elizabeth A; Gamache, Jessica L; McAuley, J Devin; Redford, Melissa A
2013-02-01
As children mature, changes in voice spectral characteristics co-vary with changes in speech, language, and behavior. In this study, spectral characteristics were manipulated to alter the perceived ages of talkers' voices while leaving critical acoustic-prosodic correlates intact, to determine whether perceived age differences were associated with differences in judgments of prosodic, segmental, and talker attributes. Speech was modified by lowering formants and fundamental frequency, for 5-year-old children's utterances, or raising them, for adult caregivers' utterances. Next, participants differing in awareness of the manipulation (Experiment 1A) or amount of speech-language training (Experiment 1B) made judgments of prosodic, segmental, and talker attributes. Experiment 2 investigated the effects of spectral modification on intelligibility. Finally, in Experiment 3, trained analysts used formal prosody coding to assess prosodic characteristics of spectrally modified and unmodified speech. Differences in perceived age were associated with differences in ratings of speech rate, fluency, intelligibility, likeability, anxiety, cognitive impairment, and speech-language disorder/delay; effects of training and awareness of the manipulation on ratings were limited. There were no significant effects of the manipulation on intelligibility or formally coded prosody judgments. Age-related voice characteristics can greatly affect judgments of speech and talker characteristics, raising cautionary notes for developmental research and clinical work.
Echolalia and comprehension in autistic children.
Roberts, J M
1989-06-01
The research reported in this paper investigates the phenomenon of echolalia in the speech of autistic children by examining the relationship between the frequency of echolalia and receptive language ability. The receptive language skills of 10 autistic children were assessed, and spontaneous speech samples were recorded. Analysis of these data showed that those children with poor receptive language skills produced significantly more echolalic utterances than those children whose receptive skills were more age-appropriate. Children who produced fewer echolalic utterances, and had more advanced receptive language ability, evidenced a higher proportion of mitigated echolalia. The most common type of mitigation was echo plus affirmation or denial.
Intelligibility assessment in developmental phonological disorders: accuracy of caregiver gloss.
Kwiatkowski, J; Shriberg, L D
1992-10-01
Fifteen caregivers each glossed a simultaneously videotaped and audiotaped sample of their child with speech delay engaged in conversation with a clinician. One of the authors generated a reference gloss for each sample, aided by (a) prior knowledge of the child's speech-language status and error patterns, (b) glosses from the child's clinician and the child's caregiver, (c) unlimited replays of the taped sample, and (d) the information gained from completing a narrow phonetic transcription of the sample. Caregivers glossed an average of 78% of the utterances and 81% of the words. A comparison of their glosses to the reference glosses suggested that they accurately understood an average of 58% of the utterances and 73% of the words. Discussion considers the implications of such findings for methodological and theoretical issues underlying children's moment-to-moment intelligibility breakdowns during speech-language processing.
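The word-level accuracy figures above amount to the proportion of reference-gloss words that the caregiver's gloss recovers. A minimal sketch is given below; it assumes a simple multiset overlap between the two glosses, which is an illustrative simplification rather than the authors' comparison procedure.

```python
# Word-level gloss accuracy as a multiset overlap with a reference gloss.
from collections import Counter

def word_gloss_accuracy(caregiver_gloss: str, reference_gloss: str) -> float:
    care = Counter(caregiver_gloss.lower().split())
    ref = Counter(reference_gloss.lower().split())
    matched = sum(min(care[w], ref[w]) for w in ref)   # reference words recovered
    return matched / max(sum(ref.values()), 1)

# Hypothetical example: 4 of 6 reference words recovered.
print(word_gloss_accuracy("he want the big truck", "he wants the big red truck"))
```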
ERIC Educational Resources Information Center
Hertrich, Ingo; Dietrich, Susanne; Ackermann, Hermann
2013-01-01
Blind people can learn to understand speech at ultra-high syllable rates (ca. 20 syllables/s), a capability associated with hemodynamic activation of the central-visual system. To further elucidate the neural mechanisms underlying this skill, magnetoencephalographic (MEG) measurements during listening to sentence utterances were cross-correlated…
Tóth, László; Hoffmann, Ildikó; Gosztolya, Gábor; Vincze, Veronika; Szatlóczki, Gréta; Bánréti, Zoltán; Pákáski, Magdolna; Kálmán, János
2018-01-01
Background: Even today the reliable diagnosis of the prodromal stages of Alzheimer's disease (AD) remains a great challenge. Our research focuses on the earliest detectable indicators of cognitive decline in mild cognitive impairment (MCI). Since the presence of language impairment has been reported even in the mild stage of AD, the aim of this study is to develop a sensitive neuropsychological screening method which is based on the analysis of spontaneous speech production during performing a memory task. In the future, this can form the basis of an Internet-based interactive screening software for the recognition of MCI. Methods: Participants were 38 healthy controls and 48 clinically diagnosed MCI patients. Spontaneous speech was provoked by asking the patients to recall the content of 2 short black and white films (one direct, one delayed) and to answer one question. Acoustic parameters (hesitation ratio, speech tempo, length and number of silent and filled pauses, length of utterance) were extracted from the recorded speech signals, first manually (using the Praat software), and then automatically, with an automatic speech recognition (ASR) based tool. First, the extracted parameters were statistically analyzed. Then we applied machine learning algorithms to see whether the MCI and the control group can be discriminated automatically based on the acoustic features. Results: The statistical analysis showed significant differences for most of the acoustic parameters (speech tempo, articulation rate, silent pause, hesitation ratio, length of utterance, pause-per-utterance ratio). The most significant differences between the two groups were found in the speech tempo in the delayed recall task, and in the number of pauses for the question-answering task. The fully automated version of the analysis process, that is, using the ASR-based features in combination with machine learning, was able to separate the two classes with an F1-score of 78.8%. Conclusion: The temporal analysis of spontaneous speech can be exploited in implementing a new, automatic detection-based tool for screening MCI for the community. PMID:29165085
Toth, Laszlo; Hoffmann, Ildiko; Gosztolya, Gabor; Vincze, Veronika; Szatloczki, Greta; Banreti, Zoltan; Pakaski, Magdolna; Kalman, Janos
2018-01-01
Even today the reliable diagnosis of the prodromal stages of Alzheimer's disease (AD) remains a great challenge. Our research focuses on the earliest detectable indicators of cognitive decline in mild cognitive impairment (MCI). Since the presence of language impairment has been reported even in the mild stage of AD, the aim of this study is to develop a sensitive neuropsychological screening method which is based on the analysis of spontaneous speech production during performing a memory task. In the future, this can form the basis of an Internet-based interactive screening software for the recognition of MCI. Participants were 38 healthy controls and 48 clinically diagnosed MCI patients. Spontaneous speech was provoked by asking the patients to recall the content of 2 short black and white films (one direct, one delayed) and to answer one question. Acoustic parameters (hesitation ratio, speech tempo, length and number of silent and filled pauses, length of utterance) were extracted from the recorded speech signals, first manually (using the Praat software), and then automatically, with an automatic speech recognition (ASR) based tool. First, the extracted parameters were statistically analyzed. Then we applied machine learning algorithms to see whether the MCI and the control group can be discriminated automatically based on the acoustic features. The statistical analysis showed significant differences for most of the acoustic parameters (speech tempo, articulation rate, silent pause, hesitation ratio, length of utterance, pause-per-utterance ratio). The most significant differences between the two groups were found in the speech tempo in the delayed recall task, and in the number of pauses for the question-answering task. The fully automated version of the analysis process, that is, using the ASR-based features in combination with machine learning, was able to separate the two classes with an F1-score of 78.8%. The temporal analysis of spontaneous speech can be exploited in implementing a new, automatic detection-based tool for screening MCI for the community. Copyright © Bentham Science Publishers.
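The final classification step described in the two records above (per-speaker acoustic features in, group label out, evaluated with an F1 score) might look roughly like the sketch below. The feature values are randomly generated placeholders, and the random-forest classifier is an illustrative choice rather than the algorithm reported in the study.

```python
# Illustrative MCI-vs-control classification from temporal speech features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import f1_score

rng = np.random.default_rng(1)
n_speakers = 86                          # 38 controls + 48 MCI, as in the study
X = np.column_stack([
    rng.normal(4.5, 1.0, n_speakers),    # speech tempo (syllables/s)
    rng.normal(0.25, 0.1, n_speakers),   # hesitation ratio
    rng.normal(0.6, 0.3, n_speakers),    # mean silent-pause length (s)
    rng.normal(8.0, 3.0, n_speakers),    # utterance length (s)
    rng.normal(1.5, 0.8, n_speakers),    # pauses per utterance
])
y = np.array([0] * 38 + [1] * 48)        # 0 = healthy control, 1 = MCI

clf = RandomForestClassifier(n_estimators=200, random_state=0)
y_pred = cross_val_predict(clf, X, y, cv=5)
print("F1 (MCI class):", round(f1_score(y, y_pred), 3))
```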
Isaacson, M D; Srinivasan, S; Lloyd, L L
2010-01-01
MathSpeak is a set of rules for the non-ambiguous speaking of mathematical expressions. These rules have been incorporated into a computerised module that translates printed mathematics into the non-ambiguous MathSpeak form for synthetic speech rendering. Differences between individual utterances produced with the translator module are difficult to discern because of insufficient pausing between utterances; hence, the purpose of this study was to develop an algorithm for improving the synthetic speech rendering of MathSpeak. To improve synthetic speech renderings, an algorithm for inserting pauses was developed based upon recordings of middle and high school math teachers speaking mathematical expressions. Efficacy testing of this algorithm was conducted with college students without disabilities and high school/college students with visual impairments. Parameters measured included reception accuracy, short-term memory retention, MathSpeak processing capacity and various rankings concerning the quality of synthetic speech renderings. All parameters measured showed statistically significant improvements when the algorithm was used. The algorithm improves the quality and information processing capacity of synthetic speech renderings of MathSpeak. This increases the capacity of individuals with print disabilities to perform mathematical activities and to successfully fulfill science, technology, engineering and mathematics academic and career objectives.
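A pause-insertion step of the kind described above can be expressed, for a synthesizer that accepts SSML, as break tags placed between successive MathSpeak utterances. The utterance strings and the pause duration below are hypothetical, not the values derived from the teachers' recordings in the study.

```python
# Insert fixed-length pauses between MathSpeak utterances as SSML breaks.
def to_ssml(utterances, pause_ms=350):
    body = f'<break time="{pause_ms}ms"/>'.join(utterances)
    return f"<speak>{body}</speak>"

# Hypothetical MathSpeak rendering of (x + 1) / (x - 1).
mathspeak_utterances = ["fraction", "x plus 1", "over", "x minus 1", "end fraction"]
print(to_ssml(mathspeak_utterances))
```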
Hyperarticulation in Lombard speech: Global coordination of the jaw, lips and the tongue.
Šimko, Juraj; Beňuš, Štefan; Vainio, Martti
2016-01-01
Over the last century, researchers have collected a considerable amount of data reflecting the properties of Lombard speech, i.e., speech in a noisy environment. The documented phenomena predominantly report effects on the speech signal produced in ambient noise. In comparison, relatively little is known about the underlying articulatory patterns of Lombard speech, in particular for lingual articulation. Here the authors present an analysis of articulatory recordings of speech material in babble noise of different intensity levels and in hypoarticulated speech and report quantitative differences in relative expansion of movement of different articulatory subsystems (the jaw, the lips and the tongue) as well as in relative expansion of utterance duration. The trajectory modifications for one articulator can be relatively reliably predicted by those for another one, but subsystems differ in a degree of continuity in trajectory expansion elicited across different noise levels. Regression analysis of articulatory modifications against durational expansion shows further qualitative differences between the subsystems, namely, the jaw and the tongue. The findings are discussed in terms of possible influences of a combination of prosodic, segmental, and physiological factors. In addition, the Lombard effect is put forward as a viable methodology for eliciting global articulatory variation in a controlled manner.
Tone classification of syllable-segmented Thai speech based on multilayer perceptron
NASA Astrophysics Data System (ADS)
Satravaha, Nuttavudh; Klinkhachorn, Powsiri; Lass, Norman
2002-05-01
Thai is a monosyllabic tonal language that uses tone to convey lexical information about the meaning of a syllable. Thus to completely recognize a spoken Thai syllable, a speech recognition system not only has to recognize a base syllable but also must correctly identify a tone. Hence, tone classification of Thai speech is an essential part of a Thai speech recognition system. Thai has five distinctive tones ("mid," "low," "falling," "high," and "rising") and each tone is represented by a single fundamental frequency (F0) pattern. However, several factors, including tonal coarticulation, stress, intonation, and speaker variability, affect the F0 pattern of a syllable in continuous Thai speech. In this study, an efficient method for tone classification of syllable-segmented Thai speech, which incorporates the effects of tonal coarticulation, stress, and intonation, as well as a method to perform automatic syllable segmentation, were developed. Acoustic parameters were used as the main discriminating parameters. The F0 contour of a segmented syllable was normalized by using a z-score transformation before being presented to a tone classifier. The proposed system was evaluated on 920 test utterances spoken by 8 speakers. A recognition rate of 91.36% was achieved by the proposed system.
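The front end described above (z-score transformation of each syllable's F0 contour followed by a multilayer-perceptron tone classifier) can be sketched as below. The fixed contour length, the schematic tone shapes, and the network size are illustrative assumptions rather than the study's configuration or data.

```python
# Z-score-normalized F0 contours fed to an MLP tone classifier (toy data).
import numpy as np
from sklearn.neural_network import MLPClassifier

N_POINTS = 20                                    # fixed-length F0 contour per syllable

def normalize_contour(f0):
    f0 = np.interp(np.linspace(0, 1, N_POINTS), np.linspace(0, 1, len(f0)), f0)
    return (f0 - f0.mean()) / (f0.std() + 1e-9)  # z-score transformation

# Schematic shapes for the five Thai tones, jittered to build a toy training set.
rng = np.random.default_rng(0)
shapes = {
    0: np.full(N_POINTS, 200.0),                                                 # mid
    1: np.linspace(190, 170, N_POINTS),                                          # low
    2: np.concatenate([np.linspace(200, 240, 10), np.linspace(240, 180, 10)]),   # falling
    3: np.linspace(210, 250, N_POINTS),                                          # high
    4: np.concatenate([np.linspace(210, 180, 10), np.linspace(180, 240, 10)]),   # rising
}
X, y = [], []
for tone, shape in shapes.items():
    for _ in range(100):
        X.append(normalize_contour(shape + rng.normal(0, 5, N_POINTS)))
        y.append(tone)

clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
clf.fit(np.array(X), y)
print("training accuracy:", clf.score(np.array(X), y))
```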
Linguistic Structures in Stereotyped Aphasic Speech
ERIC Educational Resources Information Center
Buckingham, Hugh W., Jr.; And Others
1975-01-01
The linguistic structure of specific introductory type clauses, which appear at a relatively high frequency in the utterances of a severely brain damaged fluent aphasic with neologistic jargon speech, is examined. The analysis is restricted to one fifty-six-year-old male patient who suffered massive subdural hematoma. (SCC)
Kaipa, Ramesh; Jones, Richard D; Robb, Michael P
2016-07-01
The benefits of different practice conditions in limb-based rehabilitation of motor disorders are well documented. Conversely, the role of practice structure in the treatment of motor-based speech disorders has only been minimally investigated. Considering this limitation, the current study aimed to investigate the effectiveness of selected practice conditions in spatial and temporal learning of novel speech utterances in individuals with Parkinson's disease (PD). Participants included 16 individuals with PD who were randomly and equally assigned to constant, variable, random, and blocked practice conditions. Participants in all four groups practiced a speech phrase for two consecutive days, and reproduced the speech phrase on the third day without further practice or feedback. There were no significant differences (p > 0.05) between participants across the four practice conditions with respect to either spatial or temporal learning of the speech phrase. Overall, PD participants demonstrated diminished spatial and temporal learning in comparison to healthy controls. Tests of strength of association between participants' demographic/clinical characteristics and speech-motor learning outcomes did not reveal any significant correlations. The findings from the current study suggest that repeated practice facilitates speech-motor learning in individuals with PD irrespective of the type of practice. Clinicians need to be cautious in applying practice conditions to treat speech deficits associated with PD based on the findings of non-speech-motor learning tasks. Copyright © 2016 Elsevier Ltd. All rights reserved.
Speech motor correlates of treatment-related changes in stuttering severity and speech naturalness.
Tasko, Stephen M; McClean, Michael D; Runyan, Charles M
2007-01-01
Participants of stuttering treatment programs provide an opportunity to evaluate persons who stutter as they demonstrate varying levels of fluency. Identifying physiologic correlates of altered fluency levels may lead to insights about mechanisms of speech disfluency. This study examined respiratory, orofacial kinematic and acoustic measures in 35 persons who stutter prior to and as they were completing a 1-month intensive stuttering treatment program. Participants showed a marked reduction in stuttering severity as they completed the treatment program. Coincident with reduced stuttering severity, participants increased the amplitude and duration of speech breaths, reduced the rate of lung volume change during inspiration, reduced the amplitude and speed of lip movements early in the test utterance, increased lip and jaw movement durations, and reduced syllable rate. A multiple regression model that included two respiratory measures and one orofacial kinematic measure accounted for 62% of the variance in changes in stuttering severity. Finally, there was a weak but significant tendency for speech of participants with the largest reductions in stuttering severity to be rated as more unnatural as they completed the treatment program.
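A multiple regression of the kind reported above (change in stuttering severity predicted from two respiratory measures and one orofacial kinematic measure) can be sketched as follows. The variable names and the randomly generated data are placeholders, not the study's measurements.

```python
# Multiple regression of severity change on respiratory and kinematic changes.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 35                                   # participants, as in the study
X = np.column_stack([
    rng.normal(0, 1, n),                 # change in speech-breath amplitude
    rng.normal(0, 1, n),                 # change in inspiratory lung-volume rate
    rng.normal(0, 1, n),                 # change in lip-movement speed
])
delta_severity = X @ np.array([0.5, -0.4, 0.3]) + rng.normal(0, 1, n)

fit = sm.OLS(delta_severity, sm.add_constant(X)).fit()
print("R^2:", round(fit.rsquared, 3))
print(fit.params)
```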
Marks, Nicola J
2014-07-01
Scientists play an important role in framing public engagement with science. Their language can facilitate or impede particular interactions taking place with particular citizens: scientists' "speech acts" can "perform" different types of "scientific citizenship". This paper examines how scientists in Australia talked about therapeutic cloning during interviews and during the 2006 parliamentary debates on stem cell research. Some avoided complex labels, thereby facilitating public examination of this field. Others drew on language that only opens a space for publics to become educated, not to participate in a more meaningful way. Importantly, public utterances made by scientists here contrast with common international utterances: they focused not on the therapeutic promises but on the research promises of therapeutic cloning. Social scientists need to pay attention to the performative aspects of language in order to promote genuine citizen involvement in techno-science. Speech Act Theory is a useful analytical tool for this.
The stop voicing contrast in French: From citation speech to sentential speech
NASA Astrophysics Data System (ADS)
Abdelli-Beruh, Nassima; Demaio, Eileen; Hisagi, Miwako
2004-05-01
This study explores the influence of speaking style on the salience of the acoustic correlates to the stop voicing distinction in French. Monolingual French speakers produced twenty-one CVC syllables in citation speech, in minimal pairs and in sentence-length utterances (/pa/-/a/ context: /il a di pa CVC a lui/; /pas/-/s/ context: /il a di pas CVC sã lui/). Prominent stress was on the CVC. Voicing-related differences in percentages of closure voicing, durations of aspiration, closure, and vowel were analyzed as a function of these three speaking styles. Results show that the salience of the acoustic-phonetic segments present when the syllables are uttered in isolation or in minimal pairs is different than when the syllables are spoken in a sentence. These results are in agreement with findings in English.
Stuttering on function words in bilingual children who stutter: A preliminary study.
Gkalitsiou, Zoi; Byrd, Courtney T; Bedore, Lisa M; Taliancich-Klinger, Casey L
2017-01-01
Evidence suggests young monolingual children who stutter (CWS) are more disfluent on function than content words, particularly when produced in the initial utterance position. The purpose of the present preliminary study was to investigate whether young bilingual CWS present with this same pattern. The narrative and conversational samples of four bilingual Spanish- and English-speaking CWS were analysed. All four bilingual participants produced significantly more stuttering on function words compared to content words, irrespective of their position in the utterance, in their Spanish narrative and conversational speech samples. Three of the four participants also demonstrated more stuttering on function compared to content words in their narrative speech samples in English, but only one participant produced more stuttering on function than content words in her English conversational sample. These preliminary findings are discussed relative to linguistic planning and language proficiency and their potential contribution to stuttered speech.
Imitative Production of Rising Speech Intonation in Pediatric Cochlear Implant Recipients
ERIC Educational Resources Information Center
Peng, Shu-Chen; Tomblin, J. Bruce; Spencer, Linda J.; Hurtig, Richard R.
2007-01-01
Purpose: This study investigated the acoustic characteristics of pediatric cochlear implant (CI) recipients' imitative production of rising speech intonation, in relation to the perceptual judgments by listeners with normal hearing (NH). Method: Recordings of a yes-no interrogative utterance imitated by 24 prelingually deafened children with a CI…
Effects of Utterance Length on Lip Kinematics in Aphasia
ERIC Educational Resources Information Center
Bose, Arpita; van Lieshout, Pascal
2008-01-01
Most existing models of language production and speech motor control do not explicitly address how language requirements affect speech motor functions, as these domains are usually treated as separate and independent from one another. This investigation compared lip movements during bilabial closure between five individuals with mild aphasia and…
Prosody Production and Perception with Conversational Speech
ERIC Educational Resources Information Center
Mo, Yoonsook
2010-01-01
Speech utterances are more than the linear concatenation of individual phonemes or words. They are organized by prosodic structures comprising phonological units of different sizes (e.g., syllable, foot, word, and phrase) and the prominence relations among them. As the linguistic structure of spoken languages, prosody serves an important function…
ERIC Educational Resources Information Center
Anand, Supraja; Stepp, Cara E.
2015-01-01
Purpose: Given the potential significance of speech naturalness to functional and social rehabilitation outcomes, the objective of this study was to examine the effect of listener perceptions of monopitch on speech naturalness and intelligibility in individuals with Parkinson's disease (PD). Method: Two short utterances were extracted from…
Speech Accommodation without Priming: The Case of Pitch
ERIC Educational Resources Information Center
Gijssels, Tom; Casasanto, Laura Staum; Jasmin, Kyle; Hagoort, Peter; Casasanto, Daniel
2016-01-01
People often accommodate to each other's speech by aligning their linguistic production with their partner's. According to an influential theory, the Interactive Alignment Model, alignment is the result of priming. When people perceive an utterance, the corresponding linguistic representations are primed and become easier to produce. Here we…
Schwartz, Jean-Luc; Savariaux, Christophe
2014-01-01
An increasing number of neuroscience papers capitalize on the assumption published in this journal that visual speech would be typically 150 ms ahead of auditory speech. It happens that the estimation of audiovisual asynchrony in the reference paper is valid only in very specific cases, for isolated consonant-vowel syllables or at the beginning of a speech utterance, in what we call “preparatory gestures”. However, when syllables are chained in sequences, as they are typically in most parts of a natural speech utterance, asynchrony should be defined in a different way. This is what we call “comodulatory gestures” providing auditory and visual events more or less in synchrony. We provide audiovisual data on sequences of plosive-vowel syllables (pa, ta, ka, ba, da, ga, ma, na) showing that audiovisual synchrony is actually rather precise, varying between 20 ms audio lead and 70 ms audio lag. We show how more complex speech material should result in a range typically varying between 40 ms audio lead and 200 ms audio lag, and we discuss how this natural coordination is reflected in the so-called temporal integration window for audiovisual speech perception. Finally we present a toy model of auditory and audiovisual predictive coding, showing that visual lead is actually not necessary for visual prediction. PMID:25079216
Dilley, Laura C.; Wieland, Elizabeth A.; Gamache, Jessica L.; McAuley, J. Devin; Redford, Melissa A.
2013-01-01
Purpose As children mature, changes in voice spectral characteristics covary with changes in speech, language, and behavior. Spectral characteristics were manipulated to alter the perceived ages of talkers’ voices while leaving critical acoustic-prosodic correlates intact, to determine whether perceived age differences were associated with differences in judgments of prosodic, segmental, and talker attributes. Method Speech was modified by lowering formants and fundamental frequency, for 5-year-old children’s utterances, or raising them, for adult caregivers’ utterances. Next, participants differing in awareness of the manipulation (Exp. 1a) or amount of speech-language training (Exp. 1b) made judgments of prosodic, segmental, and talker attributes. Exp. 2 investigated the effects of spectral modification on intelligibility. Finally, in Exp. 3 trained analysts used formal prosody coding to assess prosodic characteristics of spectrally-modified and unmodified speech. Results Differences in perceived age were associated with differences in ratings of speech rate, fluency, intelligibility, likeability, anxiety, cognitive impairment, and speech-language disorder/delay; effects of training and awareness of the manipulation on ratings were limited. There were no significant effects of the manipulation on intelligibility or formally coded prosody judgments. Conclusions Age-related voice characteristics can greatly affect judgments of speech and talker characteristics, raising cautionary notes for developmental research and clinical work. PMID:23275414
Saltuklaroglu, Tim; Kalinowski, Joseph; Robbins, Mary; Crawcour, Stephen; Bowers, Andrew
2009-01-01
Stuttering is prone to strike during speech initiation more so than at any other point in an utterance. The use of altered auditory feedback (AAF) has been found to produce robust decreases in the stuttering frequency by creating an electronic rendition of choral speech (i.e., speaking in unison). However, AAF requires users to self-initiate speech before it can go into effect and, therefore, it might not be as helpful as true choral speech during speech initiation. The aim was to examine how AAF and choral speech differentially enhance fluency during speech initiation and in subsequent portions of utterances. Ten participants who stuttered read passages without altered feedback (NAF), under four AAF conditions and under a true choral speech condition. Each condition was blocked into ten 10 s trials separated by 5 s intervals so each trial required 'cold' speech initiation. In the first analysis, comparisons of stuttering frequencies were made across conditions. A second, finer grain analysis involved examining stuttering frequencies on the initial syllable, the subsequent four syllables produced and the five syllables produced immediately after the midpoint of each trial. On average, AAF reduced stuttering by approximately 68% relative to the NAF condition. Stuttering frequencies on the initial syllables were considerably higher than on the other syllables analysed (0.45 and 0.34 for NAF and AAF conditions, respectively). After the first syllable was produced, stuttering frequencies dropped precipitously and remained stable. However, this drop in stuttering frequency was significantly greater (approximately 84%) in the AAF conditions than in the NAF condition (approximately 66%) with frequencies on the last nine syllables analysed averaging 0.15 and 0.05 for NAF and AAF conditions, respectively. In the true choral speech condition, stuttering was virtually (approximately 98%) eliminated across all utterances and all syllable positions. Altered auditory feedback effectively inhibits stuttering immediately after speech has been initiated. However, unlike a true choral signal, which is exogenously initiated and offers the most complete fluency enhancement, AAF requires speech to be initiated by the user and 'fed back' before it can directly inhibit stuttering. It is suggested that AAF can be a viable clinical option for those who stutter and should often be used in combination with therapeutic techniques, particularly those that aid speech initiation. The substantially higher rate of stuttering occurring on initiation supports a hypothesis that overt stuttering events help 'release' and 'inhibit' central stuttering blocks. This perspective is examined in the context of internal models and mirror neurons.
Utterance-final position and pitch marking aid word learning in school-age children
Filippi, Piera; Laaha, Sabine; Fitch, W. Tecumseh
2017-01-01
We investigated the effects of word order and prosody on word learning in school-age children. Third graders viewed photographs belonging to one of three semantic categories while hearing four-word nonsense utterances containing a target word. In the control condition, all words had the same pitch and, across trials, the position of the target word was varied systematically within each utterance. The only cue to word–meaning mapping was the co-occurrence of target words and referents. This cue was present in all conditions. In the Utterance-final condition, the target word always occurred in utterance-final position, and at the same fundamental frequency as all the other words of the utterance. In the Pitch peak condition, the position of the target word was varied systematically within each utterance across trials, and produced with pitch contrasts typical of infant-directed speech (IDS). In the Pitch peak + Utterance-final condition, the target word always occurred in utterance-final position, and was marked with a pitch contrast typical of IDS. Word learning occurred in all conditions except the control condition. Moreover, learning performance was significantly higher than that observed with simple co-occurrence (control condition) only for the Pitch peak + Utterance-final condition. We conclude that, for school-age children, the combination of words' utterance-final alignment and pitch enhancement boosts word learning. PMID:28878961
Utterance-final position and pitch marking aid word learning in school-age children.
Filippi, Piera; Laaha, Sabine; Fitch, W Tecumseh
2017-08-01
We investigated the effects of word order and prosody on word learning in school-age children. Third graders viewed photographs belonging to one of three semantic categories while hearing four-word nonsense utterances containing a target word. In the control condition, all words had the same pitch and, across trials, the position of the target word was varied systematically within each utterance. The only cue to word-meaning mapping was the co-occurrence of target words and referents. This cue was present in all conditions. In the Utterance-final condition, the target word always occurred in utterance-final position, and at the same fundamental frequency as all the other words of the utterance. In the Pitch peak condition, the position of the target word was varied systematically within each utterance across trials, and produced with pitch contrasts typical of infant-directed speech (IDS). In the Pitch peak + Utterance-final condition, the target word always occurred in utterance-final position, and was marked with a pitch contrast typical of IDS. Word learning occurred in all conditions except the control condition. Moreover, learning performance was significantly higher than that observed with simple co-occurrence (control condition) only for the Pitch peak + Utterance-final condition. We conclude that, for school-age children, the combination of words' utterance-final alignment and pitch enhancement boosts word learning.
A model of serial order problems in fluent, stuttered and agrammatic speech.
Howell, Peter
2007-10-01
Many models of speech production have attempted to explain dysfluent speech. Most models assume that the disruptions that occur when speech is dysfluent arise because the speakers make errors while planning an utterance. In this contribution, a model of the serial order of speech is described that does not make this assumption. It involves the coordination or 'interlocking' of linguistic planning and execution stages at the language-speech interface. The model is examined to determine whether it can distinguish two forms of dysfluent speech (stuttered and agrammatic speech) that are characterized by iteration and omission of whole words and parts of words.
Start-up rhetoric in eight speeches of Barack Obama.
O'Connell, Daniel C; Kowal, Sabine; Sabin, Edward J; Lamia, John F; Dannevik, Margaret
2010-10-01
Our purpose in the following was to investigate the start-up rhetoric employed by U.S. President Barack Obama in his speeches. The initial 5 min from eight of his speeches from May to September of 2009 were selected for their variety of setting, audience, theme, and purpose. It was generally hypothesized that Barack Obama, widely recognized for the excellence of his rhetorical performance, would pursue both constant and variable strategies in his effort to establish contact with his audience. More specifically, it was hypothesized that the make-up of the audience--primarily native or non-native speakers of English--would be a prominent independent variable. A number of temporal and verbal measures were used as dependent variables. Variations were evident in mean length in syllables and duration in seconds of utterances (articulatory phrases), articulation rate in syllables per second of ontime, mean duration of silent pauses in seconds, and frequency of fillers, hesitations, colloquial words and phrases, introductory phrases, and 1st person singular pronominals. Results indicated that formality versus informality of the setting and presence or absence of a teleprompter were more prominent than native versus non-native audiences. Our analyses confirm Obama's skillfulness in challenging and variable settings and clearly detect orderliness and scientific generalizability in language use. The concept of orality/literacy provides a theoretical background and emphasizes dialogical interaction of audience and speaker.
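The temporal measures named above, such as articulation rate in syllables per second of ontime and mean silent-pause duration, can be computed from interval-level annotations roughly as in the sketch below. The timed speech/pause intervals and syllable counts are hypothetical.

```python
# Articulation rate and mean pause duration from timed speech/pause intervals.
intervals = [
    # (kind, duration_s, syllables)
    ("speech", 2.1, 11),
    ("pause", 0.6, 0),
    ("speech", 3.4, 18),
    ("pause", 1.1, 0),
    ("speech", 1.8, 9),
]

ontime = sum(d for kind, d, _ in intervals if kind == "speech")
syllables = sum(s for kind, _, s in intervals if kind == "speech")
pauses = [d for kind, d, _ in intervals if kind == "pause"]

articulation_rate = syllables / ontime          # syllables per second of ontime
mean_pause = sum(pauses) / len(pauses)          # mean silent-pause duration (s)
print(round(articulation_rate, 2), round(mean_pause, 2))
```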
Language Sampling for Preschoolers With Severe Speech Impairments
Binger, Cathy; Ragsdale, Jamie; Bustos, Aimee
2016-01-01
Purpose The purposes of this investigation were to determine if measures such as mean length of utterance (MLU) and percentage of comprehensible words can be derived reliably from language samples of children with severe speech impairments and if such measures correlate with tools that measure constructs assumed to be related. Method Language samples of 15 preschoolers with severe speech impairments (but receptive language within normal limits) were transcribed independently by 2 transcribers. Nonparametric statistics were used to determine which measures, if any, could be transcribed reliably and to determine if correlations existed between language sample measures and standardized measures of speech, language, and cognition. Results Reliable measures were extracted from the majority of the language samples, including MLU in words, mean number of syllables per utterance, and percentage of comprehensible words. Language sample comprehensibility measures were correlated with a single word comprehensibility task. Also, language sample MLUs and mean length of the participants' 3 longest sentences from the MacArthur–Bates Communicative Development Inventory (Fenson et al., 2006) were correlated. Conclusion Language sampling, given certain modifications, may be used for some 3- to 5-year-old children with normal receptive language who have severe speech impairments to provide reliable expressive language and comprehensibility information. PMID:27552110
Language Sampling for Preschoolers With Severe Speech Impairments.
Binger, Cathy; Ragsdale, Jamie; Bustos, Aimee
2016-11-01
The purposes of this investigation were to determine if measures such as mean length of utterance (MLU) and percentage of comprehensible words can be derived reliably from language samples of children with severe speech impairments and if such measures correlate with tools that measure constructs assumed to be related. Language samples of 15 preschoolers with severe speech impairments (but receptive language within normal limits) were transcribed independently by 2 transcribers. Nonparametric statistics were used to determine which measures, if any, could be transcribed reliably and to determine if correlations existed between language sample measures and standardized measures of speech, language, and cognition. Reliable measures were extracted from the majority of the language samples, including MLU in words, mean number of syllables per utterance, and percentage of comprehensible words. Language sample comprehensibility measures were correlated with a single word comprehensibility task. Also, language sample MLUs and mean length of the participants' 3 longest sentences from the MacArthur-Bates Communicative Development Inventory (Fenson et al., 2006) were correlated. Language sampling, given certain modifications, may be used for some 3- to 5-year-old children with normal receptive language who have severe speech impairments to provide reliable expressive language and comprehensibility information.
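Two of the measures discussed in the two records above, MLU in words and percentage of comprehensible words, reduce to simple counts over the transcribed sample. The sketch below assumes unintelligible words are transcribed as "xxx", which is a common transcription convention but an assumption about the format, not the transcribers' actual coding.

```python
# MLU in words and percentage of comprehensible words from a transcript.
def language_sample_measures(utterances):
    n_words = sum(len(u.split()) for u in utterances)
    n_comprehensible = sum(
        sum(1 for w in u.split() if w.lower() != "xxx") for u in utterances
    )
    mlu_words = n_words / len(utterances)
    pct_comprehensible = 100 * n_comprehensible / n_words
    return mlu_words, pct_comprehensible

# Hypothetical three-utterance sample: MLU = 3.33 words, 70% comprehensible.
sample = ["I want xxx", "doggie go xxx xxx", "more juice please"]
print(language_sample_measures(sample))
```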
NASA Astrophysics Data System (ADS)
Monroe, Roberta Lynn
The intrinsic fundamental frequency effect among vowels is a vocalic phenomenon of adult speech in which high vowels have higher fundamental frequencies in relation to low vowels. Acoustic investigations of children's speech have shown that variability of the speech signal decreases as children's ages increase. Fundamental frequency measures have been suggested as an indirect metric for the development of laryngeal stability and coordination. Studies of the intrinsic fundamental frequency effect have been conducted among 8- and 9-year-old children and in infants. The present study investigated this effect among 2- and 4-year-old children. Eight 2-year-old and eight 4-year-old children produced four vowels, /æ/, /i/, /u/, and /a/, in CVC syllables. Three measures of fundamental frequency were taken. These were mean fundamental frequency, the intra-utterance standard deviation of the fundamental frequency, and the extent to which the cycle-to-cycle pattern of the fundamental frequency was predicted by a linear trend. An analysis of variance was performed to compare the two age groups, the four vowels, and the earlier and later repetitions of the CVC syllables. A significant difference between the two age groups was detected using the intra-utterance standard deviation of the fundamental frequency. Mean fundamental frequencies and linear trend analysis showed that voicing of the preceding consonant determined the statistical significance of the age-group comparisons. Statistically significant differences among the fundamental frequencies of the four vowels were not detected for either age group.
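The three fundamental-frequency measures described above can be computed from a per-cycle F0 track roughly as follows. The F0 values are hypothetical, and expressing the linear-trend measure as an R-squared is one reasonable reading of "predicted by a linear trend", not necessarily the author's exact metric.

```python
# Mean F0, intra-utterance SD, and linear-trend fit for one utterance's F0 track.
import numpy as np

f0_cycles = np.array([262, 265, 268, 266, 270, 273, 271, 275, 278, 276], float)

mean_f0 = f0_cycles.mean()               # mean fundamental frequency
intra_sd = f0_cycles.std(ddof=1)         # intra-utterance standard deviation

# Extent to which the cycle-to-cycle pattern follows a linear trend (R^2).
t = np.arange(len(f0_cycles))
slope, intercept = np.polyfit(t, f0_cycles, 1)
predicted = slope * t + intercept
ss_res = np.sum((f0_cycles - predicted) ** 2)
ss_tot = np.sum((f0_cycles - f0_cycles.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print(round(mean_f0, 1), round(intra_sd, 2), round(r_squared, 3))
```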
ERIC Educational Resources Information Center
Yuan, Bin
2018-01-01
The current research is mainly conducted to explore the pragmatic functions of English rhetoric in public speech. To do this, methods of close reading and case studies are adopted. The research first reveals that the boom of public speech programs helps reexamine the art of utterance, during the delivery of which English rhetoric plays an…
ERIC Educational Resources Information Center
Blount, Ben G.; Padgug, Elise J.
Features of parental speech to young children were studied in four English-speaking and four Spanish-speaking families. Children ranged in age from 9 to 12 months for the English speakers and from 8 to 22 months for the Spanish speakers. Examination of the utterances led to the identification of 34 prosodic, paralinguistic, and interactional…
ERIC Educational Resources Information Center
Horgan, Dianne
A study was conducted to determine whether the child expresses linguistic knowledge during the single-word period. The order of mention in 65 sets of successive single-word utterances from five children at Stage 1, two to four years old, was analyzed. To elicit speech, the children were shown line drawings representing such situations as animate…
ERIC Educational Resources Information Center
Buium, Nissan; And Others
Speech samples were collected from three 48-month-old children with Down's Syndrome over an 11-month period after Ss had reached the one word utterance stage. Each S's linguistic utterances were semantically evaluated in terms of M. Bowerman's, R. Brown's, and I. Schlesinger's semantic relational concepts. Generally, findings suggested that Ss…
Prosodic Temporal Alignment of Co-Speech Gestures to Speech Facilitates Referent Resolution
ERIC Educational Resources Information Center
Jesse, Alexandra; Johnson, Elizabeth K.
2012-01-01
Using a referent detection paradigm, we examined whether listeners can determine the object speakers are referring to by using the temporal alignment between the motion speakers impose on objects and their labeling utterances. Stimuli were created by videotaping speakers labeling a novel creature. Without being explicitly instructed to do so,…
Spatial Frequency Requirements and Gaze Strategy in Visual-Only and Audiovisual Speech Perception
ERIC Educational Resources Information Center
Wilson, Amanda H.; Alsius, Agnès; Paré, Martin; Munhall, Kevin G.
2016-01-01
Purpose: The aim of this article is to examine the effects of visual image degradation on performance and gaze behavior in audiovisual and visual-only speech perception tasks. Method: We presented vowel-consonant-vowel utterances visually filtered at a range of frequencies in visual-only, audiovisual congruent, and audiovisual incongruent…
ERIC Educational Resources Information Center
Zheng, Chun
2017-01-01
Producing a sensible utterance requires speakers to select conceptual content, lexical items, and syntactic structures almost instantaneously during speech planning. Each language offers its speakers flexibility in the selection of lexical and syntactic options to talk about the same scenarios involving movement. Languages also vary typologically…
The Listener: No Longer the Silent Partner in Reduced Intelligibility
ERIC Educational Resources Information Center
Zielinski, Beth W.
2008-01-01
In this study I investigate the impact of different characteristics of the L2 speech signal on the intelligibility of L2 speakers of English to native listeners. Three native listeners were observed and questioned as they orthographically transcribed utterances taken from connected conversational speech produced by three L2 speakers from different…
ERIC Educational Resources Information Center
Watson, Jennifer B.; Byrd, Courtney T.; Carlo, Edna J.
2011-01-01
Purpose: To explore the effects of utterance length, syntactic complexity, and grammatical correctness on stuttering in the spontaneous speech of young, monolingual Spanish-speaking children. Method: Spontaneous speech samples of 11 monolingual Spanish-speaking children who stuttered, ages 35 to 70 months, were examined. Mean number of syllables,…
Turn-taking: From perception to speech preparation.
Wesselmeier, Hendrik; Müller, Horst M
2015-11-16
We investigated the preparation of a spoken answer response to interrogative sentences by measuring response time (RT) and the response-related readiness potential (RP). By comparing the RT and RP results we aimed to identify whether the RP-onset is more related to the actual speech preparation process or to the pure intention to speak after turn-anticipation. Additionally, we investigated whether the RP-onset can be influenced by the syntactic structure (one or two completion points). Therefore, the EEG data were sorted based on two variables: the cognitive load required for the response and the syntactic structure of the stimulus questions. The results of the event-related potential (ERP) associated with response utterance preparation and of the RT suggest that the RP-onset is more related to the actual speech preparation process than to the pure intention to speak after turn-anticipation. However, the RP-onset can be influenced by the syntactic structure of the question, leading to earlier response preparation. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Restoring the missing features of the corrupted speech using linear interpolation methods
NASA Astrophysics Data System (ADS)
Rassem, Taha H.; Makbol, Nasrin M.; Hasan, Ali Muttaleb; Zaki, Siti Syazni Mohd; Girija, P. N.
2017-10-01
One of the main challenges in Automatic Speech Recognition (ASR) is noise. The performance of an ASR system degrades significantly if the speech is corrupted by noise. In the spectrogram representation of a speech signal, deleting low signal-to-noise ratio (SNR) elements leaves an incomplete spectrogram. The missing elements can be handled either within the recognizer itself or by restoring the spectrogram before recognition is performed; the latter can be done using different spectrogram reconstruction methods. In this paper, the geometrical spectrogram reconstruction methods suggested by some researchers are implemented as a toolbox. In these geometrical reconstruction methods, linear interpolation along time or along frequency is used to predict the missing elements between adjacent observed elements in the spectrogram. Moreover, a new linear interpolation method using time and frequency together is presented. The CMU Sphinx III software is used in the experiments to test the performance of the linear interpolation reconstruction method. The experiments are done under different conditions, such as different window lengths and different utterance lengths. The speech corpus used in the experiments consists of 20 male and 20 female speakers, each contributing two different utterances. As a result, 80% recognition accuracy is achieved at a 25% SNR.
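The interpolation idea described in this abstract can be sketched as follows; this is an illustrative reconstruction under stated assumptions (a boolean mask marking the reliable high-SNR cells, simple averaging of the time and frequency estimates), not the authors' toolbox.

```python
# Sketch of filling masked spectrogram elements by linear interpolation along
# time, along frequency, or by averaging both directions (illustration only).
import numpy as np

def interp_missing_1d(values, observed):
    """Linearly interpolate missing entries of a 1-D array given a boolean 'observed' mask."""
    x = np.arange(len(values))
    if observed.sum() < 2:
        return values.copy()              # too few anchors to interpolate
    filled = values.copy()
    filled[~observed] = np.interp(x[~observed], x[observed], values[observed])
    return filled

def restore_spectrogram(spec, observed, direction="both"):
    """spec: (freq, time) array; observed: boolean mask of reliable (high-SNR) cells."""
    along_time = np.vstack([interp_missing_1d(spec[f], observed[f])
                            for f in range(spec.shape[0])])
    along_freq = np.vstack([interp_missing_1d(spec[:, t], observed[:, t])
                            for t in range(spec.shape[1])]).T
    if direction == "time":
        return along_time
    if direction == "freq":
        return along_freq
    return 0.5 * (along_time + along_freq)   # combined time-frequency estimate

# Toy 4x6 "spectrogram" with low-SNR cells masked out.
rng = np.random.default_rng(0)
spec = rng.random((4, 6))
observed = rng.random((4, 6)) > 0.3
print(restore_spectrogram(spec, observed).shape)   # (4, 6)
```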
Nakayama, Masataka; Saito, Satoru
2015-08-01
The present study investigated principles of phonological planning, a common serial ordering mechanism for speech production and phonological short-term memory. Nakayama and Saito (2014) have investigated the principles by using a speech-error induction technique, in which participants were exposed to an auditory distractor word immediately before an utterance of a target word. They demonstrated within-word adjacent mora exchanges and serial position effects on error rates. These findings support, respectively, the temporal distance and the edge principles at a within-word level. As this previous study induced errors using word distractors created by exchanging adjacent morae in the target words, it is possible that the speech errors are expressions of lexical intrusions reflecting interactive activation of phonological and lexical/semantic representations. To eliminate this possibility, the present study used nonword distractors that had no lexical or semantic representations. This approach successfully replicated the error patterns identified in the abovementioned study, further confirming that the temporal distance and edge principles are organizing precepts in phonological planning.
Letter Knowledge in Parent–Child Conversations
Robins, Sarah; Treiman, Rebecca; Rosales, Nicole
2014-01-01
Learning about letters is an important component of emergent literacy. We explored the possibility that parent speech provides information about letters, and also that children’s speech reflects their own letter knowledge. By studying conversations transcribed in CHILDES (MacWhinney, 2000) between parents and children aged one to five, we found that alphabetic order influenced use of individual letters and letter sequences. The frequency of letters in children’s books influenced parent utterances throughout the age range studied, but children’s utterances only after age two. Conversations emphasized some literacy-relevant features of letters, such as their shapes and association with words, but not letters’ sounds. Describing these patterns and how they change over the preschool years offers important insight into the home literacy environment. PMID:25598577
NASA Astrophysics Data System (ADS)
Přibil, Jiří; Přibilová, Anna; Ďuračková, Daniela
2014-01-01
The paper describes our experiment with using Gaussian mixture models (GMM) for classification of speech uttered by a person wearing orthodontic appliances. For the GMM classification, the input feature vectors comprise the basic and the complementary spectral properties as well as the supra-segmental parameters. Dependence of classification correctness on the number of parameters in the input feature vector and on the computational complexity is also evaluated. In addition, the influence of the initial parameter settings for the GMM training process was analyzed. The obtained recognition results are compared visually in the form of graphs as well as numerically in the form of tables and confusion matrices for the tested sentences uttered using three configurations of orthodontic appliances.
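A minimal sketch of the general GMM-classification scheme described above, using scikit-learn's GaussianMixture: one mixture per appliance configuration, decided by maximum average log-likelihood. The feature dimensionality, number of components, and class labels are illustrative assumptions, not the paper's configuration.

```python
# Illustrative GMM classifier: one Gaussian mixture per class (e.g. per
# orthodontic-appliance configuration), decision by maximum log-likelihood.
import numpy as np
from sklearn.mixture import GaussianMixture

def train_gmm_classifier(features_by_class, n_components=4):
    """features_by_class: dict label -> (n_frames, n_dims) array of feature vectors."""
    return {label: GaussianMixture(n_components=n_components,
                                   covariance_type="diag",
                                   random_state=0).fit(feats)
            for label, feats in features_by_class.items()}

def classify(models, utterance_features):
    """Score an utterance (n_frames, n_dims) against each class model."""
    scores = {label: gmm.score(utterance_features) for label, gmm in models.items()}
    return max(scores, key=scores.get)

# Toy example with random stand-ins for spectral + supra-segmental features.
rng = np.random.default_rng(1)
train = {"no_appliance": rng.normal(0.0, 1.0, (200, 13)),
         "appliance_A": rng.normal(0.5, 1.0, (200, 13))}
models = train_gmm_classifier(train)
test_utt = rng.normal(0.5, 1.0, (50, 13))
print(classify(models, test_utt))
```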
Two different phenomena in basic motor speech performance in premanifest Huntington disease.
Skodda, Sabine; Grönheit, Wenke; Lukas, Carsten; Bellenberg, Barbara; von Hein, Sarah M; Hoffmann, Rainer; Saft, Carsten
2016-03-09
Dysarthria is a common feature in Huntington disease (HD). The aim of this cross-sectional pilot study was the description and objective analysis of different speech parameters, with special emphasis on the speech timing of connected speech and nonspeech verbal utterances in premanifest HD (preHD). A total of 28 preHD mutation carriers and 28 age- and sex-matched healthy speakers had to perform a reading task and several syllable repetition tasks. Results of computerized acoustic analysis of different variables for the measurement of speech rate and regularity were correlated with clinical measures and MRI-based brain atrophy assessment by voxel-based morphometry. An impaired capacity to steadily repeat single syllables, with higher variability in preHD compared to healthy controls, was found (variance 1: Cohen d = 1.46). Notably, speech rate was increased compared to controls and showed correlations with the volume of certain brain areas known to be involved in the sensory-motor speech networks (net speech rate: Cohen d = 1.19). Furthermore, speech rate showed correlations with the disease burden score, the probability of disease onset, the estimated years to onset, and clinical measures such as the cognitive score. Measurement of speech rate and regularity might be a helpful additional tool for the monitoring of subclinical functional disability in preHD. As one of the possible causes for higher performance in preHD, we discuss huntingtin-dependent, temporarily advantageous developmental processes of the brain. © 2016 American Academy of Neurology.
De Jonge-Hoekstra, Lisette; Van der Steen, Steffie; Van Geert, Paul; Cox, Ralf F A
2016-01-01
As children learn, they use their speech to express words and their hands to gesture. This study investigates the interplay between real-time gestures and speech as children construct cognitive understanding during a hands-on science task. Twelve children (M = 6, F = 6) from Kindergarten (n = 5) and first grade (n = 7) participated in this study. Each verbal utterance and gesture during the task was coded on a complexity scale derived from dynamic skill theory. To explore the interplay between speech and gestures, we applied a cross recurrence quantification analysis (CRQA) to the two coupled time series of the skill levels of verbalizations and gestures. The analysis focused on (1) the temporal relation between gestures and speech, (2) the relative strength and direction of the interaction between gestures and speech, (3) the relative strength and direction between gestures and speech for different levels of understanding, and (4) relations between CRQA measures and other child characteristics. The results show that older and younger children differ in the (temporal) asymmetry in the gestures-speech interaction. For younger children, the balance leans more toward gestures leading speech in time, while the balance leans more toward speech leading gestures for older children. Secondly, at the group level, speech attracts gestures in a more dynamically stable fashion than vice versa, and this asymmetry in gestures and speech extends to lower and higher understanding levels. Yet, for older children, the mutual coupling between gestures and speech is more dynamically stable regarding the higher understanding levels. Gestures and speech are more synchronized in time as children get older. A higher score on schools' language tests is related to speech attracting gestures more rigidly and to more asymmetry between gestures and speech, only for the less difficult understanding levels. A higher score on math or past science tasks is related to less asymmetry between gestures and speech. The picture that emerges from our analyses suggests that the relation between gestures, speech and cognition is more complex than previously thought. We suggest that temporal differences and asymmetry in influence between gestures and speech arise from simultaneous coordination of synergies.
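A bare-bones sketch of the cross-recurrence idea behind CRQA, applied to two coded time series such as gesture and speech skill levels; real CRQA pipelines add embedding and line-based measures, so this only builds the cross-recurrence plot and its recurrence rate. The skill-level sequences are hypothetical.

```python
# Minimal cross-recurrence sketch for two coded time series (e.g. gesture and
# speech skill levels). Illustration only; not the authors' CRQA pipeline.
import numpy as np

def cross_recurrence(x, y, radius=0):
    """Binary cross-recurrence matrix: R[i, j] = 1 if |x[i] - y[j]| <= radius."""
    x = np.asarray(x, dtype=float)[:, None]
    y = np.asarray(y, dtype=float)[None, :]
    return (np.abs(x - y) <= radius).astype(int)

def recurrence_rate(R):
    return R.mean()

gesture = [1, 1, 2, 3, 3, 4, 4, 5]   # hypothetical skill levels over time
speech  = [1, 2, 2, 2, 3, 4, 5, 5]
R = cross_recurrence(gesture, speech)
print(R)
print(f"recurrence rate = {recurrence_rate(R):.2f}")
```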
The redeployment of attention to the mouth of a talking face during the second year of life.
Hillairet de Boisferon, Anne; Tift, Amy H; Minar, Nicholas J; Lewkowicz, David J
2018-08-01
Previous studies have found that when monolingual infants are exposed to a talking face speaking in a native language, 8- and 10-month-olds attend more to the talker's mouth, whereas 12-month-olds no longer do so. It has been hypothesized that the attentional focus on the talker's mouth at 8 and 10 months of age reflects reliance on the highly salient audiovisual (AV) speech cues for the acquisition of basic speech forms and that the subsequent decline of attention to the mouth by 12 months of age reflects the emergence of basic native speech expertise. Here, we investigated whether infants may redeploy their attention to the mouth once they fully enter the word-learning phase. To test this possibility, we recorded eye gaze in monolingual English-learning 14- and 18-month-olds while they saw and heard a talker producing an English or Spanish utterance in either an infant-directed (ID) or adult-directed (AD) manner. Results indicated that the 14-month-olds attended more to the talker's mouth than to the eyes when exposed to the ID utterance and that the 18-month-olds attended more to the talker's mouth when exposed to the ID and the AD utterance. These results show that infants redeploy their attention to a talker's mouth when they enter the word acquisition phase and suggest that infants rely on the greater perceptual salience of redundant AV speech cues to acquire their lexicon. Copyright © 2018 Elsevier Inc. All rights reserved.
Audio-Visual Speaker Diarization Based on Spatiotemporal Bayesian Fusion.
Gebru, Israel D; Ba, Sileye; Li, Xiaofei; Horaud, Radu
2018-05-01
Speaker diarization consists of assigning speech signals to people engaged in a dialogue. An audio-visual spatiotemporal diarization model is proposed. The model is well suited for challenging scenarios that consist of several participants engaged in multi-party interaction while they move around and turn their heads towards the other participants rather than facing the cameras and the microphones. Multiple-person visual tracking is combined with multiple speech-source localization in order to tackle the speech-to-person association problem. The latter is solved within a novel audio-visual fusion method on the following grounds: binaural spectral features are first extracted from a microphone pair, then a supervised audio-visual alignment technique maps these features onto an image, and finally a semi-supervised clustering method assigns binaural spectral features to visible persons. The main advantage of this method over previous work is that it processes in a principled way speech signals uttered simultaneously by multiple persons. The diarization itself is cast into a latent-variable temporal graphical model that infers speaker identities and speech turns, based on the output of an audio-visual association process, executed at each time slice, and on the dynamics of the diarization variable itself. The proposed formulation yields an efficient exact inference procedure. A novel dataset, which contains audio-visual training data as well as a number of scenarios involving several participants engaged in formal and informal dialogue, is introduced. The proposed method is thoroughly tested and benchmarked with respect to several state-of-the-art diarization algorithms.
Low-Income Fathers' Speech to Toddlers during Book Reading versus Toy Play
ERIC Educational Resources Information Center
Salo, Virginia C.; Rowe, Meredith L.; Leech, Kathryn A.; Cabrera, Natasha J.
2016-01-01
Fathers' child-directed speech across two contexts was examined. Father-child dyads from sixty-nine low-income families were videotaped interacting during book reading and toy play when children were 2;0. Fathers used more diverse vocabulary and asked more questions during book reading while their mean length of utterance was longer during toy…
ERIC Educational Resources Information Center
Shriberg, Lawrence D.; Paul, Rhea; McSweeny, Jane L.; Klin, Ami; Cohen, Donald J.; Volkmar, Fred R.
2001-01-01
This study compared the speech and prosody-voice profiles for 30 male speakers with either high-functioning autism (HFA) or Asperger syndrome (AS), and 53 typically developing male speakers. Both HFA and AS groups had more residual articulation distortion errors and utterances coded as inappropriate for phrasing, stress, and resonance. AS speakers…
Using the Self-Select Paradigm to Delineate the Nature of Speech Motor Programming
ERIC Educational Resources Information Center
Wright, David L.; Robin, Don A.; Rhee, Jooyhun; Vaculin, Amber; Jacks, Adam; Guenther, Frank H.; Fox, Peter T.
2009-01-01
Purpose: The authors examined the involvement of 2 speech motor programming processes identified by S. T. Klapp (1995, 2003) during the articulation of utterances differing in syllable and sequence complexity. According to S. T. Klapp, 1 process, INT, resolves the demands of the programmed unit, whereas a second process, SEQ, oversees the serial…
How Children and Adults Produce and Perceive Uncertainty in Audiovisual Speech
ERIC Educational Resources Information Center
Krahmer, Emiel; Swerts, Marc
2005-01-01
We describe two experiments on signaling and detecting uncertainty in audiovisual speech by adults and children. In the first study, utterances from adult speakers and child speakers (aged 7-8) were elicited and annotated with a set of six audiovisual features. It was found that when adult speakers were uncertain they were more likely to produce…
A Longitudinal Investigation of Morpho-Syntax in Children with Speech Sound Disorders
ERIC Educational Resources Information Center
Mortimer, Jennifer; Rvachew, Susan
2010-01-01
Purpose: The intent of this study was to examine the longitudinal morpho-syntactic progression of children with Speech Sound Disorders (SSD) grouped according to Mean Length of Utterance (MLU) scores. Methods: Thirty-seven children separated into four clusters were assessed in their pre-kindergarten and Grade 1 years. Cluster 1 were children with…
Age-Related Changes to Speech Breathing with Increased Vocal Loudness
ERIC Educational Resources Information Center
Huber, Jessica E.; Spruill, John, III
2008-01-01
Purpose: The present study examines the effect of normal aging on respiratory support for speech when utterance length is controlled. Method: Fifteen women (M = 71 years of age) and 10 men (M = 73 years of age) produced 2 sentences of different lengths in 4 loudness conditions while respiratory kinematics were measured. Measures included those…
ERIC Educational Resources Information Center
Kormos, Judit; Préfontaine, Yvonne
2017-01-01
The present mixed-methods study examined the role of learner appraisals of speech tasks in second language (L2) French fluency. Forty adult learners in a Canadian immersion program participated in the study that compared four sources of data: (1) objectively measured utterance fluency in participants' performances of three narrative tasks…
Teaching tone and intonation with the Prosody Workstation using schematic versus veridical contours
NASA Astrophysics Data System (ADS)
Allen, George D.; Eulenberg, John B.
2004-05-01
Prosodic features of speech (e.g., intonation and rhythm) are often challenging for adults to learn. Most computerized teaching tools, developed to help learners mimic model prosodic patterns, display lines representing the veridical (actual) acoustic fundamental frequency and intensity of the model speech. However, a veridical display may not be optimal for this task. Instead, stereotypical representations (e.g., simplified level or slanting lines) may help by reducing the amount of potentially distracting information. The Prosody Workstation (PW) permits the prosodic contours of both models and users' responses to be displayed using either veridical or stereotypical contours. Users are informed by both visual displays and scores representing the degree of match of their utterance to the model. American English-speaking undergraduates are being studied learning the tone contours and rhythm of Chinese and Hausa utterances ranging in length from two to six syllables. Data include (a) accuracy of mimicking of the models' prosodic contours, measured by the PW; (b) quality of tonal and rhythmic production, judged by native speaker listeners; and (c) learners' perceptions of the ease of the task, measured by a questionnaire at the end of each session.
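One way to read "schematic versus veridical contours" is that the schematic display replaces each syllable's F0 track with its best-fitting straight line; the sketch below illustrates that reading plus a crude match score. The Prosody Workstation's actual display and scoring algorithms are not described in the abstract, so the function names and the RMS-based score here are assumptions.

```python
# Hedged sketch: a "schematic" contour built by fitting one straight line per
# syllable to a veridical F0 track, plus a simple learner-vs-model match score.
import numpy as np

def schematic_contour(f0, syllable_bounds):
    """Replace the F0 track inside each syllable with its best-fitting straight line."""
    schematic = np.array(f0, dtype=float)
    for start, end in syllable_bounds:
        x = np.arange(start, end)
        slope, intercept = np.polyfit(x, schematic[start:end], 1)
        schematic[start:end] = slope * x + intercept
    return schematic

def match_score(model_f0, learner_f0):
    """Crude 0-100 score from the RMS difference between two equal-length contours."""
    rmse = np.sqrt(np.mean((np.asarray(model_f0) - np.asarray(learner_f0)) ** 2))
    return max(0.0, 100.0 - rmse)

f0 = 200 + 30 * np.sin(np.linspace(0, 3, 40))   # toy veridical contour (Hz)
bounds = [(0, 20), (20, 40)]                    # two syllables
print(schematic_contour(f0, bounds)[:5])
print(match_score(f0, schematic_contour(f0, bounds)))
```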
Friedman, Ori; Neary, Karen R; Burnstein, Corinna L; Leslie, Alan M
2010-05-01
When young children observe pretend-play, do they interpret it simply as a type of behavior, or do they infer the underlying mental state that gives the behavior meaning? This is a long-standing question with deep implications for how "theory of mind" develops. The two leading accounts of shared pretense give opposing answers. The behavioral theory proposes that children represent pretense as a form of behavior (behaving in a way that would be appropriate if P); the metarepresentational theory argues that children instead represent pretense via the early concept PRETEND. A test between these accounts is provided by children's understanding of pretend sounds and speech. We report the first experiments directly investigating this understanding. In three experiments, 2- and 3-year-olds listened to requests that were either spoken normally or with the pretense that a teddy bear was uttering them. To correctly fulfill the requests, children had to represent the normal utterance as the experimenter's, and the pretend utterances as the bear's. Children succeeded at both ages, suggesting that they can represent pretend speech (the requests) as coming from counterfactual sources (the bear rather than the experimenter). We argue that this is readily explained by the metarepresentational theory, but harder to explain if children are behaviorists about pretense. Copyright 2010 Elsevier B.V. All rights reserved.
Chowdhury, Nafees Uddin; Otomaru, Takafumi; Murase, Mai; Inohara, Ken; Hattori, Mariko; Sumita, Yuka I; Taniguchi, Hisashi
2011-01-01
An objective assessment of speech would benefit the prosthetic rehabilitation of maxillectomy patients. This study aimed to establish a simple, objective evaluation of monosyllable /sa/ utterances in maxillectomy patients by using a psychoacoustic system typically used in industry. This study comprised two experiments. Experiment 1 involved analysis of the psychoacoustic parameters (loudness, sharpness and roughness) in monosyllable /sa/ utterances by 18 healthy subjects (9 males, 9 females). The utterances were recorded in a sound-treated room. The coefficient of variation (CV) for each parameter was compared to identify the most suitable parameter for objective evaluation of speech. Experiment 2 involved analysis of /sa/ utterances by 18 maxillectomy patients (9 males, 9 females) with and without prosthesis, and comparisons of the psychoacoustic data between the healthy subjects and maxillectomy patients without prosthesis, between the maxillectomy patients with and without prosthesis, and between the healthy subjects and maxillectomy patients with prosthesis. The CV for sharpness was the lowest among the three psychoacoustic parameters in both the healthy males and females. There were significant differences in the sharpness of /sa/ between the healthy subjects and the maxillectomy patients without prosthesis (but not with prosthesis), and between the maxillectomy patients with and without prosthesis. We found that the psychoacoustic parameters typically adopted in industrial research could also be applied to evaluate the psychoacoustics of the monosyllable /sa/ utterance, and that the system could distinguish the monosyllable /sa/ produced by maxillectomy patients with an obturator from that produced without an obturator. Copyright © 2010 Japan Prosthodontic Society. Published by Elsevier Ltd. All rights reserved.
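The parameter-selection step above relies on the coefficient of variation (CV = standard deviation / mean); a small sketch with hypothetical per-speaker values shows how the lowest-CV parameter (sharpness in the study) would be identified.

```python
# Minimal sketch of the coefficient of variation (CV = std / mean) used to pick
# the most stable psychoacoustic parameter; the example values are hypothetical.
import numpy as np

def coefficient_of_variation(values):
    values = np.asarray(values, dtype=float)
    return values.std(ddof=1) / values.mean()

loudness  = [14.2, 15.1, 13.8, 14.9]   # hypothetical per-speaker parameter values
sharpness = [1.52, 1.49, 1.55, 1.50]
print(f"CV loudness  = {coefficient_of_variation(loudness):.3f}")
print(f"CV sharpness = {coefficient_of_variation(sharpness):.3f}")  # lowest CV -> chosen
```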
Language in boys with fragile X syndrome.
Levy, Yonata; Gottesman, Riki; Borochowitz, Zvi; Frydman, Moshe; Sagi, Michal
2006-02-01
The current paper reports on language production in 15 Hebrew-speaking boys, aged 9;0-13;0, with fully methylated, non-mosaic fragile X syndrome and no concomitant diagnosis of autism. Contrary to expectations, seven children were non-verbal. Language production in the verbal children was studied in free conversations and in context-bound speech. Despite extra caution in calculating MLU, participants' language level was not predicted by mean utterance length. Context-bound speech resulted in grammatically more advanced performance than free conversation, and performance in both contexts differed in important ways from performance of typically developing MLU-matched controls. The relevance of MLU as a predictor of productive grammar in disordered populations is briefly discussed.
Self-, other-, and joint monitoring using forward models.
Pickering, Martin J; Garrod, Simon
2014-01-01
In the psychology of language, most accounts of self-monitoring assume that it is based on comprehension. Here we outline and develop the alternative account proposed by Pickering and Garrod (2013), in which speakers construct forward models of their upcoming utterances and compare them with the utterance as they produce them. We propose that speakers compute inverse models derived from the discrepancy (error) between the utterance and the predicted utterance and use that to modify their production command or (occasionally) begin anew. We then propose that comprehenders monitor other people's speech by simulating their utterances using covert imitation and forward models, and then comparing those forward models with what they hear. They use the discrepancy to compute inverse models and modify their representation of the speaker's production command, or realize that their representation is incorrect and may develop a new production command. We then discuss monitoring in dialogue, paying attention to sequential contributions, concurrent feedback, and the relationship between monitoring and alignment.
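A schematic, one-dimensional sketch of the predict-compare-correct loop described above: the speaker predicts the upcoming utterance with a forward model, compares it with the produced utterance, and uses the discrepancy to adjust the production command. The scalar "utterance" values, the bias term, and the update rule are placeholders for illustration, not Pickering and Garrod's model.

```python
# Schematic forward-model monitoring loop with a 1-D placeholder "utterance".
# The speaker's bias estimate plays the role of the inverse-model correction.
def simulate_monitoring(target=1.0, true_bias=0.2, gain=0.8, trials=4):
    bias_estimate = 0.0                                   # internal estimate of the production bias
    for trial in range(trials):
        command = target - bias_estimate                  # command chosen so the prediction hits the target
        predicted = command + bias_estimate               # forward-model prediction of the utterance
        actual = command + true_bias                      # utterance as actually produced
        error = actual - predicted                        # discrepancy between utterance and prediction
        bias_estimate += gain * error                     # modify the production command for next time
        print(f"trial {trial}: produced {actual:.3f}, prediction error {error:+.3f}")

simulate_monitoring()   # prediction error shrinks as the command is corrected
```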
Are You Talking to Me? Dialogue Systems Supporting Mixed Teams of Humans and Robots
NASA Technical Reports Server (NTRS)
Dowding, John; Clancey, William J.; Graham, Jeffrey
2006-01-01
This position paper describes an approach to building spoken dialogue systems for environments containing multiple human speakers and hearers, and multiple robotic speakers and hearers. We address the issue, for robotic hearers, of whether the speech they hear is intended for them, or more likely to be intended for some other hearer. We will describe data collected during a series of experiments involving teams of multiple humans and robots (and other software participants), and some preliminary results for distinguishing robot-directed speech from human-directed speech. The domain of these experiments is Mars-analogue planetary exploration. These Mars-analogue field studies involve two subjects in simulated planetary space suits doing geological exploration with the help of 1-2 robots, supporting software agents, a habitat communicator and links to a remote science team. The two subjects are performing a task (geological exploration) which requires them to speak with each other while also speaking with their assistants. The technique used here is to use a probabilistic context-free grammar language model in the speech recognizer that is trained on prior robot-directed speech. Intuitively, the recognizer will give higher confidence to an utterance if it is similar to utterances that have been directed to the robot in the past.
Lexically restricted utterances in Russian, German, and English child-directed speech.
Stoll, Sabine; Abbot-Smith, Kirsten; Lieven, Elena
2009-01-01
This study investigates the child-directed speech (CDS) of four Russian-, six German-, and six English-speaking mothers to their 2-year-old children. Typologically, Russian has considerably less restricted word order than either German or English, with German showing more word-order variants than English. This could lead to the prediction that the lexical restrictiveness previously found in the initial strings of English CDS by Cameron-Faulkner, Lieven, and Tomasello (2003) would not be found in Russian or German CDS. However, despite differences between the three corpora that clearly derive from typological differences between the languages, the most significant finding of this study is a high degree of lexical restrictiveness at the beginnings of CDS utterances in all three languages. Copyright © 2009 Cognitive Science Society, Inc.
Automatic initial and final segmentation in cleft palate speech of Mandarin speakers
He, Ling; Liu, Yin; Yin, Heng; Zhang, Junpeng; Zhang, Jing; Zhang, Jiang
2017-01-01
The speech unit segmentation is an important pre-processing step in the analysis of cleft palate speech. In Mandarin, one syllable is composed of two parts: initial and final. In cleft palate speech, the resonance disorders occur at the finals and the voiced initials, while the articulation disorders occur at the unvoiced initials. Thus, the initials and finals are the minimum speech units, which could reflect the characteristics of cleft palate speech disorders. In this work, an automatic initial/final segmentation method is proposed. It is an important preprocessing step in cleft palate speech signal processing. The tested cleft palate speech utterances are collected from the Cleft Palate Speech Treatment Center in the Hospital of Stomatology, Sichuan University, which treats the largest number of cleft palate patients in China. The cleft palate speech data includes 824 speech segments, and the control samples contain 228 speech segments. First, the syllables are extracted from the speech utterances. The proposed syllable extraction method avoids the training stage, and achieves a good performance for both voiced and unvoiced speech. Then, the syllables are classified as having "quasi-unvoiced" or "quasi-voiced" initials. Respective initial/final segmentation methods are proposed for these two types of syllables. Moreover, a two-step segmentation method is proposed. The rough locations of syllable and initial/final boundaries are refined in the second segmentation step, in order to improve the robustness of segmentation accuracy. The experiments show that the initial/final segmentation accuracies for syllables with quasi-unvoiced initials are higher than for syllables with quasi-voiced initials. For the cleft palate speech, the mean time error is 4.4 ms for syllables with quasi-unvoiced initials, and 25.7 ms for syllables with quasi-voiced initials, and the correct segmentation accuracy P30 for all the syllables is 91.69%. For the control samples, P30 for all the syllables is 91.24%. PMID:28926572
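The evaluation metrics quoted in this abstract can be sketched as follows, reading P30 as the percentage of predicted boundaries falling within 30 ms of the reference (an interpretation, since the abstract does not define it) and assuming a one-to-one pairing of predicted and reference boundaries.

```python
# Sketch of the boundary-accuracy metrics named above: mean absolute time error
# and P30 (share of predicted boundaries within 30 ms of the reference).
import numpy as np

def boundary_metrics(reference_s, predicted_s, tolerance_s=0.030):
    """reference_s, predicted_s: equally long lists of boundary times in seconds."""
    ref = np.asarray(reference_s, dtype=float)
    pred = np.asarray(predicted_s, dtype=float)
    errors = np.abs(pred - ref)
    mean_time_error_ms = 1000 * errors.mean()
    p30 = 100 * (errors <= tolerance_s).mean()
    return mean_time_error_ms, p30

ref  = [0.120, 0.480, 0.910, 1.350]        # hypothetical initial/final boundaries
pred = [0.124, 0.470, 0.955, 1.352]
err_ms, p30 = boundary_metrics(ref, pred)
print(f"mean time error = {err_ms:.1f} ms, P30 = {p30:.0f}%")
```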
Look at the Gato! Code-Switching in Speech to Toddlers
ERIC Educational Resources Information Center
Bail, Amelie; Morini, Giovanna; Newman, Rochelle S.
2015-01-01
We examined code-switching (CS) in the speech of twenty-four bilingual caregivers when speaking with their 18- to 24-month-old children. All parents code-switched at least once in a short play session, and some did so quite often (over 1/3 of utterances). This CS included both inter-sentential and intra-sentential switches, suggesting that at least…
Keshtiari, Niloofar; Kuhlmann, Michael; Eslami, Moharram; Klann-Delius, Gisela
2015-03-01
Research on emotional speech often requires valid stimuli for assessing perceived emotion through prosody and lexical content. To date, no comprehensive emotional speech database for Persian is officially available. The present article reports the process of designing, compiling, and evaluating a comprehensive emotional speech database for colloquial Persian. The database contains a set of 90 validated novel Persian sentences classified in five basic emotional categories (anger, disgust, fear, happiness, and sadness), as well as a neutral category. These sentences were validated in two experiments by a group of 1,126 native Persian speakers. The sentences were articulated by two native Persian speakers (one male, one female) in three conditions: (1) congruent (emotional lexical content articulated in a congruent emotional voice), (2) incongruent (neutral sentences articulated in an emotional voice), and (3) baseline (all emotional and neutral sentences articulated in neutral voice). The speech materials comprise about 470 sentences. The validity of the database was evaluated by a group of 34 native speakers in a perception test. Utterances recognized better than five times chance performance (71.4 %) were regarded as valid portrayals of the target emotions. Acoustic analysis of the valid emotional utterances revealed differences in pitch, intensity, and duration, attributes that may help listeners to correctly classify the intended emotion. The database is designed to be used as a reliable material source (for both text and speech) in future cross-cultural or cross-linguistic studies of emotional speech, and it is available for academic research purposes free of charge. To access the database, please contact the first author.
Role of maternal gesture use in speech use by children with fragile X syndrome.
Hahn, Laura J; Zimmer, B Jean; Brady, Nancy C; Swinburne Romine, Rebecca E; Fleming, Kandace K
2014-05-01
The purpose of this study was to investigate how maternal gesture relates to speech production by children with fragile X syndrome (FXS). Participants were 27 young children with FXS (23 boys, 4 girls) and their mothers. Videotaped home observations were conducted between the ages of 25 and 37 months (toddler period) and again between the ages of 60 and 71 months (child period). The videos were later coded for types of maternal utterances and maternal gestures that preceded child speech productions. Children were also assessed with the Mullen Scales of Early Learning at both ages. Maternal gesture use in the toddler period was positively related to expressive language scores at both age periods and was related to receptive language scores in the child period. Maternal proximal pointing, in comparison to other gestures, evoked more speech responses from children during the mother-child interactions, particularly when combined with wh-questions. This study adds to the growing body of research on the importance of contextual variables, such as maternal gestures, in child language development. Parental gesture use may be an easily added ingredient to parent-focused early language intervention programs.
The Treatment of Non-Fluent Utterance--A Behavioral Approach
ERIC Educational Resources Information Center
Wohl, Maud T.
1970-01-01
Indicates that a behavioral approach, utilizing an electronic metronome in speech and motor activities, is highly effective in reducing the occurrence of stammering in children and adults. Bibliography. (RW)
Smith, Anne; Weber, Christine
2017-01-01
Purpose: The purpose of this study was to determine if indices of speech motor coordination during the production of sentences varying in sentence length and syntactic complexity were associated with stuttering persistence versus recovery in 5- to 7-year-old children. Methods: We compared children with persistent stuttering (CWS-Per) with children who had recovered (CWS-Rec), and children who do not stutter (CWNS). A kinematic measure of articulatory coordination, lip aperture variability (LAVar), and overall movement duration were computed for perceptually fluent sentence productions varying in length and syntactic complexity. Results: CWS-Per exhibited higher LAVar across sentence types compared to CWS-Rec and CWNS. For the participants who successfully completed the experimental paradigm, the demands of increasing sentence length and syntactic complexity did not appear to disproportionately affect the speech motor coordination of CWS-Per compared to their recovered and fluent peers. However, a subset of CWS-Per failed to produce the required number of accurate utterances. Conclusions: These findings support our hypothesis that the speech motor coordination of school-age CWS-Per, on average, is less refined and less mature compared to CWS-Rec and CWNS. Childhood recovery from stuttering is characterized, in part, by overcoming an earlier occurring maturational lag in speech motor development. PMID:28056137
Effects of Visual Speech on Early Auditory Evoked Fields - From the Viewpoint of Individual Variance
Yahata, Izumi; Kawase, Tetsuaki; Kanno, Akitake; Hidaka, Hiroshi; Sakamoto, Shuichi; Nakasato, Nobukazu; Kawashima, Ryuta; Katori, Yukio
2017-01-01
The effects of visual speech (the moving image of the speaker’s face uttering speech sound) on early auditory evoked fields (AEFs) were examined using a helmet-shaped magnetoencephalography system in 12 healthy volunteers (9 males, mean age 35.5 years). AEFs (N100m) in response to the monosyllabic sound /be/ were recorded and analyzed under three different visual stimulus conditions, the moving image of the same speaker’s face uttering /be/ (congruent visual stimuli) or uttering /ge/ (incongruent visual stimuli), and visual noise (still image processed from speaker’s face using a strong Gaussian filter: control condition). On average, latency of N100m was significantly shortened in the bilateral hemispheres for both congruent and incongruent auditory/visual (A/V) stimuli, compared to the control A/V condition. However, the degree of N100m shortening was not significantly different between the congruent and incongruent A/V conditions, despite the significant differences in psychophysical responses between these two A/V conditions. Moreover, analysis of the magnitudes of these visual effects on AEFs in individuals showed that the lip-reading effects on AEFs tended to be well correlated between the two different audio-visual conditions (congruent vs. incongruent visual stimuli) in the bilateral hemispheres but were not significantly correlated between right and left hemisphere. On the other hand, no significant correlation was observed between the magnitudes of visual speech effects and psychophysical responses. These results may indicate that the auditory-visual interaction observed on the N100m is a fundamental process which does not depend on the congruency of the visual information. PMID:28141836
The socially weighted encoding of spoken words: a dual-route approach to speech perception.
Sumner, Meghan; Kim, Seung Kyung; King, Ed; McGowan, Kevin B
2013-01-01
Spoken words are highly variable. A single word may never be uttered the same way twice. As listeners, we regularly encounter speakers of different ages, genders, and accents, increasing the amount of variation we face. How listeners understand spoken words as quickly and adeptly as they do despite this variation remains an issue central to linguistic theory. We propose that learned acoustic patterns are mapped simultaneously to linguistic representations and to social representations. In doing so, we illuminate a paradox that results in the literature from, we argue, the focus on representations and the peripheral treatment of word-level phonetic variation. We consider phonetic variation more fully and highlight a growing body of work that is problematic for current theory: words with different pronunciation variants are recognized equally well in immediate processing tasks, while an atypical, infrequent, but socially idealized form is remembered better in the long-term. We suggest that the perception of spoken words is socially weighted, resulting in sparse, but high-resolution clusters of socially idealized episodes that are robust in immediate processing and are more strongly encoded, predicting memory inequality. Our proposal includes a dual-route approach to speech perception in which listeners map acoustic patterns in speech to linguistic and social representations in tandem. This approach makes novel predictions about the extraction of information from the speech signal, and provides a framework with which we can ask new questions. We propose that language comprehension, broadly, results from the integration of both linguistic and social information.
Kharlamov, Viktor; Campbell, Kenneth; Kazanina, Nina
2011-11-01
Speech sounds are not always perceived in accordance with their acoustic-phonetic content. For example, an early and automatic process of perceptual repair, which ensures conformity of speech inputs to the listener's native language phonology, applies to individual input segments that do not exist in the native inventory or to sound sequences that are illicit according to the native phonotactic restrictions on sound co-occurrences. The present study with Russian and Canadian English speakers shows that listeners may perceive phonetically distinct and licit sound sequences as equivalent when the native language system provides robust evidence for mapping multiple phonetic forms onto a single phonological representation. In Russian, due to an optional but productive t-deletion process that affects /stn/ clusters, the surface forms [sn] and [stn] may be phonologically equivalent and map to a single phonological form /stn/. In contrast, [sn] and [stn] clusters are usually phonologically distinct in (Canadian) English. Behavioral data from identification and discrimination tasks indicated that [sn] and [stn] clusters were more confusable for Russian than for English speakers. The EEG experiment employed an oddball paradigm with nonwords [asna] and [astna] used as the standard and deviant stimuli. A reliable mismatch negativity response was elicited approximately 100 msec postchange in the English group but not in the Russian group. These findings point to a perceptual repair mechanism that is engaged automatically at a prelexical level to ensure immediate encoding of speech inputs in phonological terms, which in turn enables efficient access to the meaning of a spoken utterance.
Development of a Mandarin-English Bilingual Speech Recognition System for Real World Music Retrieval
NASA Astrophysics Data System (ADS)
Zhang, Qingqing; Pan, Jielin; Lin, Yang; Shao, Jian; Yan, Yonghong
In recent decades, there has been a great deal of research into the problem of bilingual speech recognition: developing a recognizer that can handle inter- and intra-sentential language switching between two languages. This paper presents our recent work on the development of a grammar-constrained, Mandarin-English bilingual Speech Recognition System (MESRS) for real world music retrieval. Two of the main difficult issues in building bilingual speech recognition systems for real world applications are tackled in this paper. One is to balance the performance and the complexity of the bilingual speech recognition system; the other is to effectively deal with matrix-language accents in the embedded language. In order to process the intra-sentential language switching and reduce the amount of data required to robustly estimate statistical models, a compact single set of bilingual acoustic models derived by phone set merging and clustering is developed instead of using two separate monolingual models for each language. In our study, a novel Two-pass phone clustering method based on a Confusion Matrix (TCM) is presented and compared with the log-likelihood measure method. Experiments testify that TCM can achieve better performance. Since potential system users' native language is Mandarin, which is regarded as the matrix language in our application, their pronunciations of English as the embedded language usually contain Mandarin accents. In order to deal with matrix-language accents in the embedded language, different non-native adaptation approaches are investigated. Experiments show that the model retraining method outperforms other common adaptation methods such as Maximum A Posteriori (MAP). With the effective incorporation of the phone clustering and non-native adaptation approaches, the Phrase Error Rate (PER) of MESRS for English utterances was reduced by 24.47% relative to the baseline monolingual English system, while the PER on Mandarin utterances was comparable to that of the baseline monolingual Mandarin system. The performance for bilingual utterances achieved a 22.37% relative PER reduction.
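The confusion-matrix-driven clustering idea can be illustrated with a toy sketch: merge phone pairs whose symmetric confusion exceeds a threshold. This is only the general idea, not the paper's two-pass TCM procedure; the phone labels, confusion values, and threshold are invented for illustration. The last lines show how a relative PER reduction such as the reported 24.47% is computed from a hypothetical baseline.

```python
# Illustrative sketch: propose bilingual phone merges from a confusion matrix,
# then compute a relative PER reduction. Not the paper's TCM algorithm.
import numpy as np

def confusion_based_merges(phones, confusion, threshold=0.3):
    """confusion[i, j]: probability that phone i is recognized as phone j.
    Returns phone pairs whose symmetric confusion exceeds the threshold."""
    merges = []
    for i in range(len(phones)):
        for j in range(i + 1, len(phones)):
            symmetric = 0.5 * (confusion[i, j] + confusion[j, i])
            if symmetric >= threshold:
                merges.append((phones[i], phones[j], round(symmetric, 2)))
    return sorted(merges, key=lambda m: -m[2])

phones = ["en_AE", "cn_a", "en_SH", "cn_x"]          # hypothetical phone labels
confusion = np.array([[0.70, 0.40, 0.00, 0.00],
                      [0.50, 0.60, 0.00, 0.00],
                      [0.00, 0.00, 0.80, 0.30],
                      [0.00, 0.00, 0.25, 0.70]])
print(confusion_based_merges(phones, confusion))     # [('en_AE', 'cn_a', 0.45)]

baseline_per = 20.0                                   # hypothetical baseline PER (%)
bilingual_per = baseline_per * (1 - 0.2447)           # PER after the reported reduction
print(f"relative reduction = {(baseline_per - bilingual_per) / baseline_per:.2%}")
```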
Howard, Ian S.; Messum, Piers
2014-01-01
Words are made up of speech sounds. Almost all accounts of child speech development assume that children learn the pronunciation of first language (L1) speech sounds by imitation, most claiming that the child performs some kind of auditory matching to the elements of ambient speech. However, there is evidence to support an alternative account and we investigate the non-imitative child behavior and well-attested caregiver behavior that this account posits using Elija, a computational model of an infant. Through unsupervised active learning, Elija began by discovering motor patterns, which produced sounds. In separate interaction experiments, native speakers of English, French and German then played the role of his caregiver. In their first interactions with Elija, they were allowed to respond to his sounds if they felt this was natural. We analyzed the interactions through phonemic transcriptions of the caregivers' utterances and found that they interpreted his output within the framework of their native languages. Their form of response was almost always a reformulation of Elija's utterance into well-formed sounds of L1. Elija retained those motor patterns to which a caregiver responded and formed associations between his motor pattern and the response it provoked. Thus in a second phase of interaction, he was able to parse input utterances in terms of the caregiver responses he had heard previously, and respond using his associated motor patterns. This capacity enabled the caregivers to teach Elija to pronounce some simple words in their native languages, by his serial imitation of the words' component speech sounds. Overall, our results demonstrate that the natural responses and behaviors of human subjects to infant-like vocalizations can take a computational model from a biologically plausible initial state through to word pronunciation. This provides support for an alternative to current auditory matching hypotheses for how children learn to pronounce. PMID:25333740
Don't Underestimate the Benefits of Being Misunderstood.
Gibson, Edward; Tan, Caitlin; Futrell, Richard; Mahowald, Kyle; Konieczny, Lars; Hemforth, Barbara; Fedorenko, Evelina
2017-06-01
Being a nonnative speaker of a language poses challenges. Individuals often feel embarrassed by the errors they make when talking in their second language. However, here we report an advantage of being a nonnative speaker: Native speakers give foreign-accented speakers the benefit of the doubt when interpreting their utterances; as a result, apparently implausible utterances are more likely to be interpreted in a plausible way when delivered in a foreign than in a native accent. Across three replicated experiments, we demonstrated that native English speakers are more likely to interpret implausible utterances, such as "the mother gave the candle the daughter," as similar plausible utterances ("the mother gave the candle to the daughter") when the speaker has a foreign accent. This result follows from the general model of language interpretation in a noisy channel, under the hypothesis that listeners assume a higher error rate in foreign-accented than in nonaccented speech.
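The noisy-channel reasoning behind this result can be sketched with toy numbers: the posterior probability that the plausible sentence was intended, given the implausible string, grows with the assumed error rate. All probabilities below are invented for illustration.

```python
# Toy noisy-channel illustration: P(intended | perceived) ∝ P(perceived | intended) P(intended).
def posterior_corrected(p_error, prior_plausible=0.95):
    """Probability that the plausible variant was intended, given the implausible string.
    p_error: assumed chance the speaker dropped a word (e.g. 'to');
    prior_plausible: prior that the speaker meant the plausible message."""
    from_plausible   = prior_plausible * p_error          # plausible intention + an error
    from_implausible = (1 - prior_plausible) * (1 - p_error)  # implausible intention, faithful
    return from_plausible / (from_plausible + from_implausible)

print(f"native accent  (low noise):  {posterior_corrected(p_error=0.01):.2f}")
print(f"foreign accent (high noise): {posterior_corrected(p_error=0.10):.2f}")
```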
Jürgens, Rebecca; Fischer, Julia; Schacht, Annekathrin
2018-01-01
Emotional expressions provide strong signals in social interactions and can function as emotion inducers in a perceiver. Although speech provides one of the most important channels for human communication, its physiological correlates, such as activations of the autonomous nervous system (ANS) while listening to spoken utterances, have received far less attention than in other domains of emotion processing. Our study aimed at filling this gap by investigating autonomic activation in response to spoken utterances that were embedded into larger semantic contexts. Emotional salience was manipulated by providing information on alleged speaker similarity. We compared these autonomic responses to activations triggered by affective sounds, such as exploding bombs and applause. These sounds had been rated and validated as being either positive, negative, or neutral. As physiological markers of ANS activity, we recorded skin conductance responses (SCRs) and changes in pupil size while participants classified both prosodic and sound stimuli according to their hedonic valence. As expected, affective sounds elicited increased arousal in the receiver, as reflected in increased SCR and pupil size. In contrast, SCRs to angry and joyful prosodic expressions did not differ from responses to neutral ones. Pupil size, however, was modulated by affective prosodic utterances, with increased dilations for angry and joyful compared to neutral prosody, although the similarity manipulation had no effect. These results indicate that cues provided by emotional prosody in spoken semantically neutral utterances might be too subtle to trigger SCRs, although variation in pupil size indicated the salience of stimulus variation. Our findings further demonstrate a functional dissociation between pupil dilation and skin conductance that presumably originates from their differential innervation. PMID:29541045
The perception of sentence stress in cochlear implant recipients.
Meister, Hartmut; Landwehr, Markus; Pyschny, Verena; Wagner, Petra; Walger, Martin
2011-01-01
Sentence stress is a vital attribute of speech since it indicates the importance of specific words within an utterance. Basic acoustic correlates of stress are syllable duration, intensity, and fundamental frequency (F0). Objectives of the study were to determine cochlear implant (CI) users' perception of the acoustic correlates and to uncover which cues are used for stress identification. Several experiments addressed the discrimination of changes in syllable duration, intensity, and F0 as well as stress identification based on these cues. Moreover, the discrimination of combined cues and identification of stress in conversational speech was examined. Both natural utterances and artificial manipulations of the acoustic cues were used as stimuli. Discrimination of syllable duration did not differ significantly between CI recipients and a control group of normal-hearing listeners. In contrast, CI users performed significantly worse on tasks of discrimination and stress identification based on F0 as well as on intensity. Results from these measurements were significantly correlated with the ability to identify stress in conversational speech. Discrimination performance for covarying F0 and intensity changes was more strongly correlated to identification performance than was found for discrimination of either F0 or intensity alone. Syllable duration was not related to stress identification in natural utterances. The outcome emphasizes the importance of both F0 and intensity for CI users' identification of sentence-based stress. Both cues were used separately for stress perception, but combining the cues provided extra benefit for most of the subjects.
Without his shirt off he saved the child from almost drowning: interpreting an uncertain input
Frazier, Lyn; Clifton, Charles
2014-01-01
Unedited speech and writing often contain errors, e.g., the blending of alternative ways of expressing a message. As a result, comprehenders are faced with decisions about what the speaker may have intended, which may not be the same as the grammatically licensed compositional interpretation of what was said. Two experiments investigated the comprehension of inputs that may have resulted from blending two syntactic forms. The results of the experiments suggest that readers and listeners tend to repair such utterances, restoring them to the presumed intended structure, and that they assign the interpretation of the corrected utterance. Repaired utterances are also expected to be acceptable when they are easy to diagnose and repair and when they are "familiar", i.e., when they correspond to natural speech errors. The results established a continuum ranging from outright linguistic illusions with no indication that listeners and readers detected the error (the inclusion of almost in A passerby rescued a child from almost being run over by a bus), to a majority of unblended interpretations for doubled quantifier sentences (Many students often turn in their assignments late), to implicit negation sentences for which only a third of interpretations were undoubled (I just like the way the president looks without his shirt off). The repair or speech error reversal account offered here is contrasted with the noisy channel approach (Gibson et al., 2013) and the good-enough processing approach (Ferreira et al., 2002). PMID:25984551
Neurophysiology underlying influence of stimulus reliability on audiovisual integration.
Shatzer, Hannah; Shen, Stanley; Kerlin, Jess R; Pitt, Mark A; Shahin, Antoine J
2018-01-24
We tested the predictions of the dynamic reweighting model (DRM) of audiovisual (AV) speech integration, which posits that spectrotemporally reliable (informative) AV speech stimuli induce a reweighting of processing from low-level to high-level auditory networks. This reweighting decreases sensitivity to acoustic onsets and in turn increases tolerance to AV onset asynchronies (AVOA). EEG was recorded while subjects watched videos of a speaker uttering trisyllabic nonwords that varied in spectrotemporal reliability and asynchrony of the visual and auditory inputs. Subjects judged the stimuli as in-sync or out-of-sync. Results showed that subjects exhibited greater AVOA tolerance for non-blurred than for blurred visual speech and for less degraded than for more degraded acoustic speech. Increased AVOA tolerance was reflected in reduced amplitude of the P1-P2 auditory evoked potentials, a neurophysiological indication of reduced sensitivity to acoustic onsets and successful AV integration. There was also sustained visual alpha band (8-14 Hz) suppression (desynchronization) following acoustic speech onsets for non-blurred vs. blurred visual speech, consistent with continuous engagement of the visual system as the speech unfolds. The current findings suggest that increased spectrotemporal reliability of acoustic and visual speech promotes robust AV integration, partly by suppressing sensitivity to acoustic onsets, in support of the DRM's reweighting mechanism. Increased visual signal reliability also sustains the engagement of the visual system with the auditory system to maintain alignment of information across modalities. © 2018 Federation of European Neuroscience Societies and John Wiley & Sons Ltd.
ERIC Educational Resources Information Center
Kurth, Ruth Justine; Kurth, Lila Mae
A study compared mothers' and fathers' speech patterns when speaking to preschool children, particularly utterance length, sentence types, and word frequencies. All of the children attended a nursery school with a student population of 136 in a large urban area in the Southwest. Volunteer subjects, 28 mothers and 28 fathers of 28 children who…
ERIC Educational Resources Information Center
Quigley, Jean; McNally, Sinéad; Lawson, Sarah
2016-01-01
Research has indicated differences in prosodic expression for infants at risk of autism spectrum disorder (ASD), and it has been proposed that caregiver speech to these infants may also be moderated prosodically. In typical development, the pitch range of maternal infant-directed speech (IDS) narrows and utterance intensity decreases with infant…
ERIC Educational Resources Information Center
Kubicek, Claudia; de Boisferon, Anne Hillairet; Dupierrix, Eve; Loevenbruck, Helene; Gervain, Judit; Schwarzer, Gudrun
2013-01-01
The present eye-tracking study aimed to investigate the impact of auditory speech information on 12-month-olds' gaze behavior to silently-talking faces. We examined German infants' face-scanning behavior to side-by-side presentation of a bilingual speaker's face silently speaking German utterances on one side and French on the other side, before…
Zäske, Romi; Awwad Shiekh Hasan, Bashar; Belin, Pascal
2017-09-01
Listeners can recognize newly learned voices from previously unheard utterances, suggesting the acquisition of high-level speech-invariant voice representations during learning. Using functional magnetic resonance imaging (fMRI) we investigated the anatomical basis underlying the acquisition of voice representations for unfamiliar speakers independent of speech, and their subsequent recognition among novel voices. Specifically, listeners studied voices of unfamiliar speakers uttering short sentences and subsequently classified studied and novel voices as "old" or "new" in a recognition test. To investigate "pure" voice learning, i.e., independent of sentence meaning, we presented German sentence stimuli to non-German speaking listeners. To disentangle stimulus-invariant and stimulus-dependent learning, during the test phase we contrasted a "same sentence" condition in which listeners heard speakers repeating the sentences from the preceding study phase, with a "different sentence" condition. Voice recognition performance was above chance in both conditions although, as expected, performance was higher for same than for different sentences. During study phases activity in the left inferior frontal gyrus (IFG) was related to subsequent voice recognition performance and same versus different sentence condition, suggesting an involvement of the left IFG in the interactive processing of speaker and speech information during learning. Importantly, at test reduced activation for voices correctly classified as "old" compared to "new" emerged in a network of brain areas including temporal voice areas (TVAs) of the right posterior superior temporal gyrus (pSTG), as well as the right inferior/middle frontal gyrus (IFG/MFG), the right medial frontal gyrus, and the left caudate. This effect of voice novelty did not interact with sentence condition, suggesting a role of temporal voice-selective areas and extra-temporal areas in the explicit recognition of learned voice identity, independent of speech content. Copyright © 2017 Elsevier Ltd. All rights reserved.
Language change in a multiple group society
NASA Astrophysics Data System (ADS)
Pop, Cristina-Maria; Frey, Erwin
2013-08-01
The processes leading to change in languages are manifold. In order to reduce ambiguity in the transmission of information, agreement on a set of conventions for recurring problems is favored. In addition, speakers tend to use particular linguistic variants associated with the social groups they identify with. The influence of other groups propagating across the speech community as new variant forms sustains the competition between linguistic variants. With the utterance selection model, an evolutionary description of language change, Baxter et al. [Phys. Rev. E 73, 046118 (2006)] have provided a mathematical formulation of the interactions inside a group of speakers, exploring the mechanisms that lead to or inhibit the fixation of linguistic variants. In this paper, we take the utterance selection model one step further by describing a speech community consisting of multiple interacting groups. Tuning the interaction strength between groups allows us to gain a deeper understanding of the way in which linguistic variants propagate and how their distribution depends on the group partitioning. For both the group size and the number of groups we find scaling behaviors with two asymptotic regimes. If groups are strongly connected, the dynamics is that of the standard utterance selection model, whereas if their coupling is weak, the magnitude of the coupling along with the system size governs the way consensus is reached. Furthermore, we find that a high influence of the interlocutor on a speaker's utterances can act as a counterweight to group segregation.
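To make the kind of dynamics described in this abstract concrete, the following minimal Python simulation sketches an utterance-selection-style update with two coupled groups of speakers and two linguistic variants. It is an assumed, simplified version for illustration only, not the exact Baxter et al. formulation or the authors' code; all parameter names, the binomial utterance sampling, and the single cross-group coupling H_BETWEEN are assumptions.

```python
# Minimal sketch (assumed, simplified dynamics; not the exact Baxter et al.
# formulation or the authors' code) of utterance-selection-style language
# change with two coupled groups of speakers and two linguistic variants.
import numpy as np

rng = np.random.default_rng(0)

N_PER_GROUP = 20   # speakers per group (illustrative)
T = 5000           # number of pairwise interactions
LAMBDA = 0.01      # learning rate for frequency updates
H_WITHIN = 1.0     # weight given to a same-group interlocutor
H_BETWEEN = 0.05   # weight given to an other-group interlocutor
N_UTT = 10         # tokens produced per speaker per interaction

# x[g, i] = probability that speaker i of group g uses variant A
x = rng.uniform(0.4, 0.6, size=(2, N_PER_GROUP))

for _ in range(T):
    g1, g2 = rng.integers(0, 2, size=2)          # groups of the two speakers
    i, j = rng.integers(0, N_PER_GROUP, size=2)  # speaker indices
    h = H_WITHIN if g1 == g2 else H_BETWEEN
    # Each speaker produces N_UTT tokens of variant A or B
    u1 = rng.binomial(N_UTT, x[g1, i]) / N_UTT
    u2 = rng.binomial(N_UTT, x[g2, j]) / N_UTT
    # Each speaker shifts toward a mix of their own and the interlocutor's usage
    x[g1, i] += LAMBDA * ((u1 + h * u2) / (1 + h) - x[g1, i])
    x[g2, j] += LAMBDA * ((u2 + h * u1) / (1 + h) - x[g2, j])

print("mean variant-A frequency per group:", x.mean(axis=1))
```

Lowering H_BETWEEN weakens the cross-group coupling, which in this toy setting lets the two groups drift toward different variant frequencies instead of reaching a shared consensus.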
Rate and rhythm control strategies for apraxia of speech in nonfluent primary progressive aphasia.
Beber, Bárbara Costa; Berbert, Monalise Costa Batista; Grawer, Ruth Siqueira; Cardoso, Maria Cristina de Almeida Freitas
2018-01-01
The nonfluent/agrammatic variant of primary progressive aphasia is characterized by apraxia of speech and agrammatism. Apraxia of speech limits patients' communication due to slow speaking rate, sound substitutions, articulatory groping, false starts and restarts, segmentation of syllables, and increased difficulty with increasing utterance length. Speech and language therapy is known to benefit individuals with apraxia of speech due to stroke, but little is known about its effects in primary progressive aphasia. This is a case report of a 72-year-old, illiterate housewife, who was diagnosed with nonfluent primary progressive aphasia and received speech and language therapy for apraxia of speech. Rate and rhythm control strategies for apraxia of speech were trained to improve initiation of speech. We discuss the importance of these strategies to alleviate apraxia of speech in this condition and the future perspectives in the area.
Brumberg, Jonathan S; Krusienski, Dean J; Chakrabarti, Shreya; Gunduz, Aysegul; Brunner, Peter; Ritaccio, Anthony L; Schalk, Gerwin
2016-01-01
How the human brain plans, executes, and monitors continuous and fluent speech has remained largely elusive. For example, previous research has defined the cortical locations most important for different aspects of speech function, but has not yet yielded a definition of the temporal progression of involvement of those locations as speech progresses either overtly or covertly. In this paper, we uncovered the spatio-temporal evolution of neuronal population-level activity related to continuous overt speech, and identified those locations that shared activity characteristics across overt and covert speech. Specifically, we asked subjects to repeat continuous sentences aloud or silently while we recorded electrical signals directly from the surface of the brain (electrocorticography (ECoG)). We then determined the relationship between cortical activity and speech output across different areas of cortex and at sub-second timescales. The results highlight a spatio-temporal progression of cortical involvement in the continuous speech process that initiates utterances in frontal-motor areas and ends with the monitoring of auditory feedback in superior temporal gyrus. Direct comparison of cortical activity related to overt versus covert conditions revealed a common network of brain regions involved in speech that may implement orthographic and phonological processing. Our results provide one of the first characterizations of the spatiotemporal electrophysiological representations of the continuous speech process, and also highlight the common neural substrate of overt and covert speech. These results thereby contribute to a refined understanding of speech functions in the human brain.
ERIC Educational Resources Information Center
Abedi, Elham
2016-01-01
The development of speech-act theory has provided the hearers with a better understanding of what speakers intend to perform in the act of communication. One type of speech act is apologizing. When an action or utterance has resulted in an offense, the offender needs to apologize. In the present study, an attempt was made to compare the apology…
Girolametto, Luigi; Weitzman, Elaine; Greenberg, Janice
2012-02-01
This study examined the efficacy of a professional development program for early childhood educators that facilitated emergent literacy skills in preschoolers. The program, led by a speech-language pathologist, focused on teaching alphabet knowledge, print concepts, sound awareness, and decontextualized oral language within naturally occurring classroom interactions. Twenty educators were randomly assigned to experimental and control groups. Educators each recruited 3 to 4 children from their classrooms to participate. The experimental group participated in 18 hr of group training and 3 individual coaching sessions with a speech-language pathologist. The effects of intervention were examined in 30 min of videotaped interaction, including storybook reading and a post-story writing activity. At posttest, educators in the experimental group used a higher rate of utterances that included print/sound references and decontextualized language than the control group. Similarly, the children in the experimental group used a significantly higher rate of utterances that included print/sound references and decontextualized language compared to the control group. These findings suggest that professional development provided by a speech-language pathologist can yield short-term changes in the facilitation of emergent literacy skills in early childhood settings. Future research is needed to determine the impact of this program on the children's long-term development of conventional literacy skills.
Expressed parental concern regarding childhood stuttering and the Test of Childhood Stuttering.
Tumanova, Victoria; Choi, Dahye; Conture, Edward G; Walden, Tedra A
The purpose of the present study was to determine whether the Test of Childhood Stuttering observational rating scales (TOCS; Gillam et al., 2009) (1) differed between parents who did versus did not express concern (independent from the TOCS) about their child's speech fluency; (2) correlated with children's frequency of stuttering measured during a child-examiner conversation; and (3) correlated with the length and complexity of children's utterances, as indexed by mean length of utterance (MLU). Participants were 183 young children ages 3;0 to 5;11. Ninety-one had parents who reported concern about their child's stuttering (65 boys, 26 girls) and 92 had parents who reported no such concern (50 boys, 42 girls). Participants' conversational speech during a child-examiner conversation was analyzed for (a) frequency of occurrence of stuttered and non-stuttered disfluencies, and (b) MLU. Besides expressing concern or lack thereof about their child's speech fluency, parents completed the TOCS observational rating scales documenting how often they observe different disfluency types in their children's speech, as well as disfluency-related consequences. There were three main findings. First, parents who expressed concern (independently from the TOCS) about their child's stuttering reported significantly higher scores on the TOCS Speech Fluency and Disfluency-Related Consequences rating scales. Second, children whose parents rated them higher on the TOCS Speech Fluency rating scale produced more stuttered disfluencies during a child-examiner conversation. Third, children with higher scores on the TOCS Disfluency-Related Consequences rating scale had shorter MLU during child-examiner conversation, across age and level of language ability. Findings support the use of the TOCS observational rating scales as one documentable, objective means to determine parental perception of and concern about their child's stuttering. Findings also support the notion that parents are reasonably accurate, if not reliable, judges of the quantity and quality (i.e., stuttered vs. non-stuttered) of their child's speech disfluencies. Lastly, findings that some children may decrease their verbal output in attempts to minimize instances of stuttering - as indexed by relatively low MLU and high TOCS Disfluency-Related Consequences scores - provide strong support for sampling young children's speech and language across various situations to obtain the most representative index possible of the child's MLU and associated instances of stuttering. Copyright © 2018 Elsevier Inc. All rights reserved.
Majorano, Marinella; Guidotti, Laura; Guerzoni, Letizia; Murri, Alessandra; Morelli, Marika; Cuda, Domenico; Lavelli, Manuela
2018-01-01
In recent years many studies have shown that the use of cochlear implants (CIs) improves children's skills in processing the auditory signal and, consequently, the development of both language comprehension and production. Nevertheless, many authors have also reported that the development of language skills in children with CIs is variable and influenced by individual factors (e.g., age at CI activation) and contextual aspects (e.g., maternal linguistic input). This study assessed the characteristics of the spontaneous language production of Italian children with CIs, their mothers' input, and the relationship between the two during shared book reading and semi-structured play. Twenty preschool children with CIs and 40 typically developing children, 20 matched for chronological age (CATD group) and 20 matched for hearing age (HATD group), were observed during shared book reading and semi-structured play with their mothers. Samples of spontaneous language were transcribed and analysed for each participant. The numbers of types and tokens, mean length of utterance (MLU), and grammatical categories were considered, and the familiarity of each word produced by the mothers was calculated. The children with CIs produced shorter utterances than the children in the CATD group. During shared book reading, their mothers produced language with lower levels of lexical variability and grammatical complexity, and higher proportions of highly familiar verbs, than did the mothers in the other groups. The children's language was more strongly related to that of their mothers in the CI group than in the other groups, and it was associated with the age at CI activation. The findings suggest that the language of children with CIs is related both to their mothers' input and to age at CI activation, and they might inform intervention programs focused on shared book reading. © 2017 Royal College of Speech and Language Therapists.
Perceptual chunking and its effect on memory in speech processing: ERP and behavioral evidence
Gilbert, Annie C.; Boucher, Victor J.; Jemel, Boutheina
2014-01-01
We examined how perceptual chunks of varying size in utterances can influence immediate memory of heard items (monosyllabic words). Using behavioral measures and event-related potentials (N400), we evaluated the quality of the memory trace for targets taken from perceived temporal groups (TGs) of three and four items. Variations in the amplitude of the N400 showed a better memory trace for items presented in TGs of three compared to those in groups of four. Analyses of behavioral responses along with P300 components also revealed effects of chunk position in the utterance. This is the first study to measure the online effects of perceptual chunks on the memory trace of spoken items. Taken together, the N400 and P300 responses demonstrate that the perceptual chunking of speech facilitates information buffering and processing on a chunk-by-chunk basis. PMID:24678304
Ways of looking ahead: hierarchical planning in language production.
Lee, Eun-Kyung; Brown-Schmidt, Sarah; Watson, Duane G
2013-12-01
It is generally assumed that language production proceeds incrementally, with chunks of linguistic structure planned ahead of speech. Extensive research has examined the scope of language production and suggests that the size of planned chunks varies across contexts (Ferreira & Swets, 2002; Wagner & Jescheniak, 2010). By contrast, relatively little is known about the structure of advance planning, specifically whether planning proceeds incrementally according to the surface structure of the utterance, or whether speakers plan according to the hierarchical relationships between utterance elements. In two experiments, we examine the structure and scope of lexical planning in language production using a picture description task. Analyses of speech onset times and word durations show that speakers engage in hierarchical planning such that structurally dependent lexical items are planned together and that hierarchical planning occurs for both direct and indirect dependencies. Copyright © 2013 Elsevier B.V. All rights reserved.
Expression of Emotion in Eastern and Western Music Mirrors Vocalization
Bowling, Daniel Liu; Sundararajan, Janani; Han, Shui'er; Purves, Dale
2012-01-01
In Western music, the major mode is typically used to convey excited, happy, bright or martial emotions, whereas the minor mode typically conveys subdued, sad or dark emotions. Recent studies indicate that the differences between these modes parallel differences between the prosodic and spectral characteristics of voiced speech sounds uttered in corresponding emotional states. Here we ask whether tonality and emotion are similarly linked in an Eastern musical tradition. The results show that the tonal relationships used to express positive/excited and negative/subdued emotions in classical South Indian music are much the same as those used in Western music. Moreover, tonal variations in the prosody of English and Tamil speech uttered in different emotional states are parallel to the tonal trends in music. These results are consistent with the hypothesis that the association between musical tonality and emotion is based on universal vocal characteristics of different affective states. PMID:22431970
Low-income fathers' speech to toddlers during book reading versus toy play.
Salo, Virginia C; Rowe, Meredith L; Leech, Kathryn A; Cabrera, Natasha J
2016-11-01
Fathers' child-directed speech across two contexts was examined. Father-child dyads from sixty-nine low-income families were videotaped interacting during book reading and toy play when children were 2;0. Fathers used more diverse vocabulary and asked more questions during book reading while their mean length of utterance was longer during toy play. Variation in these specific characteristics of fathers' speech that differed across contexts was also positively associated with child vocabulary skill measured on the MacArthur-Bates Communicative Development Inventory. Results are discussed in terms of how different contexts elicit specific qualities of child-directed speech that may promote language use and development.
[Ventriloquism and audio-visual integration of voice and face].
Yokosawa, Kazuhiko; Kanaya, Shoko
2012-07-01
Presenting synchronous auditory and visual stimuli in separate locations creates the illusion that the sound originates from the direction of the visual stimulus. Participants' auditory localization bias, called the ventriloquism effect, has revealed factors affecting the perceptual integration of audio-visual stimuli. However, many studies on audio-visual processes have focused on performance in simplified experimental situations, with a single stimulus in each sensory modality. These results cannot necessarily explain our perceptual behavior in natural scenes, where various signals exist within a single sensory modality. In the present study, we report the contribution of a cognitive factor, the audio-visual congruency of speech, which has often been underestimated in previous ventriloquism research. Specifically, we investigated the contribution of speech congruency to the ventriloquism effect using a spoken utterance and two videos of a talking face. The salience of facial movements was also manipulated. When bilateral visual stimuli were presented in synchrony with a single voice, cross-modal speech congruency had a significant impact on the ventriloquism effect, and more salient visual utterances attracted participants' auditory localization. The congruent pairing of audio-visual utterances elicited greater localization bias than did incongruent pairing, whereas previous studies have reported little dependency on the reality of stimuli in ventriloquism. Moreover, audio-visual illusory congruency, owing to the McGurk effect, caused substantial visual interference with auditory localization. This suggests that greater flexibility in responding to multi-sensory environments exists than has previously been considered.
Effect of cognitive load on speech prosody in aviation: Evidence from military simulator flights.
Huttunen, Kerttu; Keränen, Heikki; Väyrynen, Eero; Pääkkönen, Rauno; Leino, Tuomo
2011-01-01
Mental overload directly affects safety in aviation and needs to be alleviated. Speech recordings are obtained non-invasively and as such are feasible for monitoring cognitive load. We recorded speech of 13 military pilots while they were performing a simulator task. Three types of cognitive load (load on situation awareness, information processing and decision making) were rated by a flight instructor separately for each flight phase and participant. As a function of increased cognitive load, the mean utterance-level fundamental frequency (F0) increased, on average, by 7 Hz and the mean vocal intensity increased by 1 dB. In the most intensive simulator flight phases, mean F0 increased by 12 Hz and mean intensity, by 1.5 dB. At the same time, the mean F0 range decreased by 5 Hz, on average. Our results showed that prosodic features of speech can be used to monitor speaker state and support pilot training in a simulator environment. Copyright © 2010 Elsevier Ltd and The Ergonomics Society. All rights reserved.
Prosodic skills in children with Down syndrome and in typically developing children.
Zampini, Laura; Fasolo, Mirco; Spinelli, Maria; Zanchi, Paola; Suttora, Chiara; Salerni, Nicoletta
2016-01-01
Many studies have analysed language development in children with Down syndrome to understand better the nature of their linguistic delays and the reason why these delays, particularly those in the morphosyntactic area, seem greater than their cognitive impairment. However, the prosodic characteristics of language development in children with Down syndrome have been scarcely investigated. To analyse the prosodic skills of children with Down syndrome in the production of multi-word utterances. Data on the prosodic skills of these children were compared with data on typically developing children matched on developmental age and vocabulary size. Between-group differences and the relationships between prosodic and syntactic skills were investigated. The participants were nine children with Down syndrome (who ranged in chronological age from 45 to 63 months and had a mean developmental age of 30 months) and 12 30-month-old typically developing children. The children in both groups had a vocabulary size of approximately 450 words. The children's spontaneous productions were recorded during observations of mother-child play sessions. Data analyses showed that despite their morphosyntactic difficulties, children with Down syndrome were able to master some aspects of prosody in multi-word utterances. They were able to produce single intonation multi-word utterances on the same level as typically developing children. In addition, the intonation contour of their utterances was not negatively influenced by syntactic complexity, contrary to what occurred in typically developing children, although it has to be considered that the utterances produced by children with Down syndrome were less complex than those produced by children in the control group. However, children with Down syndrome appeared to be less able than typically developing children to use intonation to express the pragmatic interrogative function. The findings are discussed considering the effects of social experience on the utterance prosodic realization. © 2015 Royal College of Speech and Language Therapists.
On the Time Course of Vocal Emotion Recognition
Pell, Marc D.; Kotz, Sonja A.
2011-01-01
How quickly do listeners recognize emotions from a speaker's voice, and does the time course for recognition vary by emotion type? To address these questions, we adapted the auditory gating paradigm to estimate how much vocal information is needed for listeners to categorize five basic emotions (anger, disgust, fear, sadness, happiness) and neutral utterances produced by male and female speakers of English. Semantically-anomalous pseudo-utterances (e.g., The rivix jolled the silling) conveying each emotion were divided into seven gate intervals according to the number of syllables that listeners heard from sentence onset. Participants (n = 48) judged the emotional meaning of stimuli presented at each gate duration interval, in a successive, blocked presentation format. Analyses looked at how recognition of each emotion evolves as an utterance unfolds and estimated the “identification point” for each emotion. Results showed that anger, sadness, fear, and neutral expressions are recognized more accurately at short gate intervals than happiness, and particularly disgust; however, as speech unfolds, recognition of happiness improves significantly towards the end of the utterance (and fear is recognized more accurately than other emotions). When the gate associated with the emotion identification point of each stimulus was calculated, data indicated that fear (M = 517 ms), sadness (M = 576 ms), and neutral (M = 510 ms) expressions were identified from shorter acoustic events than the other emotions. These data reveal differences in the underlying time course for conscious recognition of basic emotions from vocal expressions, which should be accounted for in studies of emotional speech processing. PMID:22087275
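As a small illustration of how an "identification point" of the kind estimated above can be computed from gate-by-gate responses, the sketch below assumes a simple scoring rule (the first gate at which the target emotion is chosen and never abandoned at any later gate); this rule and the function name are assumptions for illustration, not the authors' procedure.

```python
# Assumed scoring rule, for illustration only: the identification point is the
# first gate at which the target emotion is chosen and never abandoned at any
# later gate. The function name and rule are not the authors' procedure.
def identification_gate(responses, target):
    """responses: emotion labels chosen at gates 1..n, in presentation order."""
    for g in range(1, len(responses) + 1):
        if all(label == target for label in responses[g - 1:]):
            return g
    return None  # the emotion was never reliably identified

# Example: a listener settles on "fear" from gate 3 onwards
print(identification_gate(
    ["neutral", "sadness", "fear", "fear", "fear", "fear", "fear"], "fear"))  # 3
```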
"Who" is saying "what"? Brain-based decoding of human voice and speech.
Formisano, Elia; De Martino, Federico; Bonte, Milene; Goebel, Rainer
2008-11-07
Can we decipher speech content ("what" is being said) and speaker identity ("who" is saying it) from observations of brain activity of a listener? Here, we combine functional magnetic resonance imaging with a data-mining algorithm and retrieve what and whom a person is listening to from the neural fingerprints that speech and voice signals elicit in the listener's auditory cortex. These cortical fingerprints are spatially distributed and insensitive to acoustic variations of the input so as to permit the brain-based recognition of learned speech from unknown speakers and of learned voices from previously unheard utterances. Our findings unravel the detailed cortical layout and computational properties of the neural populations at the basis of human speech recognition and speaker identification.
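The abstract does not name the data-mining algorithm, so the sketch below only illustrates the general idea of multivoxel pattern classification: simulated voxel responses carrying weak speaker- and content-specific signal are decoded with a linear SVM. All sizes, the simulated data, and the choice of classifier are assumptions for illustration, not the authors' method.

```python
# Illustrative sketch only: simulated voxel patterns and a linear SVM stand in
# for the general idea of decoding "who" and "what" from distributed activity.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

rng = np.random.default_rng(1)
n_trials, n_voxels = 120, 200
speaker = rng.integers(0, 3, n_trials)  # 3 speakers ("who")
vowel = rng.integers(0, 3, n_trials)    # 3 speech sounds ("what")

# Simulated cortical patterns: weak speaker- and content-specific signal plus noise
speaker_patterns = rng.normal(0, 1, (3, n_voxels))
vowel_patterns = rng.normal(0, 1, (3, n_voxels))
X = (speaker_patterns[speaker] + vowel_patterns[vowel]
     + rng.normal(0, 3, (n_trials, n_voxels)))

for label, y in [("who (speaker)", speaker), ("what (vowel)", vowel)]:
    acc = cross_val_score(LinearSVC(max_iter=5000), X, y, cv=5).mean()
    print(f"decoding {label}: {acc:.2f} accuracy (chance is about 0.33)")
```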
Linguistic and pragmatic constraints on utterance interpretation
NASA Astrophysics Data System (ADS)
Hinkelman, Elizabeth A.
1990-05-01
In order to model how people understand language, it is necessary to understand not only grammar and logic but also how people use language to affect their environment. This area of study is known as natural language pragmatics. Speech acts, for instance, are the offers, promises, announcements, etc., that people make by talking. The same expression may be different acts in different contexts, and yet not every expression performs every act. We want to understand how people are able to recognize others' intentions and implications in saying something. Previous plan-based theories of speech act interpretation do not account for the conventional aspect of speech acts. They can, however, be made sensitive to both linguistic and propositional information. This dissertation presents a method of speech act interpretation which uses patterns of linguistic features (e.g., mood, verb form, sentence adverbials, thematic roles) to identify a range of speech act interpretations for the utterance. These are then filtered and elaborated by inferences about agents' goals and plans. In many cases the plan reasoning consists of short, local inference chains (which are in fact conversational implicatures), and extended reasoning is necessary only for the most difficult cases. The method is able to accommodate a wide range of cases, from those which seem very idiomatic to those which must be analyzed using knowledge about the world and human behavior. It explains how "Can you pass the salt?" can be a request while "Are you able to pass the salt?" is not.
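A toy sketch of the two-stage idea described in this abstract (patterns of surface linguistic features propose candidate speech acts, then plan-based inference filters them) is shown below; the feature names, rules, and context flags are illustrative assumptions and do not reproduce Hinkelman's system.

```python
# Toy sketch of the two-stage idea (candidate acts from surface features,
# then plan-based filtering). Feature names, rules, and context flags are
# illustrative assumptions, not Hinkelman's implementation.
CANDIDATE_RULES = [
    # (required surface features, proposed speech acts)
    ({"mood": "interrogative", "verb": "can", "subject": "you"},
     ["request", "question-about-ability"]),
    ({"mood": "interrogative", "verb": "be-able", "subject": "you"},
     ["question-about-ability"]),
    ({"mood": "imperative"}, ["request"]),
]

def candidate_acts(features):
    """Propose speech acts whose feature pattern matches the utterance."""
    acts = []
    for required, proposed in CANDIDATE_RULES:
        if all(features.get(k) == v for k, v in required.items()):
            acts.extend(proposed)
    return acts or ["assertion"]

def filter_by_plan(acts, context):
    """Toy plan-based filter: keep 'request' only if the hearer can act now."""
    return [a for a in acts
            if a != "request" or context.get("hearer_can_act", False)]

utt = {"mood": "interrogative", "verb": "can", "subject": "you"}  # "Can you pass the salt?"
print(filter_by_plan(candidate_acts(utt), {"hearer_can_act": True}))
```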
Drijvers, Linda; Özyürek, Asli; Jensen, Ole
2018-06-19
Previous work revealed that visual semantic information conveyed by gestures can enhance degraded speech comprehension, but the mechanisms underlying these integration processes under adverse listening conditions remain poorly understood. We used MEG to investigate how oscillatory dynamics support speech-gesture integration when integration load is manipulated by auditory (e.g., speech degradation) and visual semantic (e.g., gesture congruency) factors. Participants were presented with videos of an actress uttering an action verb in clear or degraded speech, accompanied by a matching (mixing gesture + "mixing") or mismatching (drinking gesture + "walking") gesture. In clear speech, alpha/beta power was more suppressed in the left inferior frontal gyrus and motor and visual cortices when integration load increased in response to mismatching versus matching gestures. In degraded speech, beta power was less suppressed over posterior STS and medial temporal lobe for mismatching compared with matching gestures, showing that integration load was lowest when speech was degraded and mismatching gestures could not be integrated and disambiguate the degraded signal. Our results thus provide novel insights on how low-frequency oscillatory modulations in different parts of the cortex support the semantic audiovisual integration of gestures in clear and degraded speech: When speech is clear, the left inferior frontal gyrus and motor and visual cortices engage because higher-level semantic information increases semantic integration load. When speech is degraded, posterior STS/middle temporal gyrus and medial temporal lobe are less engaged because integration load is lowest when visual semantic information does not aid lexical retrieval and speech and gestures cannot be integrated.
Ihlefeld, Antje; Litovsky, Ruth Y
2012-01-01
Spatial release from masking refers to a benefit for speech understanding. It occurs when a target talker and a masker talker are spatially separated. In those cases, speech intelligibility for target speech is typically higher than when both talkers are at the same location. In cochlear implant listeners, spatial release from masking is much reduced or absent compared with normal hearing listeners. Perhaps this reduced spatial release occurs because cochlear implant listeners cannot effectively attend to spatial cues. Three experiments examined factors that may interfere with deploying spatial attention to a target talker masked by another talker. To simulate cochlear implant listening, stimuli were vocoded with two unique features. First, we used 50-Hz low-pass filtered speech envelopes and noise carriers, strongly reducing the possibility of temporal pitch cues; second, co-modulation was imposed on target and masker utterances to enhance perceptual fusion between the two sources. Stimuli were presented over headphones. Experiments 1 and 2 presented high-fidelity spatial cues with unprocessed and vocoded speech. Experiment 3 maintained faithful long-term average interaural level differences but presented scrambled interaural time differences with vocoded speech. Results show a robust spatial release from masking in Experiments 1 and 2, and a greatly reduced spatial release in Experiment 3. Faithful long-term average interaural level differences were insufficient for producing spatial release from masking. This suggests that appropriate interaural time differences are necessary for restoring spatial release from masking, at least for a situation where there are few viable alternative segregation cues.
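For readers unfamiliar with this kind of stimulus processing, here is a minimal noise-vocoder sketch in the spirit of the description above, with band envelopes low-pass filtered at 50 Hz and imposed on noise carriers; the channel count, band edges, and filter orders are assumptions rather than the study's parameters, and the co-modulation step is omitted.

```python
# Minimal noise-vocoder sketch (assumed parameters; channel count, band edges,
# and filter orders are illustrative, and the co-modulation step is omitted):
# band envelopes are low-pass filtered at 50 Hz and imposed on noise carriers.
import numpy as np
from scipy.signal import butter, filtfilt

def noise_vocode(x, fs, band_edges=(100, 500, 1500, 4000), env_cutoff=50.0):
    rng = np.random.default_rng(0)
    out = np.zeros_like(x, dtype=float)
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        band = filtfilt(b, a, x)
        # Envelope: rectify, then 50-Hz low-pass
        be, ae = butter(2, env_cutoff / (fs / 2))
        env = np.clip(filtfilt(be, ae, np.abs(band)), 0, None)
        # Noise carrier restricted to the same band, modulated by the envelope
        carrier = filtfilt(b, a, rng.standard_normal(len(x)))
        out += env * carrier
    return out / (np.max(np.abs(out)) + 1e-12)

# Example with a synthetic 1-s speech-like signal at 16 kHz
fs = 16000
t = np.arange(fs) / fs
speech_like = np.sin(2 * np.pi * 150 * t) * (1 + 0.5 * np.sin(2 * np.pi * 3 * t))
vocoded = noise_vocode(speech_like, fs)
```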
Impairments of speech fluency in Lewy body spectrum disorder.
Ash, Sharon; McMillan, Corey; Gross, Rachel G; Cook, Philip; Gunawardena, Delani; Morgan, Brianna; Boller, Ashley; Siderowf, Andrew; Grossman, Murray
2012-03-01
Few studies have examined connected speech in demented and non-demented patients with Parkinson's disease (PD). We assessed the speech production of 35 patients with Lewy body spectrum disorder (LBSD), including non-demented PD patients, patients with PD dementia (PDD), and patients with dementia with Lewy bodies (DLB), in a semi-structured narrative speech sample in order to characterize impairments of speech fluency and to determine the factors contributing to reduced speech fluency in these patients. Both demented and non-demented PD patients exhibited reduced speech fluency, characterized by reduced overall speech rate and long pauses between sentences. Reduced speech rate in LBSD correlated with measures of between-utterance pauses, executive functioning, and grammatical comprehension. Regression analyses related non-fluent speech, grammatical difficulty, and executive difficulty to atrophy in frontal brain regions. These findings indicate that multiple factors contribute to slowed speech in LBSD, and this is mediated in part by disease in frontal brain regions. Copyright © 2011 Elsevier Inc. All rights reserved.
Children with bilateral cochlear implants identify emotion in speech and music.
Volkova, Anna; Trehub, Sandra E; Schellenberg, E Glenn; Papsin, Blake C; Gordon, Karen A
2013-03-01
This study examined the ability of prelingually deaf children with bilateral implants to identify emotion (i.e. happiness or sadness) in speech and music. Participants in Experiment 1 were 14 prelingually deaf children from 5-7 years of age who had bilateral implants and 18 normally hearing children from 4-6 years of age. They judged whether linguistically neutral utterances produced by a man and woman sounded happy or sad. Participants in Experiment 2 were 14 bilateral implant users from 4-6 years of age and the same normally hearing children as in Experiment 1. They judged whether synthesized piano excerpts sounded happy or sad. Child implant users' accuracy of identifying happiness and sadness in speech was well above chance levels but significantly below the accuracy achieved by children with normal hearing. Similarly, their accuracy of identifying happiness and sadness in music was well above chance levels but significantly below that of children with normal hearing, who performed at ceiling. For the 12 implant users who participated in both experiments, performance on the speech task correlated significantly with performance on the music task and implant experience was correlated with performance on both tasks. Child implant users' accurate identification of emotion in speech exceeded performance in previous studies, which may be attributable to fewer response alternatives and the use of child-directed speech. Moreover, child implant users' successful identification of emotion in music indicates that the relevant cues are accessible at a relatively young age.
The effects of mands and models on the speech of unresponsive language-delayed preschool children.
Warren, S F; McQuarter, R J; Rogers-Warren, A K
1984-02-01
The effects of the systematic use of mands (non-yes/no questions and instructions to verbalize), models (imitative prompts), and specific consequent events on the productive verbal behavior of three unresponsive, socially isolate, language-delayed preschool children were investigated in a multiple-baseline design within a classroom free play period. Following a lengthy intervention condition, experimental procedures were systematically faded out to check for maintenance effects. The treatment resulted in increases in total verbalizations and nonobligatory speech (initiations) by the subjects. Subjects also became more responsive in obligatory speech situations. In a second free play (generalization) setting, increased rates of total child verbalizations and nonobligatory verbalizations were observed for all three subjects, and two of the three subjects were more responsive compared to their baselines in the first free play setting. Rate of total teacher verbalizations and questions were also higher in this setting. Maintenance of the treatment effects was shown during the fading condition in the intervention setting. The subjects' MLUs (mean length of utterance) increased during the intervention condition when the teacher began prompting a minimum of two-word utterances in response to a mand or model.
The effects of Parkinson's disease on the production of contrastive stress
NASA Astrophysics Data System (ADS)
Cheang, Henry S.; Pell, Marc D.
2004-05-01
Reduced speech intelligibility has been observed clinically among patients with Parkinson's disease (PD); one possible contributor to these problems is that motor limitations in PD reduce the ability to mark linguistic contrasts in speech using prosodic cues. This study compared acoustic aspects of the production of contrastive stress (CS) in sentences that were elicited from ten subjects with PD and ten matched control subjects without neurological impairment. Subjects responded to questions that biased them to put emphasis on the first, middle, or last word of target utterances. The mean vowel duration and mean fundamental frequency (F0) of each keyword were then measured, normalized, and analyzed for possible differences in the acoustic cues provided by each group to signal emphatic stress. Both groups demonstrated systematic differences in vowel lengthening between emphasized and unemphasized words across word positions; however, controls were more reliable than PD subjects at modulating the F0 of emphasized words to signal its location in the utterance. Group differences in the F0 measures suggest one possible source of the impoverished intelligibility of Parkinsonian speech and will be investigated in a subsequent study that looks at the direct impact of these changes on emphasis perception by listeners. [Work supported by CIHR.]
Are vowel errors influenced by consonantal context in the speech of persons with aphasia?
NASA Astrophysics Data System (ADS)
Gelfer, Carole E.; Bell-Berti, Fredericka; Boyle, Mary
2004-05-01
The literature suggests that vowels and consonants may be affected differently in the speech of persons with conduction aphasia (CA) or nonfluent aphasia with apraxia of speech (AOS). Persons with CA have shown similar error rates across vowels and consonants, while those with AOS have shown more errors for consonants than vowels. These data have been interpreted to suggest that consonants have greater gestural complexity than vowels. However, recent research [M. Boyle et al., Proc. International Cong. Phon. Sci., 3265-3268 (2003)] does not support this interpretation: persons with AOS and CA both had a high proportion of vowel errors, and vowel errors almost always occurred in the context of consonantal errors. To examine the notion that vowels are inherently less complex than consonants and are differentially affected in different types of aphasia, vowel production in different consonantal contexts for speakers with AOS or CA was examined. The target utterances, produced in carrier phrases, were bVC and bV syllables, allowing us to examine whether vowel production is influenced by consonantal context. Listener judgments were obtained for each token, and error productions were grouped according to the intended utterance and error type. Acoustical measurements were made from spectrographic displays.
Age-related changes to the production of linguistic prosody
NASA Astrophysics Data System (ADS)
Barnes, Daniel R.
The production of speech prosody (the rhythm, pausing, and intonation associated with natural speech) is critical to effective communication. The current study investigated the impact of age-related changes to physiology and cognition on the production of two types of linguistic prosody: lexical stress and the disambiguation of syntactically ambiguous utterances. Analyses of the acoustic correlates of stress (speech intensity, or sound-pressure level, SPL; fundamental frequency, F0; key word/phrase duration; and pause duration) revealed that both young and older adults effectively use these acoustic features to signal linguistic prosody, although the relative weighting of cues differed by group. Differences in F0 were attributed to age-related physiological changes in the laryngeal subsystem, while group differences in duration measures were attributed to relative task complexity and the cognitive-linguistic load of the respective tasks. The current study provides normative acoustic data for older adults which inform interpretation of clinical findings as well as research pertaining to dysprosody as the result of disease processes.
Utterance Duration as It Relates to Communicative Variables in Infant Vocal Development
ERIC Educational Resources Information Center
Ramsdell-Hudock, Heather L.; Stuart, Andrew; Parham, Douglas F.
2018-01-01
Purpose: We aimed to provide novel information on utterance duration as it relates to vocal type, facial affect, gaze direction, and age in the prelinguistic/early linguistic infant. Method: Infant utterances were analyzed from longitudinal recordings of 15 infants at 8, 10, 12, 14, and 16 months of age. Utterance durations were measured and coded…
Glennen, Sharon
2014-07-01
The author followed 56 internationally adopted children during the first 3 years after adoption to determine how and when they reached age-expected language proficiency in Standard American English. The influence of age of adoption was measured, along with the relationship between early and later language and speech outcomes. Children adopted from Eastern Europe at ages 12 months to 4 years, 11 months, were assessed 5 times across 3 years. Norm-referenced measures of receptive and expressive language and articulation were compared over time. In addition, mean length of utterance (MLU) was measured. Across all children, receptive language reached age-expected levels more quickly than expressive language. Children adopted at ages 1 and 2 "caught up" more quickly than children adopted at ages 3 and 4. Three years after adoption, there was no difference in test scores across age of adoption groups, and the percentage of children with language or speech delays matched population estimates. MLU was within the average range 3 years after adoption but significantly lower than other language test scores. Three years after adoption, age of adoption did not influence language or speech outcomes, and most children reached age-expected language levels. Expressive syntax as measured by MLU was an area of relative weakness.
Do Listeners Store in Memory a Speaker's Habitual Utterance-Final Phonation Type?
Bőhm, Tamás; Shattuck-Hufnagel, Stefanie
2009-01-01
Earlier studies report systematic differences across speakers in the occurrence of utterance-final irregular phonation; the work reported here investigated whether human listeners remember this speaker-specific information and can access it when necessary (a prerequisite for using this cue in speaker recognition). Listeners personally familiar with the voices of the speakers were presented with pairs of speech samples: one with the original and the other with transformed final phonation type. Asked to select the member of the pair that was closer to the talker's voice, most listeners tended to choose the unmanipulated token (even though they judged them to sound essentially equally natural). This suggests that utterance-final pitch period irregularity is part of the mental representation of individual speaker voices, although this may depend on the individual speaker and listener to some extent. PMID:19776665
Learn Locally, Act Globally: Learning Language from Variation Set Cues
Onnis, Luca; Waterfall, Heidi R.; Edelman, Shimon
2011-01-01
Variation set structure — partial overlap of successive utterances in child-directed speech — has been shown to correlate with progress in children’s acquisition of syntax. We demonstrate the benefits of variation set structure directly: in miniature artificial languages, arranging a certain proportion of utterances in a training corpus in variation sets facilitated word and phrase constituent learning in adults. Our findings have implications for understanding the mechanisms of L1 acquisition by children, and for the development of more efficient algorithms for automatic language acquisition, as well as better methods for L2 instruction. PMID:19019350
Cohen, Alex S; Dinzeo, Thomas J; Donovan, Neila J; Brown, Caitlin E; Morrison, Sean C
2015-03-30
Vocal expression reflects an integral component of communication that varies considerably within individuals across contexts and is disrupted in a range of neurological and psychiatric disorders. There is reason to suspect that variability in vocal expression reflects, in part, the availability of "on-line" resources (e.g., working memory, attention). Thus, vocal expression is a potentially important biometric index of information processing, not only across but within individuals over time. A first step in this line of research involves establishing a link between vocal expression and information processing systems in healthy adults. The present study employed a dual-attention experimental task where participants provided natural speech while simultaneously engaged in a baseline, medium, or high nonverbal processing-load task. Objective, automated, and computerized analysis was employed to measure vocal expression in 226 adults. Increased processing load resulted in longer pauses, fewer utterances, greater silence overall, and less variability in frequency and intensity levels. These results provide compelling evidence of a link between information processing resources and vocal expression, and provide important information for the development of an automated, inexpensive and noninvasive biometric measure of information processing. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
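The following sketch shows how objective measures of the kind reported above (number of utterances, total pause time, overall silence, intensity variability) might be computed automatically from frame energy. The frame sizes, silence threshold, and minimum pause duration are illustrative assumptions, not the study's pipeline, and F0 variability would additionally require a pitch tracker.

```python
# Sketch of automated vocal-expression measures computed from frame energy.
# Frame sizes, the silence threshold, and the minimum pause duration are
# assumptions, not the study's pipeline.
import numpy as np

def vocal_measures(x, fs, frame_ms=25, hop_ms=10, silence_db=-40.0, min_pause_s=0.15):
    frame, hop = int(fs * frame_ms / 1000), int(fs * hop_ms / 1000)
    n_frames = 1 + max(0, (len(x) - frame) // hop)
    rms = np.array([np.sqrt(np.mean(x[i * hop:i * hop + frame] ** 2) + 1e-12)
                    for i in range(n_frames)])
    level_db = 20 * np.log10(rms / (np.max(rms) + 1e-12))
    voiced = level_db > silence_db

    # Runs of voiced frames count as utterances; silent runs longer than
    # min_pause_s count as pauses.
    pauses, utterances, run_start = [], 0, 0
    for i in range(1, n_frames + 1):
        if i == n_frames or voiced[i] != voiced[i - 1]:
            dur = (i - run_start) * hop / fs
            if voiced[run_start]:
                utterances += 1
            elif dur >= min_pause_s:
                pauses.append(dur)
            run_start = i
    return {
        "n_utterances": utterances,
        "total_pause_s": float(sum(pauses)),
        "prop_silence": float(np.mean(~voiced)),
        "intensity_sd_db": float(np.std(level_db[voiced])) if voiced.any() else 0.0,
    }

# Example: two noise bursts separated by half a second of silence
rng = np.random.default_rng(0)
fs = 16000
sig = np.concatenate([rng.standard_normal(fs) * 0.1,
                      np.zeros(fs // 2),
                      rng.standard_normal(fs // 2) * 0.1])
print(vocal_measures(sig, fs))
```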
Individual differences in children’s private speech: The role of imaginary companions
Davis, Paige E.; Meins, Elizabeth; Fernyhough, Charles
2013-01-01
Relations between children’s imaginary companion status and their engagement in private speech during free play were investigated in a socially diverse sample of 5-year-olds (N = 148). Controlling for socioeconomic status, receptive verbal ability, total number of utterances, and duration of observation, there was a main effect of imaginary companion status on type of private speech. Children who had imaginary companions were more likely to engage in covert private speech compared with their peers who did not have imaginary companions. These results suggest that the private speech of children with imaginary companions is more internalized than that of their peers who do not have imaginary companions and that social engagement with imaginary beings may fulfill a similar role to social engagement with real-life partners in the developmental progression of private speech. PMID:23978382
Spoken Language Processing in the Clarissa Procedure Browser
NASA Technical Reports Server (NTRS)
Rayner, M.; Hockey, B. A.; Renders, J.-M.; Chatzichrisafis, N.; Farrell, K.
2005-01-01
Clarissa, an experimental voice-enabled procedure browser that has recently been deployed on the International Space Station, is, as far as we know, the first spoken dialogue system in space. We describe the objectives of the Clarissa project and the system's architecture. In particular, we focus on three key problems: grammar-based speech recognition using the Regulus toolkit; methods for open-mic speech recognition; and robust, side-effect-free dialogue management for handling undos, corrections and confirmations. We first describe the grammar-based recogniser we have built using Regulus, and report experiments where we compare it against a class N-gram recogniser trained on the same 3297-utterance dataset. We obtained a 15% relative improvement in WER and a 37% improvement in semantic error rate. The grammar-based recogniser moreover outperforms the class N-gram version for utterances of all lengths from 1 to 9 words inclusive. The central problem in building an open-mic speech recognition system is being able to distinguish between commands directed at the system and other material (cross-talk), which should be rejected. Most spoken dialogue systems make the accept/reject decision by applying a threshold to the recognition confidence score. We show how a simple and general method, based on standard approaches to document classification using Support Vector Machines, can give substantially better performance, and report experiments showing a relative reduction of about 25% in task-level error rate compared to the baseline confidence-threshold method. Finally, we describe a general side-effect-free dialogue management architecture that we have implemented in Clarissa, which extends the "update semantics" framework by including task as well as dialogue information in the information state. We show that this enables elegant treatments of several dialogue management problems, including corrections, confirmations, querying of the environment, and regression testing.
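The accept/reject step described above (treating cross-talk rejection as document classification with Support Vector Machines rather than a confidence threshold) can be illustrated with a minimal sketch. This is not the Clarissa implementation; the example utterances, labels, and TF-IDF features are illustrative assumptions.

```python
# Sketch: accept/reject open-mic utterances as a text-classification problem,
# in the spirit of the SVM approach described above (not the Clarissa code).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

# Hypothetical recognizer outputs labelled as system-directed (1) or cross-talk (0).
train_text = [
    "go to step three",                 # command
    "set challenge verify off",         # command
    "what did you have for lunch",      # cross-talk
    "i think the pump is over there",   # cross-talk
]
train_label = [1, 1, 0, 0]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(train_text, train_label)

print(clf.predict(["go to the next step", "see you at dinner"]))  # ideally -> [1 0]
```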
Sounds and silence: An optical topography study of language recognition at birth
NASA Astrophysics Data System (ADS)
Peña, Marcela; Maki, Atsushi; Kovaic, Damir; Dehaene-Lambertz, Ghislaine; Koizumi, Hideaki; Bouquet, Furio; Mehler, Jacques
2003-09-01
Does the neonate's brain have left hemisphere (LH) dominance for speech? Twelve full-term neonates participated in an optical topography study designed to assess whether the neonate brain responds specifically to linguistic stimuli. Participants were tested with normal infant-directed speech, with the same utterances played in reverse and without auditory stimulation. We used a 24-channel optical topography device to assess changes in the concentration of total hemoglobin in response to auditory stimulation in 12 areas of the right hemisphere and 12 areas of the LH. We found that LH temporal areas showed significantly more activation when infants were exposed to normal speech than to backward speech or silence. We conclude that neonates are born with an LH superiority to process specific properties of speech.
The Importance of Form in Skinner's Analysis of Verbal Behavior and a Further Step
Vargas, E. A.
2013-01-01
A series of quotes from B. F. Skinner illustrates the importance of form in his analysis of verbal behavior. In that analysis, form plays an important part in contingency control. Form and function complement each other. Function, the array of variables that control a verbal utterance, dictates the meaning of a specified form; form, as stipulated by a verbal community, indicates that meaning. The mediational actions that shape verbal utterances do not necessarily encounter their controlling variables. These are inferred from the form of the verbal utterance. Form carries the burden of implied meaning and underscores the importance of the verbal community in the expression of all the forms of language. Skinner's analysis of verbal behavior and the importance of form within that analysis provides the foundation by which to investigate language. But a further step needs to be undertaken to examine and to explain the abstractions of language as an outcome of action at an aggregate level. PMID:23814376
A multimodal spectral approach to characterize rhythm in natural speech.
Alexandrou, Anna Maria; Saarinen, Timo; Kujala, Jan; Salmelin, Riitta
2016-01-01
Human utterances demonstrate temporal patterning, also referred to as rhythm. While simple oromotor behaviors (e.g., chewing) feature a salient periodical structure, conversational speech displays a time-varying quasi-rhythmic pattern. Quantification of periodicity in speech is challenging. Unimodal spectral approaches have highlighted rhythmic aspects of speech. However, speech is a complex multimodal phenomenon that arises from the interplay of articulatory, respiratory, and vocal systems. The present study addressed the question of whether a multimodal spectral approach, in the form of coherence analysis between electromyographic (EMG) and acoustic signals, would allow one to characterize rhythm in natural speech more efficiently than a unimodal analysis. The main experimental task consisted of speech production at three speaking rates; a simple oromotor task served as control. The EMG-acoustic coherence emerged as a sensitive means of tracking speech rhythm, whereas spectral analysis of either EMG or acoustic amplitude envelope alone was less informative. Coherence metrics seem to distinguish and highlight rhythmic structure in natural speech.
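The coherence analysis referred to above can be sketched with standard signal-processing tools. The sampling rate, the synthetic 4 Hz "rhythm", and the noise levels below are placeholders, not the study's EMG or acoustic data.

```python
# Sketch: magnitude-squared coherence between an EMG signal and an acoustic envelope.
import numpy as np
from scipy.signal import coherence

fs = 1000.0                                   # Hz, placeholder sampling rate
t = np.arange(0, 60, 1 / fs)                  # 60 s of synthetic data
rhythm = np.sin(2 * np.pi * 4 * t)            # shared ~4 Hz "speech rhythm"
emg = rhythm + 0.8 * np.random.randn(t.size)            # noisy EMG-like signal
acoustic_env = rhythm + 0.8 * np.random.randn(t.size)   # noisy amplitude envelope

f, Cxy = coherence(emg, acoustic_env, fs=fs, nperseg=2048)
peak = f[np.argmax(Cxy)]
print(f"peak coherence {Cxy.max():.2f} at {peak:.1f} Hz")  # expect a peak near 4 Hz
```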
Recognition of speaker-dependent continuous speech with KEAL
NASA Astrophysics Data System (ADS)
Mercier, G.; Bigorgne, D.; Miclet, L.; Le Guennec, L.; Querre, M.
1989-04-01
A description of the speaker-dependent continuous speech recognition system KEAL is given. An unknown utterance is recognized by means of the following procedures: acoustic analysis, phonetic segmentation and identification, and word and sentence analysis. The combination of feature-based, speaker-independent coarse phonetic segmentation with speaker-dependent statistical classification techniques is one of the main design features of the acoustic-phonetic decoder. The lexical access component is essentially based on a statistical dynamic programming technique which aims at matching a phonemic lexical entry containing various phonological forms against a phonetic lattice. Sentence recognition is achieved by use of a context-free grammar and a parsing algorithm derived from Earley's parser. A speaker adaptation module allows some of the system parameters to be adjusted by matching known utterances with their acoustical representation. The task to be performed, described by its vocabulary and its grammar, is given as a parameter of the system. Continuously spoken sentences extracted from a 'pseudo-Logo' language are analyzed and results are presented.
Automatic speech recognition research at NASA-Ames Research Center
NASA Technical Reports Server (NTRS)
Coler, Clayton R.; Plummer, Robert P.; Huff, Edward M.; Hitchcock, Myron H.
1977-01-01
A trainable acoustic pattern recognizer manufactured by Scope Electronics is presented. The voice command system VCS encodes speech by sampling 16 bandpass filters with center frequencies in the range from 200 to 5000 Hz. Variations in speaking rate are compensated for by a compression algorithm that subdivides each utterance into eight subintervals in such a way that the amount of spectral change within each subinterval is the same. The recorded filter values within each subinterval are then reduced to a 15-bit representation, giving a 120-bit encoding for each utterance. The VCS incorporates a simple recognition algorithm that utilizes five training samples of each word in a vocabulary of up to 24 words. The recognition rate of approximately 85 percent correct for untrained speakers and 94 percent correct for trained speakers was not considered adequate for flight systems use. Therefore, the built-in recognition algorithm was disabled, and the VCS was modified to transmit 120-bit encodings to an external computer for recognition.
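The time-normalization scheme described (subdividing each utterance into eight subintervals of equal cumulative spectral change) can be sketched as follows. The 16-channel filter-bank values are random placeholders, and the final 15-bit quantization of each subinterval is only indicated in a comment, since its details are not given above.

```python
# Sketch: split an utterance into 8 subintervals of equal cumulative spectral change.
import numpy as np

def equal_change_segments(filterbank, n_segments=8):
    """filterbank: (n_frames, 16) array of bandpass-filter energies."""
    change = np.abs(np.diff(filterbank, axis=0)).sum(axis=1)   # frame-to-frame spectral change
    cum = np.concatenate([[0.0], np.cumsum(change)])
    targets = np.linspace(0, cum[-1], n_segments + 1)
    bounds = np.searchsorted(cum, targets)
    bounds[0], bounds[-1] = 0, filterbank.shape[0]
    # One averaged spectrum per subinterval; a real system would then quantize
    # each to a 15-bit code to obtain the 120-bit utterance representation.
    return [filterbank[bounds[i]:max(bounds[i] + 1, bounds[i + 1])].mean(axis=0)
            for i in range(n_segments)]

frames = np.abs(np.random.randn(120, 16))   # placeholder utterance
print(len(equal_change_segments(frames)))   # -> 8
```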
Using Natural Language to Enhance Mission Effectiveness
NASA Technical Reports Server (NTRS)
Trujillo, Anna C.; Meszaros, Erica
2016-01-01
The availability of highly capable, yet relatively cheap, unmanned aerial vehicles (UAVs) is opening up new areas of use for hobbyists and for professional-related activities. The driving function of this research is allowing a non-UAV pilot, an operator, to define and manage a mission. This paper describes the preliminary usability measures of an interface that allows an operator to define the mission using speech to make inputs. An experiment was conducted to begin to enumerate the efficacy and user acceptance of using voice commands to define a multi-UAV mission and to provide high-level vehicle control commands such as "takeoff." The primary independent variable was input type - voice or mouse. The primary dependent variables consisted of the correctness of the mission parameter inputs and the time needed to make all inputs. Other dependent variables included NASA-TLX workload ratings and subjective ratings on a final questionnaire. The experiment required each subject to fill in an online form that contained comparable required information that would be needed for a package dispatcher to deliver packages. For each run, subjects typed in a simple numeric code for the package code. They then defined the initial starting position, the delivery location, and the return location using either pull-down menus or voice input. Voice input was accomplished using CMU Sphinx4-5prealpha for speech recognition. They then inputted the length of the package. These were the option fields. The subject had the system "Calculate Trajectory" and then "Takeoff" once the trajectory was calculated. Later, the subject used "Land" to finish the run. After the voice and mouse input blocked runs, subjects completed a NASA-TLX. At the conclusion of all runs, subjects completed a questionnaire asking them about their experience in inputting the mission parameters, and starting and stopping the mission using mouse and voice input. In general, the usability of voice commands is acceptable. With a relatively well-defined and simple vocabulary, the operator can input the vast majority of the mission parameters using simple, intuitive voice commands. However, voice input may be more applicable to initial mission specification rather than for critical commands such as the need to land immediately due to time and feedback constraints. It would also be convenient to retrieve relevant mission information using voice input. Therefore, further on-going research is looking at using intent from operator utterances to provide the relevant mission information to the operator. The information displayed will be inferred from the operator's utterances just before key phrases are spoken. Linguistic analysis of the context of verbal communication provides insight into the intended meaning of commonly heard phrases such as "What's it doing now?" Analyzing the semantic sphere surrounding these common phrases enables us to predict the operator's intent and supply the operator's desired information to the interface. This paper also describes preliminary investigations into the generation of the semantic space of UAV operation and the success at providing information to the interface based on the operator's utterances.
Provider-patient adherence dialogue in HIV care: results of a multisite study.
Laws, M Barton; Beach, Mary Catherine; Lee, Yoojin; Rogers, William H; Saha, Somnath; Korthuis, P Todd; Sharp, Victoria; Wilson, Ira B
2013-01-01
Few studies have analyzed physician-patient adherence dialogue about ARV treatment in detail. We comprehensively describe physician-patient visits in HIV care, focusing on ARV-related dialogue, using a system that assigns each utterance both a topic code and a speech act code. Observational study using audio recordings of routine outpatient visits by people with HIV at specialty clinics. Providers were 34 physicians and 11 non-M.D. practitioners. Of 415 patients, 66% were male, 59% African-American. 78% reported currently taking ARVs. About 10% of utterances concerned ARV treatment. Among those using ARVs, 15% had any adherence problem solving dialogue. ARV problem solving talk included significantly more directives and control parameter utterances by providers than other topics. Providers were verbally dominant, asked five times as many questions as patients, and made 21 times as many directive utterances. Providers asked few open questions, and rarely checked patients' understanding. Physicians respond to the challenges of caring for patients with HIV by adopting a somewhat physician-centered approach which is particularly evident in discussions about ARV adherence.
Bloch, Steven
2011-01-01
The study described here investigates the practice of anticipatory completion of augmentative and alternative communication (AAC) utterances in progress. The aims were to identify and analyse features of this practice as they occur in natural conversation between a person using an AAC system and a family member. The methods and principles of Conversation Analysis (CA) were used to video-record conversations between people with progressive neurological diseases and a progressive speech disorder (dysarthria) and their family members. Key features of interaction were identified and extracts transcribed. Four extracts of talk between a man with motor neurone disease/amyotrophic lateral sclerosis and his mother are presented here. Anticipatory completion of AAC utterances is intimately related to the sequential context in which such utterances occur. Difficulties can arise from topic shifts, from understanding the intended action of an AAC word in progress, and from recognising the possible end point of an utterance. The analysis highlights the importance of understanding how AAC talk works in everyday interaction. The role of co-participants is particularly important here. These results may have implications for both AAC software design and clinical intervention.
2016-03-01
manual rather than verbal responses. The coordinate response measure (CRM) task and speech corpus is a highly simplified form of the command and... in multi-talker speech experiments. The CRM corpus is a collection of recorded command utterances in the form "Ready <Callsign> go to <Color..." In the two-talker CRM listening task, participants respond to commands by pointing to the appropriate Color/Digit pair on a computer display. A
Drijvers, Linda; Özyürek, Asli; Jensen, Ole
2018-05-01
During face-to-face communication, listeners integrate speech with gestures. The semantic information conveyed by iconic gestures (e.g., a drinking gesture) can aid speech comprehension in adverse listening conditions. In this magnetoencephalography (MEG) study, we investigated the spatiotemporal neural oscillatory activity associated with gestural enhancement of degraded speech comprehension. Participants watched videos of an actress uttering clear or degraded speech, accompanied by a gesture or not and completed a cued-recall task after watching every video. When gestures semantically disambiguated degraded speech comprehension, an alpha and beta power suppression and a gamma power increase revealed engagement and active processing in the hand-area of the motor cortex, the extended language network (LIFG/pSTS/STG/MTG), medial temporal lobe, and occipital regions. These observed low- and high-frequency oscillatory modulations in these areas support general unification, integration and lexical access processes during online language comprehension, and simulation of and increased visual attention to manual gestures over time. All individual oscillatory power modulations associated with gestural enhancement of degraded speech comprehension predicted a listener's correct disambiguation of the degraded verb after watching the videos. Our results thus go beyond the previously proposed role of oscillatory dynamics in unimodal degraded speech comprehension and provide first evidence for the role of low- and high-frequency oscillations in predicting the integration of auditory and visual information at a semantic level. © 2018 The Authors Human Brain Mapping Published by Wiley Periodicals, Inc.
What does it take to stress a word? Digital manipulation of stress markers in ataxic dysarthria.
Lowit, Anja; Ijitona, Tolulope; Kuschmann, Anja; Corson, Stephen; Soraghan, John
2018-05-18
Stress production is important for effective communication, but this skill is frequently impaired in people with motor speech disorders. The literature reports successful treatment of these deficits in this population, thus highlighting the therapeutic potential of this area. However, no specific guidance is currently available to clinicians about whether any of the stress markers are more effective than others, to what degree they have to be manipulated, and whether strategies need to differ according to the underlying symptoms. In order to provide detailed information on how stress production problems can be addressed, the study investigated (1) the minimum amount of change in a single stress marker necessary to achieve significant improvement in stress target identification; and (2) whether stress can be signalled more effectively with a combination of stress markers. Data were sourced from a sentence stress task performed by 10 speakers with ataxic dysarthria and 10 healthy matched control participants. Fifteen utterances perceived as having incorrect stress patterns (no stress, all words stressed or inappropriate word stressed) were selected and digitally manipulated in a stepwise fashion based on typical speaker performance. Manipulations were performed on F0, intensity and duration, either in isolation or in combination with each other. In addition, pitch contours were modified for some utterances. A total of 50 naïve listeners scored which word they perceived as being stressed. Results showed that increases in duration and intensity at levels smaller than produced by the control participants resulted in significant improvements in listener accuracy. The effectiveness of F0 increases depended on the underlying error pattern. Overall intensity showed the most stable effects. Modifications of the pitch contour also resulted in significant improvements, but not to the same degree as amplification. Integration of two or more stress markers did not result in better results than manipulation of individual stress markers, unless they were combined with pitch contour modifications. The results highlight the potential for improvement of stress production in speakers with motor speech disorders. The fact that individual parameter manipulation is as effective as combining them will facilitate the therapeutic process considerably, as will the result that amplification at lower levels than seen in typical speakers is sufficient. The difference in results across utterance sets highlights the need to investigate the underlying error pattern in order to select the most effective compensatory strategy for clients. © 2018 Royal College of Speech and Language Therapists.
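The kind of stepwise acoustic manipulation described (lengthening, amplifying, and raising the F0 of a target word) can be sketched with off-the-shelf tools. This is not the study's manipulation procedure; the file name, word boundaries, and step sizes below are illustrative assumptions.

```python
# Sketch: manipulate duration, intensity, and F0 of a target word region
# (in the spirit of the stepwise manipulations described; values are illustrative).
import numpy as np
import librosa
import soundfile as sf

y, sr = librosa.load("utterance.wav", sr=None)        # hypothetical sentence recording
start, end = int(0.80 * sr), int(1.10 * sr)           # hypothetical target-word boundaries
pre, word, post = y[:start], y[start:end], y[end:]

word = librosa.effects.time_stretch(word, rate=1 / 1.10)      # ~10% longer duration
word = word * 10 ** (3.0 / 20.0)                              # +3 dB intensity
word = librosa.effects.pitch_shift(word, sr=sr, n_steps=1.0)  # +1 semitone F0

sf.write("utterance_stressed.wav", np.concatenate([pre, word, post]), sr)
```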
Children with Autism Understand Indirect Speech Acts: Evidence from a Semi-Structured Act-Out Task
Kissine, Mikhail; Cano-Chervel, Julie; Carlier, Sophie; De Brabanter, Philippe; Ducenne, Lesley; Pairon, Marie-Charlotte; Deconinck, Nicolas; Delvenne, Véronique; Leybaert, Jacqueline
2015-01-01
Children with Autism Spectrum Disorder are often said to present a global pragmatic impairment. However, there is some observational evidence that context-based comprehension of indirect requests may be preserved in autism. In order to provide experimental confirmation to this hypothesis, indirect speech act comprehension was tested in a group of 15 children with autism between 7 and 12 years and a group of 20 typically developing children between 2:7 and 3:6 years. The aim of the study was to determine whether children with autism can display genuinely contextual understanding of indirect requests. The experiment consisted of a three-pronged semi-structured task involving Mr Potato Head. In the first phase a declarative sentence was uttered by one adult as an instruction to put a garment on a Mr Potato Head toy; in the second the same sentence was uttered as a comment on a picture by another speaker; in the third phase the same sentence was uttered as a comment on a picture by the first speaker. Children with autism complied with the indirect request in the first phase and demonstrated the capacity to inhibit the directive interpretation in phases 2 and 3. TD children had some difficulty in understanding the indirect instruction in phase 1. These results call for a more nuanced view of pragmatic dysfunction in autism. PMID:26551648
Xiao, Bo; Huang, Chewei; Imel, Zac E; Atkins, David C; Georgiou, Panayiotis; Narayanan, Shrikanth S
2016-04-01
Scaling up psychotherapy services such as addiction counseling is a critical societal need. One challenge is ensuring the quality of therapy, due to the heavy cost of manual observational assessment. This work proposes a speech technology-based system to automate the assessment of therapist empathy, a key therapy quality index, from audio recordings of the psychotherapy interactions. We designed a speech processing system that includes voice activity detection and diarization modules, and an automatic speech recognizer plus a speaker role matching module to extract the therapist's language cues. We employed Maximum Entropy models, Maximum Likelihood language models, and a Lattice Rescoring method to characterize high vs. low empathic language. We estimated therapy-session-level empathy codes using utterance-level evidence obtained from these models. Our experiments showed that the fully automated system achieved a correlation of 0.643 between expert-annotated empathy codes and machine-derived estimations, and an accuracy of 81% in classifying high vs. low empathy, compared to a 0.721 correlation and 86% accuracy in the oracle setting using manual transcripts. The results show that the system provides useful information that can contribute to automatic quality assurance and therapist training.
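The utterance-to-session aggregation idea can be sketched with a maximum-entropy (logistic regression) text classifier whose per-utterance probabilities are averaged into a session-level estimate. The utterances, labels, and n-gram features are placeholders, not the study's models or data.

```python
# Sketch: utterance-level MaxEnt scoring aggregated to a session-level empathy estimate.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical therapist utterances labelled high- (1) vs low-empathy (0) language.
utterances = ["it sounds like that was really hard for you",
              "you feel torn about cutting back",
              "you just need to stop drinking",
              "why did you skip the meeting again"]
labels = [1, 1, 0, 0]

model = make_pipeline(CountVectorizer(ngram_range=(1, 2)),
                      LogisticRegression(max_iter=1000))
model.fit(utterances, labels)

session = ["tell me more about how that felt",
           "you should know better by now"]
session_score = np.mean(model.predict_proba(session)[:, 1])  # session-level estimate
print(f"estimated session empathy: {session_score:.2f}")
```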
Longobardi, Emiddia; Rossi-Arnaud, Clelia; Spataro, Pietro; Putnick, Diane L; Bornstein, Marc H
2015-01-01
Because of its structural characteristics, specifically the prevalence of verb types in infant-directed speech and frequent pronoun-dropping, the Italian language offers an attractive opportunity to investigate the predictive effects of input frequency and positional salience on children's acquisition of nouns and verbs. We examined this issue in a sample of twenty-six mother-child dyads whose spontaneous conversations were recorded, transcribed, and coded at 1;4 and 1;8. The percentages of nouns occurring in the final position of maternal utterances at 1;4 predicted children's production of noun types at 1;8. For verbs, children's growth rates were positively predicted by the percentages of input verbs occurring in utterance-initial position, but negatively predicted by the percentages of verbs located in the final position of maternal utterances at 1;4. These findings clearly illustrate that the effects of positional salience vary across lexical categories.
Perceptual congruency of audio-visual speech affects ventriloquism with bilateral visual stimuli.
Kanaya, Shoko; Yokosawa, Kazuhiko
2011-02-01
Many studies on multisensory processes have focused on performance in simplified experimental situations, with a single stimulus in each sensory modality. However, these results cannot necessarily be applied to explain our perceptual behavior in natural scenes where various signals exist within one sensory modality. We investigated the role of audio-visual syllable congruency on participants' auditory localization bias or the ventriloquism effect using spoken utterances and two videos of a talking face. Salience of facial movements was also manipulated. Results indicated that more salient visual utterances attracted participants' auditory localization. Congruent pairing of audio-visual utterances elicited greater localization bias than incongruent pairing, while previous studies have reported little dependency on the reality of stimuli in ventriloquism. Moreover, audio-visual illusory congruency, owing to the McGurk effect, caused substantial visual interference on auditory localization. Multisensory performance appears more flexible and adaptive in this complex environment than in previous studies.
Listeners' Comprehension of Uptalk in Spontaneous Speech
ERIC Educational Resources Information Center
Tomlinson, John M., Jr.; Tree, Jean E. Fox
2011-01-01
Listeners' comprehension of phrase final rising pitch on declarative utterances, or "uptalk", was examined to test the hypothesis that prolongations might differentiate conflicting functions of rising pitch. In Experiment 1 we found that listeners rated prolongations as indicating more speaker uncertainty, but that rising pitch was unrelated to…
Prelinguistic Pitch Patterns Expressing "Communication" and "Apprehension"
ERIC Educational Resources Information Center
Papaeliou, Christina F.; Trevarthen, Colwyn
2006-01-01
This study examined whether pitch patterns of prelinguistic vocalizations could discriminate between social vocalizations, uttered apparently with the intention to communicate, and "private" speech, related to solitary activities as an expression of "thinking". Four healthy ten month old English-speaking infants (2 boys and 2 girls) were…
Investigating Joint Attention Mechanisms through Spoken Human-Robot Interaction
ERIC Educational Resources Information Center
Staudte, Maria; Crocker, Matthew W.
2011-01-01
Referential gaze during situated language production and comprehension is tightly coupled with the unfolding speech stream (Griffin, 2001; Meyer, Sleiderink, & Levelt, 1998; Tanenhaus, Spivey-Knowlton, Eberhard, & Sedivy, 1995). In a shared environment, utterance comprehension may further be facilitated when the listener can exploit the speaker's…
Unvoiced Speech Recognition Using Tissue-Conductive Acoustic Sensor
NASA Astrophysics Data System (ADS)
Heracleous, Panikos; Kaino, Tomomi; Saruwatari, Hiroshi; Shikano, Kiyohiro
2006-12-01
We present the use of stethoscope and silicon NAM (nonaudible murmur) microphones in automatic speech recognition. NAM microphones are special acoustic sensors, which are attached behind the talker's ear and can capture not only normal (audible) speech, but also very quietly uttered speech (nonaudible murmur). As a result, NAM microphones can be applied in automatic speech recognition systems when privacy is desired in human-machine communication. Moreover, NAM microphones show robustness against noise and they might be used in special systems (speech recognition, speech transform, etc.) for sound-impaired people. Using adaptation techniques and a small amount of training data, we achieved, for a 20 k dictation task, a [InlineEquation not available: see fulltext.] word accuracy for nonaudible murmur recognition in a clean environment. In this paper, we also investigate nonaudible murmur recognition in noisy environments and the effect of the Lombard reflex on nonaudible murmur recognition. We also propose three methods to integrate audible speech and nonaudible murmur recognition using a stethoscope NAM microphone with very promising results.
Knowledge and implicature: modeling language understanding as social cognition.
Goodman, Noah D; Stuhlmüller, Andreas
2013-01-01
Is language understanding a special case of social cognition? To help evaluate this view, we can formalize it as the rational speech-act theory: Listeners assume that speakers choose their utterances approximately optimally, and listeners interpret an utterance by using Bayesian inference to "invert" this model of the speaker. We apply this framework to model scalar implicature ("some" implies "not all," and "N" implies "not more than N"). This model predicts an interaction between the speaker's knowledge state and the listener's interpretation. We test these predictions in two experiments and find good fit between model predictions and human judgments. Copyright © 2013 Cognitive Science Society, Inc.
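The rational speech-act recursion described here (a literal listener, a near-optimal speaker, and a pragmatic listener who inverts the speaker by Bayesian inference) can be written out in a few lines. The uniform prior and the speaker rationality parameter below are illustrative assumptions.

```python
# Sketch: rational speech-act (RSA) model of the "some -> not all" implicature.
import numpy as np

states = ["none", "some-not-all", "all"]
utterances = ["some", "all", "none"]
# Literal truth values: rows = utterances, cols = states.
truth = np.array([[0, 1, 1],     # "some" is literally true of some-not-all and all
                  [0, 0, 1],     # "all"
                  [1, 0, 0]])    # "none"
prior = np.ones(len(states)) / len(states)   # uniform state prior (assumption)
alpha = 4.0                                  # speaker rationality (assumption)

L0 = truth * prior                           # literal listener P(s | u), unnormalized
L0 = L0 / L0.sum(axis=1, keepdims=True)

S1 = np.exp(alpha * np.log(L0 + 1e-12)).T    # speaker P(u | s), rows = states
S1 = S1 / S1.sum(axis=1, keepdims=True)

L1 = S1.T * prior                            # pragmatic listener P(s | u)
L1 = L1 / L1.sum(axis=1, keepdims=True)

print(dict(zip(states, np.round(L1[0], 2)))) # for "some": mass shifts away from "all"
```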
NASA Astrophysics Data System (ADS)
Carlsen, William S.
This article describes the effects of science teacher subject-matter knowledge on classroom discourse at the level of individual utterances. It details one of three parallel analyses conducted in a year-long study of language in the classrooms of four new biology teachers. The conceptual framework of the study predicts that when teaching unfamiliar subject matter, teachers use a variety of discourse strategies to constrain student talk to a narrowly circumscribed topic domain. This article includes the results of an utterance-by-utterance analysis of teacher and student talk in a 30-lesson sample of science instruction. Data are broken down by classroom activity (e.g., lecture, laboratory, group work) for several measures, including mean duration of utterances, domination of the speaking floor by the teacher, frequency of teacher questioning, cognitive level of teacher questions, and student verbal participation. When teaching unfamiliar topics, the four teachers in this study tended to talk more often and for longer periods of time, ask questions frequently, and rely heavily on low cognitive level questions. The rate of student questions to the teacher varied with classroom activity. In common classroom communicative settings, student questions were less common when the teacher was teaching unfamiliar subject matter. The implications of these findings include a suggestion that teacher knowledge may be an important unconsidered variable in research on the cognitive level of questions and teacher wait-time.
Is Language a Factor in the Perception of Foreign Accent Syndrome?
Jose, Linda; Read, Jennifer; Miller, Nick
2016-06-01
Neurogenic foreign accent syndrome (FAS) is diagnosed when listeners perceive speech associated with motor speech impairments as foreign rather than disordered. Speakers with foreign accent syndrome typically have aphasia. It remains unclear how far language changes might contribute to the perception of foreign accent syndrome independent of accent. Judges with and without training in language analysis rated orthographic transcriptions of speech from people with foreign accent syndrome, speech-language disorder and no foreign accent syndrome, foreign accent without neurological impairment and healthy controls on scales of foreignness, normalness and disorderedness. Control speakers were judged as significantly more normal, less disordered and less foreign than other groups. Foreign accent syndrome speakers' transcriptions consistently profiled most closely to those of foreign speakers and significantly different to speakers with speech-language disorder. On normalness and foreignness ratings there were no significant differences between foreign and foreign accent syndrome speakers. For disorderedness, foreign accent syndrome participants fell midway between foreign speakers and those with speech-language impairment only. Slower rate, more hesitations, pauses within and between utterances influenced judgments, delineating control scripts from others. Word-level syntactic and morphological deviations and reduced syntactic and semantic repertoire linked strongly with foreignness perceptions. Greater disordered ratings related to word fragments, poorly intelligible grammatical structures and inappropriate word selection. Language changes influence foreignness perception. Clinical and theoretical issues are addressed.
Vowels in infant-directed speech: More breathy and more variable, but not clearer.
Miyazawa, Kouki; Shinya, Takahito; Martin, Andrew; Kikuchi, Hideaki; Mazuka, Reiko
2017-09-01
Infant-directed speech (IDS) is known to differ from adult-directed speech (ADS) in a number of ways, and it has often been argued that some of these IDS properties facilitate infants' acquisition of language. An influential study in support of this view is Kuhl et al. (1997), which found that vowels in IDS are produced with expanded first and second formants (F1/F2) on average, indicating that the vowels are acoustically further apart in IDS than in ADS. These results have been interpreted to mean that the way vowels are produced in IDS makes infants' task of learning vowel categories easier. The present paper revisits this interpretation by means of a thorough analysis of IDS vowels using a large-scale corpus of Japanese natural utterances. We will show that the expansion of F1/F2 values does occur in spontaneous IDS even when the vowels' prosodic position, lexical pitch accent, and lexical bias are accounted for. When IDS vowels are compared to carefully read speech (CS) by the same mothers, however, larger variability among IDS vowel tokens means that the acoustic distances among vowels are farther apart only in CS, but not in IDS when compared to ADS. Finally, we will show that IDS vowels are significantly more breathy than ADS or CS vowels. Taken together, our results demonstrate that even though expansion of formant values occurs in spontaneous IDS, this expansion cannot be interpreted as an indication that the acoustic distances among vowels are farther apart, as is the case in CS. Instead, we found that IDS vowels are characterized by breathy voice, which has been associated with the communication of emotional affect. Copyright © 2017 Elsevier B.V. All rights reserved.
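The notion of acoustic distance among vowels is often summarized as the area of the F1/F2 vowel space; a small sketch of that computation follows. The formant values are invented placeholders, not measurements from the corpus used in this study.

```python
# Sketch: vowel-space area from mean F1/F2 values (Hz); values are placeholders.
import numpy as np
from scipy.spatial import ConvexHull

vowels_ads = {"i": (320, 2400), "e": (450, 2100), "a": (750, 1300),
              "o": (480, 900),  "u": (340, 800)}      # hypothetical ADS means
vowels_ids = {v: (f1 * 1.08, f2 * 1.08)               # hypothetical "expanded" means
              for v, (f1, f2) in vowels_ads.items()}

def vowel_space_area(formants):
    pts = np.array(list(formants.values()))
    return ConvexHull(pts).volume     # in 2-D, .volume is the polygon area

print(f"ADS area: {vowel_space_area(vowels_ads):.0f} Hz^2")
print(f"IDS area: {vowel_space_area(vowels_ids):.0f} Hz^2")
```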
Cepstral domain modification of audio signals for data embedding: preliminary results
NASA Astrophysics Data System (ADS)
Gopalan, Kaliappan
2004-06-01
A method of embedding data in an audio signal using cepstral domain modification is described. Building on successful embedding at spectral points in perceptually masked regions of each speech frame, the technique was first extended to embedding in the log spectral domain. This extension resulted in an embedding rate of approximately 62 bits/s with less than 2 percent bit error rate (BER) for clean cover speech (from the TIMIT database), and about 2.5 percent for noisy speech (from an air traffic controller database), when all frames, including silence and transitions between voiced and unvoiced segments, were used. Bit error rate increased significantly when the log spectrum in the vicinity of a formant was modified. In the next procedure, embedding by altering the mean cepstral values of two ranges of indices was studied. Tests on both a noisy utterance and a clean utterance indicated barely noticeable perceptual change in speech quality when the lower range of cepstral indices, corresponding to the vocal tract region, was modified in accordance with the data. With an embedding capacity of approximately 62 bits/s (one bit per frame regardless of frame energy or type of speech), initial results showed a BER of less than 1.5 percent for a payload of 208 embedded bits using the clean cover speech. A BER of less than 1.3 percent resulted for the noisy host with a payload of 316 bits. When the cepstrum was modified in the region of excitation, BER increased to over 10 percent. With quantization causing no significant problem, the technique warrants further studies with different cepstral ranges and sizes. Pitch-synchronous cepstrum modification, for example, may be more robust to attacks. In addition, cepstrum modification in regions of speech that are perceptually masked, analogous to embedding in frequency-masked regions, may yield imperceptible stego audio with low BER.
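A minimal sketch of the cepstral-domain idea (shifting the mean of a low-index cepstral band to hide one bit per frame, then resynthesizing with the original phase) is given below. The frame length, index range, and step size are assumptions, not the parameters used in the paper.

```python
# Sketch: hide one bit per frame by nudging the mean of a low cepstral band.
import numpy as np

def embed_bit(frame, bit, lo=2, hi=20, delta=0.05):
    """frame: 1-D float array (one analysis frame); returns the modified frame."""
    spec = np.fft.rfft(frame)
    log_mag = np.log(np.abs(spec) + 1e-10)
    ceps = np.fft.irfft(log_mag, n=frame.size)          # real cepstrum
    ceps[lo:hi] += delta if bit else -delta             # shift the band mean up/down
    new_log_mag = np.fft.rfft(ceps)[: spec.size].real   # back to log-magnitude
    new_spec = np.exp(new_log_mag) * np.exp(1j * np.angle(spec))
    return np.fft.irfft(new_spec, n=frame.size)

frame = np.random.randn(512)           # placeholder speech frame
marked = embed_bit(frame, bit=1)
print(np.max(np.abs(marked - frame)))  # small perturbation of the waveform
```

A detector would recover the bit by comparing the mean of the same cepstral band against a reference; that step is omitted here.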
The Crucial Role of Tense for Verb Production
ERIC Educational Resources Information Center
Druks, J.; Carroll, E.
2005-01-01
The case of an aphasic patient whose spontaneous speech contains very few lexical verbs is reported. Instead of sentences with lexical verbs, the patient produces many (grammatical) copular constructions. He also substitutes lexical verbs with the copula. Although this results in ungrammatical utterances, by doing so, a resemblance of sentence…
Automatic Intention Recognition in Conversation Processing
ERIC Educational Resources Information Center
Holtgraves, Thomas
2008-01-01
A fundamental assumption of many theories of conversation is that comprehension of a speaker's utterance involves recognition of the speaker's intention in producing that remark. However, the nature of intention recognition is not clear. One approach is to conceptualize a speaker's intention in terms of speech acts [Searle, J. (1969). "Speech…
A Closer Look at Formulaic Language: Prosodic Characteristics of Swedish Proverbs
ERIC Educational Resources Information Center
Hallin, Anna Eva; Van Lancker Sidtis, Diana
2017-01-01
Formulaic expressions (such as idioms, proverbs, and conversational speech formulas) are currently a topic of interest. Examination of prosody in formulaic utterances, a less explored property of formulaic expressions, has yielded controversial views. The present study investigates prosodic characteristics of proverbs, as one type of formulaic…
Mothers' Speech Addressed to One-, Two-, and Three-Year-Old Normal Children
ERIC Educational Resources Information Center
Longhurst, Thomas M.; Stepanich, Lyanne
1975-01-01
Analysis of mother-child interaction data for 36 children and their mothers revealed that the three groups of mothers' verbal interactions differed significantly in their mean length of utterance, percentage of yes-no questions, percentage of information questions, and percentage of clarification questions. (Author/CS)
Cost-sensitive learning for emotion robust speaker recognition.
Li, Dongdong; Yang, Yingchun; Dai, Weihui
2014-01-01
In the field of information security, voice is one of the most important biometric modalities. In particular, with the growth of voice communication over the Internet and telephone systems, huge voice data resources have become accessible. In speaker recognition, the voiceprint can be applied as a unique password for the user to prove his/her identity. However, speech produced under various emotions can cause an unacceptably high error rate and degrade the performance of a speaker recognition system. This paper deals with this problem by introducing a cost-sensitive learning technique to reweight the probability of test affective utterances at the pitch envelope level, which can effectively enhance robustness in emotion-dependent speaker recognition. Based on that technique, a new architecture for the recognition system, as well as its components, is proposed in this paper. Experiments on the Mandarin Affective Speech Corpus show an 8% improvement in identification rate over traditional speaker recognition.
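Cost-sensitive decisions of the general kind invoked here can be illustrated by reweighting posterior probabilities with a cost matrix before deciding. This sketch is not the paper's pitch-envelope reweighting scheme; the posteriors and costs are invented placeholders.

```python
# Sketch: cost-sensitive decision rule, re-weighting posteriors with a cost matrix.
import numpy as np

# Hypothetical posteriors P(speaker_i | utterance) for three enrolled speakers.
posterior = np.array([0.40, 0.35, 0.25])

# cost[i, j] = cost of deciding speaker j when the true speaker is i (assumed values;
# here, wrongly deciding speaker 0 is penalized heavily).
cost = np.array([[0.0, 1.0, 1.0],
                 [5.0, 0.0, 1.0],
                 [5.0, 1.0, 0.0]])

expected_cost = posterior @ cost          # expected cost of each possible decision
plain_decision = int(np.argmax(posterior))
cs_decision = int(np.argmin(expected_cost))
print(plain_decision, cs_decision)        # -> 0 1: cost weighting changes the decision
```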
Secure Recognition of Voice-Less Commands Using Videos
NASA Astrophysics Data System (ADS)
Yau, Wai Chee; Kumar, Dinesh Kant; Weghorn, Hans
Interest in voice recognition technologies for internet applications is growing due to the flexibility of speech-based communication. The major drawback with the use of sound for internet access with computers is that the commands will be audible to other people in the vicinity. This paper examines a secure and voice-less method for recognition of speech-based commands using video without evaluating sound signals. The proposed approach represents mouth movements in the video data using 2D spatio-temporal templates (STT). Zernike moments (ZM) are computed from STT and fed into support vector machines (SVM) to be classified into one of the utterances. The experimental results demonstrate that the proposed technique produces a high accuracy of 98% in a phoneme classification task. The proposed technique is demonstrated to be invariant to global variations of illumination level. Such a system is useful for securely interpreting user commands for internet applications on mobile devices.
NASA Astrophysics Data System (ADS)
Work, Richard; Andruski, Jean; Casielles, Eugenia; Kim, Sahyang; Nathan, Geoff
2005-04-01
Traditionally, English is classified as a stress-timed language while Spanish is classified as syllable-timed. Examining the contrasting development of rhythmic patterns in bilingual first language acquisition should provide information on how this differentiation takes place. As part of a longitudinal study, speech samples were taken of a Spanish/English bilingual child of Argentinean parents living in the Midwestern United States between the ages of 1;8 and 3;2. Spanish is spoken at home and English input comes primarily from an English day care the child attends 5 days a week. The parents act as interlocutors for Spanish recordings with a native speaker interacting with the child for the English recordings. Following the work of Grabe, Post and Watson (1999) and Grabe and Low (2002) a normalized Pairwise Variability Index (PVI) is used which compares, in utterances of minimally four syllables, the durations of vocalic intervals in successive syllables. Comparisons are then made between the rhythmic patterns of the child's productions within each language over time and between languages at comparable MLUs. Comparisons are also made with the rhythmic patterns of the adult productions of each language. Results will be analyzed for signs of native speaker-like rhythmic production in the child.
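The normalized PVI used in this line of work is commonly defined as 100 times the mean of the absolute difference between successive vocalic interval durations divided by their pair mean; a small sketch under that standard definition follows, with placeholder durations.

```python
# Sketch: normalized Pairwise Variability Index (nPVI) over vocalic interval durations.
import numpy as np

def npvi(durations):
    d = np.asarray(durations, dtype=float)
    diffs = np.abs(np.diff(d))
    means = (d[:-1] + d[1:]) / 2.0
    return 100.0 * np.mean(diffs / means)

stress_timed_like = [60, 140, 55, 150, 70, 160]     # ms, alternating long/short (placeholder)
syllable_timed_like = [100, 110, 95, 105, 100, 98]  # ms, fairly even (placeholder)
print(round(npvi(stress_timed_like), 1), round(npvi(syllable_timed_like), 1))
```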
Shahin, Antoine J; Shen, Stanley; Kerlin, Jess R
2017-01-01
We examined the relationship between tolerance for audiovisual onset asynchrony (AVOA) and the spectrotemporal fidelity of the spoken words and the speaker's mouth movements. In two experiments that only varied in the temporal order of sensory modality, visual speech leading (exp1) or lagging (exp2) acoustic speech, participants watched intact and blurred videos of a speaker uttering trisyllabic words and nonwords that were noise vocoded with 4, 8, 16, and 32 channels. They judged whether the speaker's mouth movements and the speech sounds were in-sync or out-of-sync. Individuals perceived synchrony (tolerated AVOA) on more trials when the acoustic speech was more speech-like (8 channels and higher vs. 4 channels), and when visual speech was intact rather than blurred (exp1 only). These findings suggest that enhanced spectrotemporal fidelity of the audiovisual (AV) signal prompts the brain to widen the window of integration, promoting the fusion of temporally distant AV percepts.
Direct magnitude estimates of speech intelligibility in dysarthria: effects of a chosen standard.
Weismer, Gary; Laures, Jacqueline S
2002-06-01
Direct magnitude estimation (DME) has been used frequently as a perceptual scaling technique in studies of the speech intelligibility of persons with speech disorders. The technique is typically used with a standard, or reference stimulus, chosen as a good exemplar of "midrange" intelligibility. In several published studies, the standard has been chosen subjectively, usually on the basis of the expertise of the investigators. The current experiment demonstrates that a fixed set of sentence-level utterances, obtained from 4 individuals with dysarthria (2 with Parkinson disease, 2 with traumatic brain injury) as well as 3 neurologically normal speakers, is scaled differently depending on the identity of the standard. Four different standards were used in the main experiment, three of which were judged qualitatively in two independent evaluations to be good exemplars of midrange intelligibility. Acoustic analyses did not reveal obvious differences between these four standards but suggested that the standard with the worst-scaled intelligibility had much poorer voice source characteristics compared to the other three standards. Results are discussed in terms of possible standardization of midrange intelligibility exemplars for DME experiments.
End-to-End ASR-Free Keyword Search From Speech
NASA Astrophysics Data System (ADS)
Audhkhasi, Kartik; Rosenberg, Andrew; Sethy, Abhinav; Ramabhadran, Bhuvana; Kingsbury, Brian
2017-12-01
End-to-end (E2E) systems have achieved competitive results compared to conventional hybrid hidden Markov model (HMM)-deep neural network based automatic speech recognition (ASR) systems. Such E2E systems are attractive due to the lack of dependence on alignments between input acoustic and output grapheme or HMM state sequence during training. This paper explores the design of an ASR-free end-to-end system for text query-based keyword search (KWS) from speech trained with minimal supervision. Our E2E KWS system consists of three sub-systems. The first sub-system is a recurrent neural network (RNN)-based acoustic auto-encoder trained to reconstruct the audio through a finite-dimensional representation. The second sub-system is a character-level RNN language model using embeddings learned from a convolutional neural network. Since the acoustic and text query embeddings occupy different representation spaces, they are input to a third feed-forward neural network that predicts whether the query occurs in the acoustic utterance or not. This E2E ASR-free KWS system performs respectably despite lacking a conventional ASR system and trains much faster.
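The third sub-system (a feed-forward network operating on the concatenated acoustic and query embeddings) can be sketched in a few lines of PyTorch. The embedding dimensions and hidden layer sizes are assumptions, not those of the system described above.

```python
# Sketch: feed-forward network combining acoustic and query embeddings for KWS.
import torch
import torch.nn as nn

class KWSCombiner(nn.Module):
    def __init__(self, acoustic_dim=256, query_dim=128, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(acoustic_dim + query_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))                    # logit for "query occurs in utterance"

    def forward(self, acoustic_emb, query_emb):
        return self.net(torch.cat([acoustic_emb, query_emb], dim=-1)).squeeze(-1)

model = KWSCombiner()
acoustic_emb = torch.randn(8, 256)   # from the audio auto-encoder (placeholder)
query_emb = torch.randn(8, 128)      # from the character RNN LM (placeholder)
prob = torch.sigmoid(model(acoustic_emb, query_emb))
print(prob.shape)                    # torch.Size([8])
```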
Hollien, Harry; Huntley Bahr, Ruth; Harnsberger, James D
2014-03-01
The following article provides a general review of an area that can be referred to as Forensic Voice. Its goals will be outlined and that discussion will be followed by a description of its major elements. Considered are (1) the processing and analysis of spoken utterances, (2) distorted speech, (3) enhancement of speech intelligibility (re: surveillance and other recordings), (4) transcripts, (5) authentication of recordings, (6) speaker identification, and (7) the detection of deception, intoxication, and emotions in speech. Stress in speech and the psychological stress evaluation systems (that some individuals attempt to use as lie detectors) also will be considered. Points of entry will be suggested for individuals with the kinds of backgrounds possessed by professionals already working in the voice area. Copyright © 2014 The Voice Foundation. Published by Mosby, Inc. All rights reserved.
Evaluating deep learning architectures for Speech Emotion Recognition.
Fayek, Haytham M; Lech, Margaret; Cavedon, Lawrence
2017-08-01
Speech Emotion Recognition (SER) can be regarded as a static or dynamic classification problem, which makes SER an excellent test bed for investigating and comparing various deep learning architectures. We describe a frame-based formulation to SER that relies on minimal speech processing and end-to-end deep learning to model intra-utterance dynamics. We use the proposed SER system to empirically explore feed-forward and recurrent neural network architectures and their variants. Experiments conducted illuminate the advantages and limitations of these architectures in paralinguistic speech recognition and emotion recognition in particular. As a result of our exploration, we report state-of-the-art results on the IEMOCAP database for speaker-independent SER and present quantitative and qualitative assessments of the models' performances. Copyright © 2017 Elsevier Ltd. All rights reserved.
Unspoken vowel recognition using facial electromyogram.
Arjunan, Sridhar P; Kumar, Dinesh K; Yau, Wai C; Weghorn, Hans
2006-01-01
The paper aims to identify speech from facial muscle activity, without audio signals. The paper presents an effective technique that measures the relative muscle activity of the articulatory muscles. Five English vowels were used as recognition variables. This paper reports using the moving root mean square (RMS) of the surface electromyogram (SEMG) of four facial muscles to segment the signal and identify the start and end of the utterance. The RMS of the signal between the start and end markers was integrated and normalised. This represented the relative muscle activity of the four muscles. These were classified using a back-propagation neural network to identify the speech. The technique was successfully used to classify the five vowels into three classes and was not sensitive to variation in the speed and style of speaking of the different subjects. The results also show that this technique was suitable for classifying the five vowels into five classes when trained for each of the subjects. It is suggested that such a technology may be used for the user to give simple unvoiced commands when trained for the specific user.
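The moving-RMS segmentation step can be sketched as follows: compute a running RMS per SEMG channel, sum across channels, and take the first and last samples above a threshold as the utterance start and end. The sampling rate, window length, threshold, and simulated signals are assumptions.

```python
# Sketch: moving RMS of SEMG channels and simple start/end detection of an utterance.
import numpy as np

def moving_rms(x, win):
    """Running RMS of a 1-D signal with a rectangular window of `win` samples."""
    kernel = np.ones(win) / win
    return np.sqrt(np.convolve(x ** 2, kernel, mode="same"))

fs = 1000                                    # Hz, placeholder SEMG sampling rate
t = np.arange(0, 2.0, 1 / fs)
semg = 0.05 * np.random.randn(4, t.size)     # 4 facial channels, baseline noise
semg[:, 600:1400] += 0.5 * np.random.randn(4, 800)        # simulated utterance burst

activity = sum(moving_rms(ch, win=100) for ch in semg)    # summed channel activity
above = np.where(activity > 3 * np.median(activity))[0]
start, end = above[0] / fs, above[-1] / fs
print(f"utterance from {start:.2f}s to {end:.2f}s")       # roughly 0.6-1.4 s
```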
Development of speech prostheses: current status and recent advances
Brumberg, Jonathan S; Guenther, Frank H
2010-01-01
Brain–computer interfaces (BCIs) have been developed over the past decade to restore communication to persons with severe paralysis. In the most severe cases of paralysis, known as locked-in syndrome, patients retain cognition and sensation, but are capable of only slight voluntary eye movements. For these patients, no standard communication method is available, although some can use BCIs to communicate by selecting letters or words on a computer. Recent research has sought to improve on existing techniques by using BCIs to create a direct prediction of speech utterances rather than to simply control a spelling device. Such methods are the first steps towards speech prostheses as they are intended to entirely replace the vocal apparatus of paralyzed users. This article outlines many well known methods for restoration of communication by BCI and illustrates the difference between spelling devices and direct speech prediction or speech prosthesis. PMID:20822389
NASA Astrophysics Data System (ADS)
Liberman, A. M.
1980-06-01
This report (1 April - 30 June) is one of a regular series on the status and progress of studies on the nature of speech, instrumentation for its investigation, and practical applications. Manuscripts cover the following topics: The perceptual equivalence of two acoustic cues for a speech contrast is specific to phonetic perception; Duplex perception of acoustic patterns as speech and nonspeech; Evidence for phonetic processing of cues to place of articulation: Perceived manner affects perceived place; Some articulatory correlates of perceptual isochrony; Effects of utterance continuity on phonetic judgments; Laryngeal adjustments in stuttering: A glottographic observation using a modified reaction paradigm; Missing -ing in reading: Letter detection errors on word endings; Speaking rate, syllable stress, and vowel identity; Sonority and syllabicity: Acoustic correlates of perception; Influence of vocalic context on perception of the (S)-(s) distinction.
Quantifying repetitive speech in autism spectrum disorders and language impairment.
van Santen, Jan P H; Sproat, Richard W; Hill, Alison Presmanes
2013-10-01
We report on an automatic technique for quantifying two types of repetitive speech: repetitions of what the child says him/herself (self-repeats) and of what is uttered by an interlocutor (echolalia). We apply this technique to a sample of 111 children between the ages of four and eight: 42 typically developing children (TD), 19 children with specific language impairment (SLI), 25 children with autism spectrum disorders (ASD) plus language impairment (ALI), and 25 children with ASD with normal, non-impaired language (ALN). The results indicate robust differences in echolalia between the TD and ASD groups as a whole (ALN + ALI), and between TD and ALN children. There were no significant differences between ALI and SLI children for echolalia or self-repetitions. The results confirm previous findings that children with ASD repeat the language of others more than other populations of children. On the other hand, self-repetition does not appear to be significantly more frequent in ASD, nor does it matter whether the child's echolalia occurred within one (immediate) or two turns (near-immediate) of the adult's original utterance. Furthermore, non-significant differences between ALN and SLI, between TD and SLI, and between ALI and TD are suggestive that echolalia may not be specific to ALN or to ASD in general. One important innovation of this work is an objective fully automatic technique for assessing the amount of repetition in a transcript of a child's utterances. © 2013 International Society for Autism Research, Wiley Periodicals, Inc.
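The transcript-level idea can be sketched as follows: flag a child utterance as echolalia when it largely overlaps an adult utterance from the previous one or two turns, and as a self-repeat when it overlaps one of the child's own recent utterances. The word-overlap measure, threshold, and speaker codes are assumptions, not the authors' algorithm.

```python
# Sketch: flag echolalia and self-repeats in a turn-ordered transcript.
def overlap(a, b):
    """Fraction of words in utterance a that also occur in utterance b."""
    wa, wb = a.lower().split(), set(b.lower().split())
    return sum(w in wb for w in wa) / max(len(wa), 1)

def tag_repetitions(turns, threshold=0.8, window=2):
    """turns: list of (speaker, utterance); returns tags for child utterances."""
    tags = []
    for i, (spk, utt) in enumerate(turns):
        if spk != "CHI":
            continue
        prev = turns[max(0, i - window):i]
        if any(s == "ADU" and overlap(utt, u) >= threshold for s, u in prev):
            tags.append((utt, "echolalia"))
        elif any(s == "CHI" and overlap(utt, u) >= threshold for s, u in prev):
            tags.append((utt, "self-repeat"))
        else:
            tags.append((utt, "novel"))
    return tags

transcript = [("ADU", "do you want the blue car"),
              ("CHI", "want the blue car"),
              ("CHI", "want the blue car"),
              ("CHI", "where is the ball")]
print(tag_repetitions(transcript))
```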
Speech Motor Development during Acquisition of the Voicing Contrast
ERIC Educational Resources Information Center
Grigos, Maria I.; Saxman, John H.; Gordon, Andrew M.
2005-01-01
Lip and jaw movements were studied longitudinally in 19-month-old children as they acquired the voicing contrast for /p/ and /b/. A movement tracking system obtained lip and jaw kinematics as participants produced the target utterances /papa/ and /baba/. Laryngeal adjustments were also tracked through acoustically recorded voice onset time (VOT)…
The Effects of Surgical Rapid Maxillary Expansion (SRME) on Vowel Formants
ERIC Educational Resources Information Center
Sari, Emel; Kilic, Mehmet Akif
2009-01-01
The objective of this study was to investigate the effect of surgical rapid maxillary expansion (SRME) on vowel production. The subjects included 12 patients whose speech was considered perceptually normal and who had undergone surgical RME for expansion of a narrow maxilla. They uttered the following Turkish vowels: [a], [ɛ]…
Phonological Systems of Speech-Disordered Clients with Positive/Negative Histories of Otitis Media.
ERIC Educational Resources Information Center
Churchill, Janine D.; And Others
1988-01-01
Evaluation of object-naming utterances of articulation-disordered children (ages 3-6) found that subjects with histories of recurrent otitis media during their first 24 months evidenced stridency deletion (in consonant singletons and in consonant clusters) significantly more than did subjects with negative otitis media histories. (Author/DB)
Spatiotemporal Dynamics of Speech Sound Perception in Chronic Developmental Stuttering
ERIC Educational Resources Information Center
Liotti, Mario; Ingham, Janis C.; Takai, Osamu; Paskos, Delia Kothmann; Perez, Ricardo; Ingham, Roger J.
2010-01-01
High-density ERPs were recorded in eight adults with persistent developmental stuttering (PERS) and eight matched normally fluent (CONT) control volunteers while participants either repeatedly uttered the vowel "ah" or listened to their own previously recorded vocalizations. The fronto-central N1 auditory wave was reduced in response to spoken…
Whole Word Measures in Bilingual Children with Speech Sound Disorders
ERIC Educational Resources Information Center
Burrows, Lauren; Goldstein, Brian A.
2010-01-01
Phonological acquisition traditionally has been measured using constructs that focus on segments rather than the whole words. Findings from recent research have suggested whole-word productions be evaluated using measures such as phonological mean length of utterance (pMLU) and the proportion of whole-word proximity (PWP). These measures have been…
Listening for Identity beyond the Speech Event
ERIC Educational Resources Information Center
Wortham, Stanton
2010-01-01
Background: A typical account of listening focuses on cognition, describing how a listener understands and reacts to the cognitive contents of a speaker's utterance. The articles in this issue move beyond a cognitive view, arguing that listening also involves moral, aesthetic, and political aspects. Focus of Study: This article attends to all four…
ERIC Educational Resources Information Center
Sohail, Juwairia; Johnson, Elizabeth K.
2016-01-01
Much of what we know about the development of listeners' word segmentation strategies originates from the artificial language-learning literature. However, many artificial speech streams designed to study word segmentation lack a salient cue found in all natural languages: utterance boundaries. In this study, participants listened to a…
Reducing Vocalized Pauses in Public Speaking Situations Using the VP Card
ERIC Educational Resources Information Center
Ramos Salazar, Leslie
2014-01-01
This article describes a speaking problem very common in today's world--"vocalized pauses" (VP). Vocalized pauses are defined as utterances such as "uh," "like," and "um" that occur between words in oral sentences. This practice of everyday speech can affect how a speaker's intentions are…
Making Non-Fluent Aphasics Speak: Sing along!
ERIC Educational Resources Information Center
Racette, Amelie; Bard, Celine; Peretz, Isabelle
2006-01-01
A classic observation in neurology is that aphasics can sing words they cannot pronounce otherwise. To further assess this claim, we investigated the production of sung and spoken utterances in eight brain-damaged patients suffering from a variety of speech disorders as a consequence of a left-hemisphere lesion. In Experiment 1, the patients were…
Integrating Linguistic, Motor, and Perceptual Information in Language Production
ERIC Educational Resources Information Center
Frank, Austin F.
2011-01-01
Speakers show remarkable adaptability in updating and correcting their utterances in response to changes in the environment. When an interlocutor raises an eyebrow or the AC kicks on and introduces ambient noise, it seems that speakers are able to quickly integrate this information into their speech plans and adapt appropriately. This ability to…
Segment-based acoustic models for continuous speech recognition
NASA Astrophysics Data System (ADS)
Ostendorf, Mari; Rohlicek, J. R.
1993-07-01
This research aims to develop new and more accurate stochastic models for speaker-independent continuous speech recognition, by extending previous work in segment-based modeling and by introducing a new hierarchical approach to representing intra-utterance statistical dependencies. These techniques, which are more costly than traditional approaches because of the large search space associated with higher order models, are made feasible through rescoring a set of HMM-generated N-best sentence hypotheses. We expect these different modeling techniques to result in improved recognition performance over that achieved by current systems, which handle only frame-based observations and assume that these observations are independent given an underlying state sequence. In the fourth quarter of the project, we have completed the following: (1) ported our recognition system to the Wall Street Journal task, a standard task in the ARPA community; (2) developed an initial dependency-tree model of intra-utterance observation correlation; and (3) implemented baseline language model estimation software. Our initial results on the Wall Street Journal task are quite good and represent significantly improved performance over most HMM systems reporting on the Nov. 1992 5k vocabulary test set.
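To make the rescoring strategy above concrete, here is a minimal Python sketch of N-best rescoring: a first-pass HMM recognizer supplies ranked sentence hypotheses with scores, and a more expensive segment-level model (plus a language model) re-ranks them. The weights and scoring functions are illustrative assumptions, not the authors' system.

```python
# Minimal sketch of N-best rescoring (illustrative only; not the authors' system).
# An HMM recognizer supplies scored sentence hypotheses; a more expensive
# segment-level model and a language model re-rank them in the log domain.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Hypothesis:
    words: List[str]
    hmm_score: float          # log-probability from the first-pass HMM

def rescore_nbest(nbest: List[Hypothesis],
                  segment_score: Callable[[List[str]], float],
                  lm_score: Callable[[List[str]], float],
                  w_seg: float = 1.0, w_lm: float = 0.5) -> Hypothesis:
    """Return the hypothesis with the best combined (log-domain) score."""
    def combined(h: Hypothesis) -> float:
        return h.hmm_score + w_seg * segment_score(h.words) + w_lm * lm_score(h.words)
    return max(nbest, key=combined)

# Toy usage with stand-in scoring functions.
nbest = [Hypothesis(["wall", "street", "journal"], -120.0),
         Hypothesis(["wall", "street", "colonel"], -118.5)]
best = rescore_nbest(nbest,
                     segment_score=lambda w: -2.0 * len(w),   # placeholder
                     lm_score=lambda w: -1.0 * len(w))        # placeholder
print(best.words)
```

Because only a handful of hypotheses are rescored, the costlier higher-order models never have to search the full decoding space, which is the feasibility argument the abstract makes.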
Clarissa Spoken Dialogue System for Procedure Reading and Navigation
NASA Technical Reports Server (NTRS)
Hieronymus, James; Dowding, John
2004-01-01
Speech is the most natural modality humans use to communicate with other people, agents, and complex systems. A spoken dialogue system must be robust to noise and able to mimic human conversational behavior, like correcting misunderstandings, answering simple questions about the task and understanding most well-formed inquiries or commands. The system aims to understand the meaning of the human utterance, and if it does not, then it discards the utterance as being meant for someone else. The first operational system is Clarissa, a conversational procedure reader and navigator, which will be used in a System Development Test Objective (SDTO) on the International Space Station (ISS) during Expedition 10. In the present environment one astronaut reads the procedure on a Manual Procedure Viewer (MPV) or paper, and has to stop to read or turn pages, shifting focus from the task. Clarissa is designed to read and navigate ISS procedures entirely with speech, while the astronaut has his eyes and hands engaged in performing the task. The system also provides an MPV-like graphical interface so the procedure can be read visually. A demo of the system will be given.
Effect of cognitive load on articulation rate and formant frequencies during simulator flights.
Huttunen, Kerttu H; Keränen, Heikki I; Pääkkönen, Rauno J; Päivikki Eskelinen-Rönkä, R; Leino, Tuomo K
2011-03-01
This study explored how three types of intensive cognitive load typical of military aviation (load on situation awareness, information processing, or decision-making) affect speech. The utterances of 13 male military pilots were recorded during simulated combat flights. Articulation rate was calculated from the speech samples, and the first formant (F1) and second formant (F2) were tracked from first-syllable short vowels in pre-defined phoneme environments. Articulation rate was found to correlate negatively (albeit with low coefficients) with loads on situation awareness and decision-making but not with changes in F1 or F2. Changes were seen in the spectrum of the vowels: mean F1 of front vowels usually increased and their mean F2 decreased as a function of cognitive load, and both F1 and F2 of back vowels increased. The strongest associations were seen between the three types of cognitive load and F1 and F2 changes in back vowels. Because fluent and clear radio speech communication is vital to safety in aviation and temporal and spectral changes may affect speech intelligibility, careful use of standard aviation phraseology and training in the production of clear speech during a high level of cognitive load are important measures that diminish the probability of possible misunderstandings. © 2011 Acoustical Society of America
Importance of the brow in facial expressiveness during human communication.
Neely, John Gail; Lisker, Paul; Drapekin, Jesse
2014-03-01
The objective of this study was to evaluate laterality and upper/lower face dominance of expressiveness during prescribed speech using a unique validated image subtraction system capable of sensitive and reliable measurement of facial surface deformation. Observations and experiments of central control of facial expressions during speech and social utterances in humans and animals suggest that the right mouth moves more than the left during nonemotional speech. However, proficient lip readers seem to attend to the whole face to interpret meaning from expressed facial cues, also implicating a horizontal (upper face-lower face) axis. Study design: prospective experimental design. Experimental maneuver: recited speech. Outcome measure: image-subtraction strength-duration curve amplitude. Thirty normal human adults were evaluated during memorized nonemotional recitation of 2 short sentences. Facial movements were assessed using a video-image subtraction system capable of simultaneously measuring upper and lower specific areas of each hemiface. The results demonstrate both axes influence facial expressiveness in human communication; however, the horizontal axis (upper versus lower face) would appear dominant, especially during what would appear to be spontaneous breakthrough unplanned expressiveness. These data are congruent with the concept that the left cerebral hemisphere has control over nonemotionally stimulated speech; however, the multisynaptic brainstem extrapyramidal pathways may override hemiface laterality and preferentially take control of the upper face. Additionally, these data demonstrate the importance of the often-ignored brow in facial expressiveness. Experimental study. EBM levels not applicable.
Linear Classifier with Reject Option for the Detection of Vocal Fold Paralysis and Vocal Fold Edema
NASA Astrophysics Data System (ADS)
Kotropoulos, Constantine; Arce, Gonzalo R.
2009-12-01
Two distinct two-class pattern recognition problems are studied, namely, the detection of male subjects who are diagnosed with vocal fold paralysis against male subjects who are diagnosed as normal and the detection of female subjects who are suffering from vocal fold edema against female subjects who do not suffer from any voice pathology. To do so, utterances of the sustained vowel "ah" are employed from the Massachusetts Eye and Ear Infirmary database of disordered speech. Linear prediction coefficients extracted from the aforementioned utterances are used as features. The receiver operating characteristic curve of the linear classifier, that stems from the Bayes classifier when Gaussian class conditional probability density functions with equal covariance matrices are assumed, is derived. The optimal operating point of the linear classifier is specified with and without reject option. First results using utterances of the "rainbow passage" are also reported for completeness. The reject option is shown to yield statistically significant improvements in the accuracy of detecting the voice pathologies under study.
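The classifier described above (a Bayes classifier with Gaussian class-conditional densities and equal covariance matrices) reduces to linear discriminant analysis, and the reject option can be implemented as a threshold on the posterior probability. A minimal sketch, with random stand-in features in place of the linear prediction coefficients:

```python
# Sketch of a linear (equal-covariance Gaussian) classifier with a reject option.
# Feature vectors would be linear prediction coefficients per utterance; here we
# use random data as a stand-in. The rejection threshold is illustrative.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 12)),      # "normal" voices (class 0)
               rng.normal(0.8, 1.0, (50, 12))])     # "pathological" voices (class 1)
y = np.array([0] * 50 + [1] * 50)

clf = LinearDiscriminantAnalysis().fit(X, y)

def classify_with_reject(x, threshold=0.75):
    """Return 0, 1, or None (reject) depending on the posterior probability."""
    posteriors = clf.predict_proba(x.reshape(1, -1))[0]
    label = int(np.argmax(posteriors))
    return label if posteriors[label] >= threshold else None

print(classify_with_reject(X[0]))
```

Sweeping the threshold trades coverage against accuracy, which is how a reject option can raise detection accuracy on the samples that are not rejected.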
Sobol-Shikler, Tal; Robinson, Peter
2010-07-01
We present a classification algorithm for inferring affective states (emotions, mental states, attitudes, and the like) from their nonverbal expressions in speech. It is based on the observations that affective states can occur simultaneously and different sets of vocal features, such as intonation and speech rate, distinguish between nonverbal expressions of different affective states. The input to the inference system was a large set of vocal features and metrics that were extracted from each utterance. The classification algorithm conducted independent pairwise comparisons between nine affective-state groups. The classifier used various subsets of metrics of the vocal features and various classification algorithms for different pairs of affective-state groups. Average classification accuracy of the 36 pairwise machines was 75 percent, using 10-fold cross validation. The comparison results were consolidated into a single ranked list of the nine affective-state groups. This list was the output of the system and represented the inferred combination of co-occurring affective states for the analyzed utterance. The inference accuracy of the combined machine was 83 percent. The system automatically characterized over 500 affective state concepts from the Mind Reading database. The inference of co-occurring affective states was validated by comparing the inferred combinations to the lexical definitions of the labels of the analyzed sentences. The distinguishing capabilities of the system were comparable to human performance.
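A rough sketch of the pairwise-comparison scheme described above: one binary classifier per pair of affective-state groups, with the comparison results consolidated into a single ranked list by vote counting. The features, the three stand-in classes, and the single SVM classifier type are placeholders; the original system used different feature subsets and algorithms per pair.

```python
# Sketch of pairwise (one-vs-one) classification consolidated into a ranked list
# of affective-state groups. Features and class names are placeholders.
from itertools import combinations
from collections import Counter
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
classes = ["joy", "anger", "boredom"]                 # stand-in affective groups
X = rng.normal(size=(90, 10))
y = np.repeat(classes, 30)

# Train one binary classifier per pair of affective-state groups.
pairwise = {}
for a, b in combinations(classes, 2):
    mask = np.isin(y, [a, b])
    pairwise[(a, b)] = SVC().fit(X[mask], y[mask])

def rank_states(x):
    """Vote over all pairwise machines and return states ranked by vote count."""
    votes = Counter({c: 0 for c in classes})
    for clf in pairwise.values():
        votes[clf.predict(x.reshape(1, -1))[0]] += 1
    return [state for state, _ in votes.most_common()]

print(rank_states(X[0]))
```

Returning the full ranked list rather than a single winner is what lets the system express co-occurring affective states for one utterance.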
Wildgruber, D; Hertrich, I; Riecker, A; Erb, M; Anders, S; Grodd, W; Ackermann, H
2004-12-01
In addition to the propositional content of verbal utterances, significant linguistic and emotional information is conveyed by the tone of speech. To differentiate brain regions subserving processing of linguistic and affective aspects of intonation, discrimination of sentences differing in linguistic accentuation and emotional expressiveness was evaluated by functional magnetic resonance imaging. Both tasks yielded rightward lateralization of hemodynamic responses at the level of the dorsolateral frontal cortex as well as bilateral thalamic and temporal activation. Processing of linguistic and affective intonation, thus, seems to be supported by overlapping neural networks comprising partially right-sided brain regions. Comparison of hemodynamic activation during the two different tasks, however, revealed bilateral orbito-frontal responses restricted to the affective condition as opposed to activation of the left lateral inferior frontal gyrus confined to evaluation of linguistic intonation. These findings indicate that distinct frontal regions contribute to higher level processing of intonational information depending on its communicational function. In line with other components of language processing, discrimination of linguistic accentuation seems to be lateralized to the left inferior-lateral frontal region whereas bilateral orbito-frontal areas subserve evaluation of emotional expressiveness.
Freud, Plato and Irigaray: A Morpho-Logic of Teaching and Learning
ERIC Educational Resources Information Center
Peers, Chris
2012-01-01
This article discusses two well-known texts that respectively describe learning and teaching, drawn from the work of Freud and Plato. These texts are considered in psychoanalytic terms using a methodology drawn from the philosophy of Luce Irigaray. In particular the article addresses Irigaray's approach to the analysis of speech and utterance as a…
Phonological Patterns Observed in Young Children with Cleft Palate.
ERIC Educational Resources Information Center
Broen, Patricia A.; And Others
The study examined the speech production strategies used by 4 young children (30- to 32-months-old) with cleft palate and velopharyngeal inadequacy during the early stages of phonological learning. All the children had had primary palatal surgery and were producing primarily single word utterances with a few 2- and 3-word phrases. Analysis of each…
ERIC Educational Resources Information Center
Ganz, Jennifer B.; Simpson, Richard L.
2004-01-01
Few studies on augmentative and alternative communication (AAC) systems have addressed the potential for such systems to impact word utterances in children with autism spectrum disorders (ASD). The Picture Exchange Communication System (PECS) is an AAC system designed specifically to minimize difficulties with communication skills experienced by…
"Et Pis Bon, Ben Alors Voilà Quoi!" Teaching Those Pesky Discourse Markers
ERIC Educational Resources Information Center
Mullan, Kerry
2016-01-01
Discourse markers have been described as "nervous tics, fillers, or signs of hesitation", and are frequently dismissed as features of lazy or inarticulate speech. Yet in fact they have a number of crucial functions in spoken interaction, such as buying time, managing turn taking, linking utterances, introducing a new topic and indicating…
Interaction of Language Processing and Motor Skill in Children with Specific Language Impairment
ERIC Educational Resources Information Center
DiDonato Brumbach, Andrea C.; Goffman, Lisa
2014-01-01
Purpose: To examine how language production interacts with speech motor and gross and fine motor skill in children with specific language impairment (SLI). Method: Eleven children with SLI and 12 age-matched peers (4-6 years) produced structurally primed sentences containing particles and prepositions. Utterances were analyzed for errors and for…
Characterizing Articulation in Apraxic Speech Using Real-Time Magnetic Resonance Imaging.
Hagedorn, Christina; Proctor, Michael; Goldstein, Louis; Wilson, Stephen M; Miller, Bruce; Gorno-Tempini, Maria Luisa; Narayanan, Shrikanth S
2017-04-14
Real-time magnetic resonance imaging (MRI) and accompanying analytical methods are shown to capture and quantify salient aspects of apraxic speech, substantiating and expanding upon evidence provided by clinical observation and acoustic and kinematic data. Analysis of apraxic speech errors within a dynamic systems framework is provided and the nature of pathomechanisms of apraxic speech discussed. One adult male speaker with apraxia of speech was imaged using real-time MRI while producing spontaneous speech, repeated naming tasks, and self-paced repetition of word pairs designed to elicit speech errors. Articulatory data were analyzed, and speech errors were detected using time series reflecting articulatory activity in regions of interest. Real-time MRI captured two types of apraxic gestural intrusion errors in a word pair repetition task. Gestural intrusion errors in nonrepetitive speech, multiple silent initiation gestures at the onset of speech, and covert (unphonated) articulation of entire monosyllabic words were also captured. Real-time MRI and accompanying analytical methods capture and quantify many features of apraxic speech that have been previously observed using other modalities while offering high spatial resolution. This patient's apraxia of speech affected the ability to select only the appropriate vocal tract gestures for a target utterance, suppressing others, and to coordinate them in time.
Automatic detection of Parkinson's disease in running speech spoken in three different languages.
Orozco-Arroyave, J R; Hönig, F; Arias-Londoño, J D; Vargas-Bonilla, J F; Daqrouq, K; Skodda, S; Rusz, J; Nöth, E
2016-01-01
The aim of this study is the analysis of continuous speech signals of people with Parkinson's disease (PD) considering recordings in different languages (Spanish, German, and Czech). A method for the characterization of the speech signals, based on the automatic segmentation of utterances into voiced and unvoiced frames, is addressed here. The energy content of the unvoiced sounds is modeled using 12 Mel-frequency cepstral coefficients and 25 bands scaled according to the Bark scale. Four speech tasks comprising isolated words, rapid repetition of the syllables /pa/-/ta/-/ka/, sentences, and read texts are evaluated. The method proves to be more accurate than classical approaches in the automatic classification of speech of people with PD and healthy controls. The accuracies range from 85% to 99% depending on the language and the speech task. Cross-language experiments are also performed confirming the robustness and generalization capability of the method, with accuracies ranging from 60% to 99%. This work comprises a step forward for the development of computer aided tools for the automatic assessment of dysarthric speech signals in multiple languages.
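A simplified sketch of the feature idea described above: detect voiced versus unvoiced frames, keep only the unvoiced ones, describe them with MFCCs, and classify the resulting utterance-level features. The file name, frame settings, voicing detector, and classifier here are assumptions for illustration, not the authors' exact pipeline (which also used Bark-scaled band energies).

```python
# Sketch: model only the unvoiced frames of an utterance with MFCCs, then
# classify utterances (e.g. PD vs. healthy control). Illustrative parameters.
import numpy as np
import librosa
from sklearn.svm import SVC

def unvoiced_mfcc_features(path, n_mfcc=12):
    y, sr = librosa.load(path, sr=16000)
    # Frame-level voicing decision (same hop length as the MFCC frames below).
    _, voiced_flag, _ = librosa.pyin(y, fmin=65, fmax=300, sr=sr,
                                     frame_length=2048, hop_length=512)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                n_fft=2048, hop_length=512)
    n = min(mfcc.shape[1], len(voiced_flag))
    unvoiced = mfcc[:, :n][:, ~voiced_flag[:n]]
    # Summarize the unvoiced frames of the utterance (mean and std per coefficient).
    return np.concatenate([unvoiced.mean(axis=1), unvoiced.std(axis=1)])

# Usage sketch (wav_paths and labels are hypothetical):
# X = np.vstack([unvoiced_mfcc_features(p) for p in wav_paths])
# clf = SVC().fit(X, labels)
```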
The impact of phonetic dissimilarity on the perception of foreign accented speech
NASA Astrophysics Data System (ADS)
Weil, Shawn A.
2003-10-01
Non-normative speech (i.e., synthetic speech, pathological speech, foreign accented speech) is more difficult to process for native listeners than is normative speech. Does perceptual dissimilarity affect only intelligibility, or are there other costs to processing? The current series of experiments investigates both the intelligibility and time course of foreign accented speech (FAS) perception. Native English listeners heard single English words spoken by both native English speakers and non-native speakers (Mandarin or Russian). Words were chosen based on the similarity between the phonetic inventories of the respective languages. Three experimental designs were used: a cross-modal matching task, a word repetition (shadowing) task, and two subjective ratings tasks which measured impressions of accentedness and effortfulness. The results replicate previous investigations that have found that FAS significantly lowers word intelligibility. Furthermore, FAS also increases perceptual effort: in the word repetition task, correct responses to accented words are slower than to nonaccented words. An analysis indicates that both intelligibility and reaction time are, in part, functions of the similarity between the talker's utterance and the listener's representation of the word.
Hantke, Simone; Weninger, Felix; Kurle, Richard; Ringeval, Fabien; Batliner, Anton; Mousa, Amr El-Desoky; Schuller, Björn
2016-01-01
We propose a new recognition task in the area of computational paralinguistics: automatic recognition of eating conditions in speech, i. e., whether people are eating while speaking, and what they are eating. To this end, we introduce the audio-visual iHEARu-EAT database featuring 1.6 k utterances of 30 subjects (mean age: 26.1 years, standard deviation: 2.66 years, gender balanced, German speakers), six types of food (Apple, Nectarine, Banana, Haribo Smurfs, Biscuit, and Crisps), and read as well as spontaneous speech, which is made publicly available for research purposes. We start with demonstrating that for automatic speech recognition (ASR), it pays off to know whether speakers are eating or not. We also propose automatic classification both by brute-forcing of low-level acoustic features as well as higher-level features related to intelligibility, obtained from an Automatic Speech Recogniser. Prediction of the eating condition was performed with a Support Vector Machine (SVM) classifier employed in a leave-one-speaker-out evaluation framework. Results show that the binary prediction of eating condition (i. e., eating or not eating) can be easily solved independently of the speaking condition; the obtained average recalls are all above 90%. Low-level acoustic features provide the best performance on spontaneous speech, which reaches up to 62.3% average recall for multi-way classification of the eating condition, i. e., discriminating the six types of food, as well as not eating. The early fusion of features related to intelligibility with the brute-forced acoustic feature set improves the performance on read speech, reaching a 66.4% average recall for the multi-way classification task. Analysing features and classifier errors leads to a suitable ordinal scale for eating conditions, on which automatic regression can be performed with up to 56.2% determination coefficient. PMID:27176486
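The evaluation protocol above, an SVM classifier in a leave-one-speaker-out framework scored by average recall across classes (unweighted average recall, UAR), can be sketched as follows; the feature matrix, labels, and speaker IDs are placeholders standing in for the acoustic features of the iHEARu-EAT recordings.

```python
# Sketch of leave-one-speaker-out (LOSO) evaluation with an SVM, reporting
# unweighted average recall (UAR). X, y, and speaker IDs are placeholders.
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.svm import SVC
from sklearn.metrics import recall_score

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 20))             # acoustic feature vectors (placeholder)
y = rng.integers(0, 2, size=300)           # eating vs. not eating (placeholder)
speakers = rng.integers(0, 30, size=300)   # one group per speaker

y_true, y_pred = [], []
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=speakers):
    clf = SVC(kernel="linear").fit(X[train_idx], y[train_idx])
    y_true.extend(y[test_idx])
    y_pred.extend(clf.predict(X[test_idx]))

uar = recall_score(y_true, y_pred, average="macro")   # unweighted average recall
print(f"UAR: {uar:.3f}")
```

Holding out every utterance of one speaker at a time keeps speaker identity from leaking into the eating-condition decision, which is why the protocol is standard in this paralinguistics work.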
On how the brain decodes vocal cues about speaker confidence.
Jiang, Xiaoming; Pell, Marc D
2015-05-01
In speech communication, listeners must accurately decode vocal cues that refer to the speaker's mental state, such as their confidence or 'feeling of knowing'. However, the time course and neural mechanisms associated with online inferences about speaker confidence are unclear. Here, we used event-related potentials (ERPs) to examine the temporal neural dynamics underlying a listener's ability to infer speaker confidence from vocal cues during speech processing. We recorded listeners' real-time brain responses while they evaluated statements wherein the speaker's tone of voice conveyed one of three levels of confidence (confident, close-to-confident, unconfident) or were spoken in a neutral manner. Neural responses time-locked to event onset show that the perceived level of speaker confidence could be differentiated at distinct time points during speech processing: unconfident expressions elicited a weaker P2 than all other expressions of confidence (or neutral-intending utterances), whereas close-to-confident expressions elicited a reduced negative response in the 330-500 msec and 550-740 msec time window. Neutral-intending expressions, which were also perceived as relatively confident, elicited a more delayed, larger sustained positivity than all other expressions in the 980-1270 msec window for this task. These findings provide the first piece of evidence of how quickly the brain responds to vocal cues signifying the extent of a speaker's confidence during online speech comprehension; first, a rough dissociation between unconfident and confident voices occurs as early as 200 msec after speech onset. At a later stage, further differentiation of the exact level of speaker confidence (i.e., close-to-confident, very confident) is evaluated via an inferential system to determine the speaker's meaning under current task settings. These findings extend three-stage models of how vocal emotion cues are processed in speech comprehension (e.g., Schirmer & Kotz, 2006) by revealing how a speaker's mental state (i.e., feeling of knowing) is simultaneously inferred from vocal expressions. Copyright © 2015 Elsevier Ltd. All rights reserved.
Parent-Child Interaction Therapy (PCIT) in school-aged children with specific language impairment.
Allen, Jessica; Marshall, Chloë R
2011-01-01
Parents play a critical role in their child's language development. Therefore, advising parents of a child with language difficulties on how to facilitate their child's language might benefit the child. Parent-Child Interaction Therapy (PCIT) has been developed specifically for this purpose. In PCIT, the speech-and-language therapist (SLT) works collaboratively with parents, altering interaction styles to make interaction more appropriate to their child's level of communicative needs. This study investigates the effectiveness of PCIT in 8-10-year-old children with specific language impairment (SLI) in the expressive domain. It aimed to identify whether PCIT had any significant impact on the following communication parameters of the child: verbal initiations, verbal and non-verbal responses, mean length of utterance (MLU), and proportion of child-to-parent utterances. Sixteen children with SLI and their parents were randomly assigned to two groups: treated or delayed treatment (control). The treated group took part in PCIT over a 4-week block, and then returned to the clinic for a final session after a 6-week consolidation period with no input from the therapist. The treated and control group were assessed in terms of the different communication parameters at three time points: pre-therapy, post-therapy (after the 4-week block) and at the final session (after the consolidation period), through video analysis. It was hypothesized that all communication parameters would significantly increase in the treated group over time and that no significant differences would be found in the control group. All the children in the treated group made language gains during spontaneous interactions with their parents. In comparison with the control group, PCIT had a positive effect on three of the five communication parameters: verbal initiations, MLU and the proportion of child-to-parent utterances. There was a marginal effect on verbal responses, and a trend towards such an effect for non-verbal responses. Despite the small group sizes, this study provides preliminary evidence that PCIT can achieve its treatment goals with 8-10-year-olds who have expressive language impairments. This has potentially important implications for how mainstream speech and language services provide intervention to school-aged children. In contrast to direct one-to-one therapy, PCIT offers a single block of therapy where the parents' communication and interaction skills are developed to provide the child with an appropriate language-rich environment, which in turn could be more cost-effective for the service provider. © 2010 Royal College of Speech & Language Therapists.
Zhou, Peiyun; Christianson, Kiel
2016-01-01
Auditory perceptual simulation (APS) during silent reading refers to situations in which the reader actively simulates the voice of a character or other person depicted in a text. In three eye-tracking experiments, APS effects were investigated as people read utterances attributed to a native English speaker, a non-native English speaker, or no speaker at all. APS effects were measured via online eye movements and offline comprehension probes. Results demonstrated that inducing APS during silent reading resulted in observable differences in reading speed when readers simulated the speech of faster compared to slower speakers and compared to silent reading without APS. Social attitude survey results indicated that readers' attitudes towards the native and non-native speech did not consistently influence APS-related effects. APS of both native speech and non-native speech increased reading speed, facilitated deeper, less good-enough sentence processing, and improved comprehension compared to normal silent reading.
Improving Understanding of Emotional Speech Acoustic Content
NASA Astrophysics Data System (ADS)
Tinnemore, Anna
Children with cochlear implants show deficits in identifying emotional intent of utterances without facial or body language cues. A known limitation to cochlear implants is the inability to accurately portray the fundamental frequency contour of speech which carries the majority of information needed to identify emotional intent. Without reliable access to the fundamental frequency, other methods of identifying vocal emotion, if identifiable, could be used to guide therapies for training children with cochlear implants to better identify vocal emotion. The current study analyzed recordings of adults speaking neutral sentences with a set array of emotions in a child-directed and adult-directed manner. The goal was to identify acoustic cues that contribute to emotion identification that may be enhanced in child-directed speech, but are also present in adult-directed speech. Results of this study showed that there were significant differences in the variation of the fundamental frequency, the variation of intensity, and the rate of speech among emotions and between intended audiences.
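A hedged sketch of how the three cue families analysed above (variation in fundamental frequency, variation in intensity, and rate of speech) might be extracted from a recording. The onset-based rate proxy and all parameter values are illustrative assumptions, not the study's measurement procedure.

```python
# Sketch: per-utterance summaries of F0 variation, intensity variation, and a
# rough speech-rate proxy. Parameter choices are illustrative only.
import numpy as np
import librosa

def emotion_cues(path):
    y, sr = librosa.load(path, sr=16000)
    f0, voiced, _ = librosa.pyin(y, fmin=75, fmax=500, sr=sr)
    rms = librosa.feature.rms(y=y)[0]
    onsets = librosa.onset.onset_detect(y=y, sr=sr, units="time")
    duration = len(y) / sr
    return {
        "f0_sd_hz": float(np.nanstd(f0[voiced])) if voiced.any() else 0.0,
        "intensity_sd_db": float(np.std(librosa.amplitude_to_db(rms))),
        "rate_onsets_per_s": len(onsets) / duration if duration > 0 else 0.0,
    }

# cues = emotion_cues("sentence.wav")   # hypothetical file name
```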
NASA Astrophysics Data System (ADS)
Toscano, Joseph Christopher
Several fundamental questions about speech perception concern how listeners understand spoken language despite considerable variability in speech sounds across different contexts (the problem of lack of invariance in speech). This contextual variability is caused by several factors, including differences between individual talkers' voices, variation in speaking rate, and effects of coarticulatory context. A number of models have been proposed to describe how the speech system handles differences across contexts. Critically, these models make different predictions about (1) whether contextual variability is handled at the level of acoustic cue encoding or categorization, (2) whether it is driven by feedback from category-level processes or interactions between cues, and (3) whether listeners discard fine-grained acoustic information to compensate for contextual variability. Separating the effects of cue- and category-level processing has been difficult because behavioral measures tap processes that occur well after initial cue encoding and are influenced by task demands and linguistic information. Recently, we have used the event-related brain potential (ERP) technique to examine cue encoding and online categorization. Specifically, we have looked at differences in the auditory N1 as a measure of acoustic cue encoding and the P3 as a measure of categorization. This allows us to examine multiple levels of processing during speech perception and can provide a useful tool for studying effects of contextual variability. Here, I apply this approach to determine the point in processing at which context has an effect on speech perception and to examine whether acoustic cues are encoded continuously. Several types of contextual variability (talker gender, speaking rate, and coarticulation), as well as several acoustic cues (voice onset time, formant frequencies, and bandwidths), are examined in a series of experiments. The results suggest that (1) at early stages of speech processing, listeners encode continuous differences in acoustic cues, independent of phonological categories; (2) at post-perceptual stages, fine-grained acoustic information is preserved; and (3) there is preliminary evidence that listeners encode cues relative to context via feedback from categories. These results are discussed in relation to proposed models of speech perception and sources of contextual variability.
Speech disruptions in relation to language growth in children who stutter: an exploratory study.
Wagovich, Stacy A; Hall, Nancy E; Clifford, Betsy A
2009-12-01
Young children with typical fluency demonstrate a range of disfluencies, or speech disruptions. One type of disruption, revision, appears to increase in frequency as syntactic skills develop. To date, this phenomenon has not been studied in children who stutter (CWS). Rispoli, Hadley, and Holt (2008) suggest a schema for categorizing speech disruptions in terms of revisions and stalls. The purpose of this exploratory study was to use this schema to evaluate whether CWS show a pattern over time in their production of stuttering, revisions, and stalls. Nine CWS, ages 2;1 to 4;11, participated in the study, producing language samples each month for 10 months. MLU and vocd analyses were performed for samples across three time periods. Active declarative sentences within these samples were examined for the presence of disruptions. Results indicated that the proportion of sentences containing revisions increased over time, but proportions for stalls and stuttering did not. Visual inspection revealed that more stuttering and stalls occurred on longer utterances than on shorter utterances. Upon examination of individual children's language, it appears two-thirds of the children showed a pattern in which, as MLU increased, revisions increased as well. Findings are similar to studies of children with typical fluency, suggesting that, despite the fact that CWS display more (and different) disfluencies relative to typically fluent peers, revisions appear to increase over time and correspond to increases in MLU, just as is the case with peers. The reader will be able to: (1) describe the three types of speech disruptions assessed in this article; (2) compare present findings of disruptions in children who stutter to findings of previous research with children who are typically fluent; and (3) discuss future directions in this area of research, given the findings and implications of this study.
Variation in dual-task performance reveals late initiation of speech planning in turn-taking.
Sjerps, Matthias J; Meyer, Antje S
2015-03-01
The smooth transitions between turns in natural conversation suggest that speakers often begin to plan their utterances while listening to their interlocutor. The presented study investigates whether this is indeed the case and, if so, when utterance planning begins. Two hypotheses were contrasted: that speakers begin to plan their turn as soon as possible (in our experiments less than a second after the onset of the interlocutor's turn), or that they do so close to the end of the interlocutor's turn. Turn-taking was combined with a finger tapping task to measure variations in cognitive load. We assumed that the onset of speech planning in addition to listening would be accompanied by deterioration in tapping performance. Two picture description experiments were conducted. In both experiments there were three conditions: (1) Tapping and Speaking, where participants tapped a complex pattern while taking over turns from a pre-recorded speaker, (2) Tapping and Listening, where participants carried out the tapping task while overhearing two pre-recorded speakers, and (3) Speaking Only, where participants took over turns as in the Tapping and Speaking condition but without tapping. The experiments differed in the amount of tapping training the participants received at the beginning of the session. In Experiment 2, the participants' eye-movements were recorded in addition to their speech and tapping. Analyses of the participants' tapping performance and eye movements showed that they initiated the cognitively demanding aspects of speech planning only shortly before the end of the turn of the preceding speaker. We argue that this is a smart planning strategy, which may be the speakers' default in many everyday situations. Copyright © 2014 Elsevier B.V. All rights reserved.
Language choice in bimodal bilingual development.
Lillo-Martin, Diane; de Quadros, Ronice M; Chen Pichler, Deborah; Fieldsteel, Zoe
2014-01-01
Bilingual children develop sensitivity to the language used by their interlocutors at an early age, reflected in differential use of each language by the child depending on their interlocutor. Factors such as discourse context and relative language dominance in the community may mediate the degree of language differentiation in preschool age children. Bimodal bilingual children, acquiring both a sign language and a spoken language, have an even more complex situation. Their Deaf parents vary considerably in access to the spoken language. Furthermore, in addition to code-mixing and code-switching, they use code-blending-expressions in both speech and sign simultaneously-an option uniquely available to bimodal bilinguals. Code-blending is analogous to code-switching sociolinguistically, but is also a way to communicate without suppressing one language. For adult bimodal bilinguals, complete suppression of the non-selected language is cognitively demanding. We expect that bimodal bilingual children also find suppression difficult, and use blending rather than suppression in some contexts. We also expect relative community language dominance to be a factor in children's language choices. This study analyzes longitudinal spontaneous production data from four bimodal bilingual children and their Deaf and hearing interlocutors. Even at the earliest observations, the children produced more signed utterances with Deaf interlocutors and more speech with hearing interlocutors. However, while three of the four children produced >75% speech alone in speech target sessions, they produced <25% sign alone in sign target sessions. All four produced bimodal utterances in both, but more frequently in the sign sessions, potentially because they find suppression of the dominant language more difficult. Our results indicate that these children are sensitive to the language used by their interlocutors, while showing considerable influence from the dominant community language.
Segmenting words from natural speech: subsegmental variation in segmental cues.
Rytting, C Anton; Brew, Chris; Fosler-Lussier, Eric
2010-06-01
Most computational models of word segmentation are trained and tested on transcripts of speech, rather than the speech itself, and assume that speech is converted into a sequence of symbols prior to word segmentation. We present a way of representing speech corpora that avoids this assumption, and preserves acoustic variation present in speech. We use this new representation to re-evaluate a key computational model of word segmentation. One finding is that high levels of phonetic variability degrade the model's performance. While robustness to phonetic variability may be intrinsically valuable, this finding needs to be complemented by parallel studies of the actual abilities of children to segment phonetically variable speech.
Using English and French Proverbs as Comparative Pairs to Teach the Terminal Junctures
ERIC Educational Resources Information Center
Yurtbasi, Metin
2013-01-01
Junctures are pauses used in speech to separate thought-groups from one another, giving the listener time to digest the utterance and signalling its end. Where junctures are present, hearers find it easier to understand what is said, as they are able to discern the individual words between such verbal breaks. Junctures being universal…
Domaneschi, Filippo; Passarelli, Marcello; Chiorri, Carlo
2017-08-01
Language scientists have broadly addressed the problem of explaining how language users recognize the kind of speech act performed by a speaker uttering a sentence in a particular context. They have done so by investigating the role played by the illocutionary force indicating devices (IFIDs), i.e., all linguistic elements that indicate the illocutionary force of an utterance. The present work takes a first step in the direction of an experimental investigation of non-verbal IFIDs because it investigates the role played by facial expressions and, in particular, of upper-face action units (AUs) in the comprehension of three basic types of illocutionary force: assertions, questions, and orders. The results from a pilot experiment on production and two comprehension experiments showed that (1) certain upper-face AUs seem to constitute non-verbal signals that contribute to the understanding of the illocutionary force of questions and orders; (2) assertions are not expected to be marked by any upper-face AU; (3) some upper-face AUs can be associated, with different degrees of compatibility, with both questions and orders.
Deanol in Gilles de la Tourette Syndrome: a preliminary investigation.
Pinta, E R
1977-03-01
On the basis of its pharmacologic action Deanol (dimethyl aminoethanol) was hypothesized to be of benefit in the Gilles de la Tourette Syndrome. In one case report the addition of Deanol to perphenazine did not result in an improvement of uncontrollable movements or involuntary speech utterances. Gilles de la Tourette Syndrome is a condition combining organic and psychogenic features existing in the interface between two etiologies. Classically the disease begins in childhood and is characterized by the appearance of sudden involuntary movements, involuntary speech utterances frequently consisting of curse words (coprolalia), and imitative phenomena such as echolalia and echopraxia. Neurotic symptomatology such as anxiety and obsessive thinking have also been reported. This condition is regarded neuropharmacologically as a dopaminergic state that responds to drugs with antidopaminergic activity e.g. the phenothiazines and butyrophenones. Deanol (dimethyl aminoethanol) is a putative cholinergic agonist and has reported effectiveness in conditions where there is a predominance of dopaminergic versus cholinergic activity, e.g. levodopa-induced dyskinesias, neuroleptic induced tardive dyskinesia, and Huntington's chorea. Because of its effectiveness in dopaminergic states it was hypothesized that Deanol could also be of benefit in the Gilles de la Tourette Syndrome.
Effects of interior aircraft noise on speech intelligibility and annoyance
NASA Technical Reports Server (NTRS)
Pearsons, K. S.; Bennett, R. L.
1977-01-01
Recordings of the aircraft ambiance from ten different types of aircraft were used in conjunction with four distinct speech interference tests as stimuli to determine the effects of interior aircraft background levels and speech intelligibility on perceived annoyance in 36 subjects. Both speech intelligibility and background level significantly affected judged annoyance. However, the interaction between the two variables showed that above an 85 dB background level the speech intelligibility results had a minimal effect on annoyance ratings. Below this level, people rated the background as less annoying if there was adequate speech intelligibility.
ERIC Educational Resources Information Center
Rydell, Patrick J.; Mirenda, Pat
1991-01-01
This study of 3 boys (ages 5-6) with autism found that adult high-constraint antecedent utterances elicited more verbal utterances in general, including subjects' echolalia; adult low-constraint utterances elicited more subject high-constraint utterances; and the degree of adult-utterance constraint did not influence the mean lengths of subjects'…
Chaspari, Theodora; Soldatos, Constantin; Maragos, Petros
2015-01-01
The development of ecologically valid procedures for collecting reliable and unbiased emotional data is an important step towards computer interfaces with social and affective intelligence targeting patients with mental disorders. To this end, the Athens Emotional States Inventory (AESI) proposes the design, recording and validation of an audiovisual database for five emotional states: anger, fear, joy, sadness and neutral. The items of the AESI consist of sentences each having content indicative of the corresponding emotion. Emotional content was assessed through a survey of 40 young participants with a questionnaire following the Latin square design. The emotional sentences that were correctly identified by 85% of the participants were recorded in a soundproof room with microphones and cameras. A preliminary validation of AESI is performed through automatic emotion recognition experiments from speech. The resulting database contains 696 recorded utterances in the Greek language by 20 native speakers and has a total duration of approximately 28 min. Speech classification results yield accuracy up to 75.15% for automatically recognizing the emotions in AESI. These results indicate the usefulness of our approach for collecting emotional data with reliable content, balanced across classes and with reduced environmental variability.
Ordin, Mikhail; Polyanskaya, Leona
2015-08-01
The development of speech rhythm in second language (L2) acquisition was investigated. Speech rhythm was defined as durational variability that can be captured by the interval-based rhythm metrics. These metrics were used to examine the differences in durational variability between proficiency levels in L2 English spoken by French and German learners. The results reveal that durational variability increased as L2 acquisition progressed in both groups of learners. This indicates that speech rhythm in L2 English develops from more syllable-timed toward more stress-timed patterns irrespective of whether the native language of the learner is rhythmically similar to or different from the target language. Although both groups showed similar development of speech rhythm in L2 acquisition, there were also differences: German learners achieved a degree of durational variability typical of the target language, while French learners exhibited lower variability than native British speakers, even at an advanced proficiency level.
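The abstract does not list the specific interval-based rhythm metrics used, but the family it refers to quantifies durational variability across successive intervals. As one representative example of that family, here is a sketch of the normalised pairwise variability index (nPVI):

```python
# Sketch of one interval-based rhythm metric: the normalised pairwise
# variability index (nPVI) over successive interval durations (e.g. vocalic
# intervals, in seconds). Higher values indicate more durational contrast
# between adjacent intervals, i.e. more "stress-timed" rhythm.
def npvi(durations):
    """nPVI = 100 * mean(|d_k - d_{k+1}| / ((d_k + d_{k+1}) / 2))."""
    pairs = zip(durations[:-1], durations[1:])
    terms = [abs(a - b) / ((a + b) / 2.0) for a, b in pairs]
    return 100.0 * sum(terms) / len(terms)

print(npvi([0.12, 0.08, 0.21, 0.10, 0.15]))
```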
NASA Astrophysics Data System (ADS)
Gonzalez, Julio; Oliver, Juan C.
2005-07-01
Considerable research on speech intelligibility for cochlear-implant users has been conducted using acoustic simulations with normal-hearing subjects. However, some relevant topics about perception through cochlear implants remain scantly explored. The present study examined the perception by normal-hearing subjects of gender and identity of a talker as a function of the number of channels in spectrally reduced speech. Two simulation strategies were compared. They were implemented by two different processors that presented signals as either the sum of sine waves at the center of the channels or as the sum of noise bands. In Experiment 1, 15 subjects determined the gender of 40 talkers (20 males + 20 females) from a natural utterance processed through 3, 4, 5, 6, 8, 10, 12, and 16 channels with both processors. In Experiment 2, 56 subjects matched a natural sentence uttered by 10 talkers with the corresponding simulation replicas processed through 3, 4, 8, and 16 channels for each processor. In Experiment 3, 72 subjects performed the same task but different sentences were used for natural and processed stimuli. A control Experiment 4 was conducted to equate the processing steps between the two simulation strategies. Results showed that gender and talker identification was better for the sine-wave processor, and that performance through the noise-band processor was more sensitive to the number of channels. Implications and possible explanations for the superiority of sine-wave simulations are discussed.
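A simplified sketch of the two simulation strategies being compared: split the signal into frequency channels, extract each channel's temporal envelope, and resynthesize with either a sine carrier at the channel centre or a filtered-noise carrier. Filter design, channel spacing, and envelope extraction here are assumptions, not the study's exact processors.

```python
# Sketch of sine-wave vs. noise-band vocoding of a speech signal y at rate sr.
# Channel spacing, filter order, and envelope extraction are simplified choices.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def vocode(y, sr, n_channels=8, carrier="sine", f_lo=200.0, f_hi=7000.0):
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)   # log-spaced channel edges
    out = np.zeros_like(y)
    rng = np.random.default_rng(0)
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=sr, output="sos")
        band = sosfiltfilt(sos, y)
        env = np.abs(hilbert(band))                     # channel envelope
        if carrier == "sine":
            fc = np.sqrt(lo * hi)                       # channel centre (geometric mean)
            carrier_sig = np.sin(2 * np.pi * fc * np.arange(len(y)) / sr)
        else:                                           # noise-band carrier
            carrier_sig = sosfiltfilt(sos, rng.standard_normal(len(y)))
        out += env * carrier_sig
    return out / (np.max(np.abs(out)) + 1e-9)

# sine_version  = vocode(y, sr, n_channels=8, carrier="sine")
# noise_version = vocode(y, sr, n_channels=8, carrier="noise")
```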
Longitudinal change in dysarthria associated with Friedreich ataxia: a potential clinical endpoint.
Rosen, Kristin M; Folker, Joanne E; Vogel, Adam P; Corben, Louise A; Murdoch, Bruce E; Delatycki, Martin B
2012-11-01
CNS functions that show change across short periods of time are particularly useful clinical endpoints for Friedreich ataxia. This study determined whether there is measurable acoustical change in the dysarthria associated with Friedreich ataxia across yearly intervals. A total of 29 participants diagnosed with Friedreich ataxia were recorded across 4 years at yearly intervals. A repeated measures ANOVA was used to determine which acoustic measures differed across time, and pairwise t tests were used to assess the consistency of the change across the time intervals. The relationship between the identified measures with perceptual severity was assessed with stepwise regression. Significant longitudinal change was observed with four measures that relate to the utterance duration and spectral changes in utterances. The spectral measures consistently detected change across time intervals of two or more years. The four measures combined moderately predicted perceptual severity. Together, the results implicate longitudinal change in speaking rate and utterance duration. Changes in speech associated with Friedreich ataxia can be measured across intervals of 2 years and therefore show rich potential for monitoring disease progression and therapy outcomes.
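The longitudinal analysis pattern described above, one acoustic measure per participant per yearly visit tested with a repeated-measures ANOVA, can be sketched as follows with synthetic data; the column names and effect sizes are invented for illustration, not values from the study.

```python
# Sketch of a repeated-measures ANOVA on one acoustic measure across yearly
# visits (29 participants x 4 years). The data are synthetic placeholders.
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(3)
subjects = np.repeat(np.arange(29), 4)                   # 29 participants x 4 years
year = np.tile([1, 2, 3, 4], 29)
# Synthetic "utterance duration" that drifts upward over time.
duration = 2.0 + 0.1 * year + rng.normal(0, 0.2, size=len(year))

df = pd.DataFrame({"subject": subjects, "year": year, "duration": duration})
result = AnovaRM(df, depvar="duration", subject="subject", within=["year"]).fit()
print(result.anova_table)
```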
Niebuhr, Oliver
2012-01-01
The paper is concerned with the 'edge of intonation' in a twofold sense. It focuses on utterance-final F0 movements and crosses the traditional segment-prosody divide by investigating the interplay of F0 and voiceless fricatives in speech production. An experiment was performed for German with four types of voiceless fricatives: /f/, /s/, /ʃ/ and /x/. They were elicited with scripted dialogues in the contexts of terminal falling statement and high rising question intonations. Acoustic analyses show that fricatives concluding the high rising question intonations had higher mean centres of gravity (CoGs), larger CoG ranges and higher noise energy levels than fricatives concluding the terminal falling statement intonations. The different spectral-energy patterns are suitable to induce percepts of a high 'aperiodic pitch' at the end of the questions and of a low 'aperiodic pitch' at the end of the statements. The results are discussed with regard to the possible existence of 'segmental intonation' and its implication for F0 truncation and the segment-prosody dichotomy, in which segments are the alleged troublemakers for the production and perception of intonation. Copyright © 2012 S. Karger AG, Basel.
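The spectral centre of gravity (CoG) referred to above is the amplitude-weighted mean frequency of the fricative spectrum. A minimal sketch, assuming a power-weighted spectrum and a Hann window (the study's exact measurement settings are not given in the abstract):

```python
# Sketch: spectral centre of gravity (CoG) of one windowed fricative frame,
# computed as the power-weighted mean frequency. Windowing is a simplification.
import numpy as np

def centre_of_gravity(frame, sr):
    """Power-weighted mean frequency (Hz) of one windowed frame."""
    windowed = frame * np.hanning(len(frame))
    spectrum = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    return float(np.sum(freqs * spectrum) / np.sum(spectrum))

sr = 16000
rng = np.random.default_rng(0)
print(centre_of_gravity(rng.standard_normal(1024), sr))   # white noise: CoG near sr/4
```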
Sentence stress in children with dysarthria and cerebral palsy.
Kuschmann, Anja; Lowit, Anja
2018-03-08
This study aimed to advance our understanding of how children with dysarthria and cerebral palsy (CP) realise sentence stress acoustically, and how well listeners could identify the position of the stressed word within these utterances. Seven children with CP and eight typically developing children participated in the experiment. Stress on target words in two sentence positions was elicited through a picture-based question-answer paradigm. Acoustic parameters of stress [duration, intensity and fundamental frequency (F0)] were measured and compared between stressed and unstressed target words. For the perception experiment, ten listeners were asked to determine the position of the stressed word in the children's productions. Acoustic measures showed that at group level the typically developing children used all three acoustic parameters to mark sentence stress, whereas the children with CP showed changes in duration only. Individual performance variations were evident in both groups. Perceptually, listeners were significantly better at identifying the stressed words in the utterances produced by the typically developing children than those of the children with CP. The results suggest that children with CP can manipulate temporal speech properties to mark stress. This ability to modulate acoustic-prosodic features could be harnessed in intervention to enhance children's functional communication.
Listeners' comprehension of uptalk in spontaneous speech.
Tomlinson, John M; Fox Tree, Jean E
2011-04-01
Listeners' comprehension of phrase final rising pitch on declarative utterances, or uptalk, was examined to test the hypothesis that prolongations might differentiate conflicting functions of rising pitch. In Experiment 1 we found that listeners rated prolongations as indicating more speaker uncertainty, but that rising pitch was unrelated to ratings. In Experiment 2 we found that prolongations interacted with rising pitch when listeners monitored for words in the subsequent utterance. Words preceded by prolonged uptalk were monitored faster than words preceded by non-prolonged uptalk. In Experiment 3 we found that the interaction between rising pitch and prolongations depended on listeners' beliefs about speakers' mental states. Results support the theory that temporal and situational context are important in determining intonational meaning. Copyright © 2010 Elsevier B.V. All rights reserved.
Monaural room acoustic parameters from music and speech.
Kendrick, Paul; Cox, Trevor J; Li, Francis F; Zhang, Yonggang; Chambers, Jonathon A
2008-07-01
This paper compares two methods for extracting room acoustic parameters from reverberated speech and music. An approach which uses statistical machine learning, previously developed for speech, is extended to work with music. For speech, reverberation time estimations are within a perceptual difference limen of the true value. For music, virtually all early decay time estimations are within a difference limen of the true value. The estimation accuracy is not good enough in other cases due to differences between the simulated data set used to develop the empirical model and real rooms. The second method carries out a maximum likelihood estimation on decay phases at the end of notes or speech utterances. This paper extends the method to estimate parameters relating to the balance of early and late energies in the impulse response. For reverberation time and speech, the method provides estimations which are within the perceptual difference limen of the true value. For other parameters such as clarity, the estimations are not sufficiently accurate due to the natural reverberance of the excitation signals. Speech is a better test signal than music because of the greater periods of silence in the signal, although music is needed for low frequency measurement.
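The decay-phase idea can be illustrated with a least-squares line fit to the log-energy decay at the end of a note or utterance, extrapolated to -60 dB; the paper itself uses a maximum-likelihood formulation, so this is only a sketch of the principle on a synthetic free decay.

```python
# Simplified sketch: estimate RT60 by fitting a line to the log-energy decay
# and extrapolating to -60 dB. Not the paper's maximum-likelihood estimator.
import numpy as np

def rt60_from_decay(decay: np.ndarray, sr: int, frame: int = 512) -> float:
    n_frames = len(decay) // frame
    energy_db = np.array([
        10 * np.log10(np.mean(decay[i * frame:(i + 1) * frame] ** 2) + 1e-12)
        for i in range(n_frames)
    ])
    t = np.arange(n_frames) * frame / sr
    slope, intercept = np.polyfit(t, energy_db, 1)     # dB per second
    return -60.0 / slope                               # time to decay by 60 dB

# Synthetic free decay with a true RT60 of 0.5 s:
sr = 16000
t = np.arange(sr) / sr
decay = np.random.randn(sr) * 10 ** (-3 * t / 0.5)     # -60 dB over 0.5 s
print(f"estimated RT60 ~ {rt60_from_decay(decay, sr):.2f} s")
```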
ERIC Educational Resources Information Center
Skarabela, Barbora; Ota, Mitsuhiko
2017-01-01
Children use pronouns in their speech from the earliest word combinations. Yet, it is not clear from these early utterances whether they understand that pronouns are used as substitutes for nouns and entities in the discourse. The aim of this study was to examine whether young children understand the anaphoric function of pronouns, focusing on the…
ERIC Educational Resources Information Center
rad, Shadi Khojasteh; Abdullah, Ain Nadzimah
2012-01-01
Hesitation strategies appear in speech in the form of filled or unfilled pauses, paralinguistic markers like nervous laughter or coughing, or signals used to justify units in upcoming utterances that the speaker struggles to produce. The main functions of these forms of hesitation strategies have been associated with speech…
ERIC Educational Resources Information Center
Barkley, Russell A.; And Others
1983-01-01
Verbal interactions of 18 hyperactive boys (8 to 11 years old) with their mothers during 15-minute free play and task periods were studied and compared to interactions of 18 normal boys with their mothers. Both hyperactive boys and their mothers were found to use significantly more utterances in free play than normal mother-child dyads.…
ERIC Educational Resources Information Center
Aguert, Marc; Laval, Virginie; Le Bigot, Ludovic; Bernicot, Josie
2010-01-01
Purpose: This study was aimed at determining the role of prosody and situational context in children's understanding of expressive utterances. Which one of these 2 cues will help children grasp the speaker's intention? Do children exhibit a "contextual bias" whereby they ignore prosody, such as the "lexical bias" found in other studies (M. Friend…
Lord, Sarah Peregrine; Can, Doğan; Yi, Michael; Marin, Rebeca; Dunn, Christopher W.; Imel, Zac E.; Georgiou, Panayiotis; Narayanan, Shrikanth; Steyvers, Mark; Atkins, David C.
2014-01-01
The current paper presents novel methods for collecting MISC data and accurately assessing reliability of behavior codes at the level of the utterance. The MISC 2.1 was used to rate MI interviews from five randomized trials targeting alcohol and drug use. Sessions were coded at the utterance-level. Utterance-based coding reliability was estimated using three methods and compared to traditional reliability estimates of session tallies. Session-level reliability was generally higher compared to reliability using utterance-based codes, suggesting that typical methods for MISC reliability may be biased. These novel methods in MI fidelity data collection and reliability assessment provided rich data for therapist feedback and further analyses. Beyond implications for fidelity coding, utterance-level coding schemes may elucidate important elements in the counselor-client interaction that could inform theories of change and the practice of MI. PMID:25242192
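A toy illustration of why utterance-level and tally-based reliability can diverge: the two hypothetical coders below produce identical session tallies while rarely agreeing on individual utterances, so a tally-based estimate looks far better than utterance-level kappa. The codes and counts are invented, not MISC data.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

# Hypothetical behaviour codes assigned by two coders to the same 12 utterances.
coder_a = ["RF", "Q", "RF", "GI", "Q", "RF", "GI", "Q", "RF", "RF", "GI", "Q"]
coder_b = ["Q", "RF", "GI", "RF", "RF", "Q", "Q", "GI", "RF", "GI", "RF", "Q"]

print("utterance-level kappa:", round(cohen_kappa_score(coder_a, coder_b), 2))

# Session-level tallies (counts per code) are identical for the two coders,
# even though they agreed on only 2 of the 12 utterances.
codes = sorted(set(coder_a) | set(coder_b))
tally_a = [coder_a.count(c) for c in codes]
tally_b = [coder_b.count(c) for c in codes]
print("session tallies:", dict(zip(codes, zip(tally_a, tally_b))))
print("tallies identical:", tally_a == tally_b)
```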
Rydell, P J; Mirenda, P
1991-06-01
The effects of specific types of adult antecedent utterances (high vs. low constraint) on the verbal behaviors produced by three subjects with autism were examined. Adult utterance types were differentiated in terms of the amount of control the adults exhibited in their verbal interactions with the subjects during a free play setting. Videotaped interactions were analyzed and coded according to a predetermined categorical system. The results of this investigation suggest that the level of linguistic constraint exerted on the child interactants during naturalistic play sessions affected their communicative output. The overall findings suggest that (a) adult high constraint utterances elicited more verbal utterances in general, as well as a majority of the subjects' echolalia; (b) adult low constraint utterances elicited more subject high constraint utterances; and (c) the degree of constraint of adult utterances did not appear to influence the mean lengths of subjects' utterances. The results are discussed in terms of their implications for educational interventions, and suggestions are made for future research concerning the dynamics of echolalia in interactive contexts.
Ambrose, Sophie E; Walker, Elizabeth A; Unflat-Berry, Lauren M; Oleson, Jacob J; Moeller, Mary Pat
2015-01-01
The primary objective of this study was to examine the quantity and quality of caregiver talk directed to children who are hard of hearing (CHH) compared with children with normal hearing (CNH). For the CHH only, the study explored how caregiver input changed as a function of child age (18 months versus 3 years), which child and family factors contributed to variance in caregiver linguistic input at 18 months and 3 years, and how caregiver talk at 18 months related to child language outcomes at 3 years. Participants were 59 CNH and 156 children with bilateral, mild-to-severe hearing loss. When children were approximately 18 months and/or 3 years of age, caregivers and children participated in a 5-min semistructured, conversational interaction. Interactions were transcribed and coded for two features of caregiver input representing quantity (number of total utterances and number of total words) and four features representing quality (number of different words, mean length of utterance in morphemes, proportion of utterances that were high level, and proportion of utterances that were directing). In addition, at the 18-month visit, parents completed a standardized questionnaire regarding their child's communication development. At the 3-year visit, a clinician administered a standardized language measure. At the 18-month visit, the CHH were exposed to a greater proportion of directing utterances than the CNH. At the 3-year visit, there were significant differences between the CNH and CHH for number of total words and all four of the quality variables, with the CHH being exposed to fewer words and lower quality input. Caregivers generally provided higher quality input to CHH at the 3-year visit compared with the 18-month visit. At the 18-month visit, quantity variables, but not quality variables, were related to several child and family factors. At the 3-year visit, the variable most strongly related to caregiver input was child language. Longitudinal analyses indicated that quality, but not quantity, of caregiver linguistic input at 18 months was related to child language abilities at 3 years, with directing utterances accounting for significant unique variance in child language outcomes. Although caregivers of CHH increased their use of quality features of linguistic input over time, the differences when compared with CNH suggest that some caregivers may need additional support to provide their children with optimal language learning environments. This is particularly important given the relationships that were identified between quality features of caregivers' linguistic input and children's language abilities. Family supports should include a focus on developing a style that is conversational eliciting as opposed to directive.
Effects of talker gender on dialect categorization
Clopper, Cynthia G.; Conrey, Brianna; Pisoni, David B.
2011-01-01
The identification of the gender of an unfamiliar talker is an easy and automatic process for naïve adult listeners. Sociolinguistic research has consistently revealed gender differences in the production of linguistic variables. Research on the perception of dialect variation, however, has been limited almost exclusively to male talkers. In the present study, naïve participants were asked to categorize unfamiliar talkers by dialect using sentence-length utterances under three presentation conditions: male talkers only, female talkers only, and a mixed gender condition. The results revealed no significant differences in categorization performance across the three presentation conditions. However, a clustering analysis of the listeners’ categorization errors revealed significant effects of talker gender on the underlying perceptual similarity spaces. The present findings suggest that naïve listeners are sensitive to gender differences in speech production and are able to use those differences to reliably categorize unfamiliar male and female talkers by dialect. PMID:21423866
Inner Speech's Relationship With Overt Speech in Poststroke Aphasia.
Stark, Brielle C; Geva, Sharon; Warburton, Elizabeth A
2017-09-18
Relatively preserved inner speech alongside poor overt speech has been documented in some persons with aphasia (PWA), but the relationship of overt speech with inner speech is still largely unclear, as few studies have directly investigated these factors. The present study investigates the relationship of relatively preserved inner speech in aphasia with selected measures of language and cognition. Thirty-eight persons with chronic aphasia (27 men, 11 women; average age 64.53 ± 13.29 years, time since stroke 8-111 months) were classified as having relatively preserved inner and overt speech (n = 21), relatively preserved inner speech with poor overt speech (n = 8), or not classified due to insufficient measurements of inner and/or overt speech (n = 9). Inner speech scores (by group) were correlated with selected measures of language and cognition from the Comprehensive Aphasia Test (Swinburn, Porter, & Howard, 2004). The group with poor overt speech showed a significant relationship of inner speech with overt naming (r = .95, p < .01) and with mean length of utterance produced during a written picture description (r = .96, p < .01). Correlations between inner speech and language and cognition factors were not significant for the group with relatively good overt speech. As in previous research, we show that relatively preserved inner speech is found alongside otherwise severe production deficits in PWA. PWA with poor overt speech may rely more on preserved inner speech for overt picture naming (perhaps due to shared resources with verbal working memory) and for written picture description (perhaps due to reliance on inner speech due to perceived task difficulty). Assessments of inner speech may be useful as a standard component of aphasia screening, and therapy focused on improving and using inner speech may prove clinically worthwhile. https://doi.org/10.23641/asha.5303542.
Spatial Language and the Embedded Listener Model in Parents’ Input to Children
Ferrara, Katrina; Silva, Malena; Wilson, Colin; Landau, Barbara
2015-01-01
Language is a collaborative act: in order to communicate successfully, speakers must generate utterances that are not only semantically valid, but also sensitive to the knowledge state of the listener. Such sensitivity could reflect use of an “embedded listener model,” where speakers choose utterances on the basis of an internal model of the listeners’ conceptual and linguistic knowledge. In this paper, we ask whether parents’ spatial descriptions incorporate an embedded listener model that reflects their children’s understanding of spatial relations and spatial terms. Adults described the positions of targets in spatial arrays to their children or to the adult experimenter. Arrays were designed so that targets could not be identified unless spatial relationships within the array were encoded and described. Parents of 3–4 year-old children encoded relationships in ways that were well-matched to their children’s level of spatial language. These encodings differed from those of the same relationships in speech to the adult experimenter (Experiment 1). By contrast, parents of individuals with severe spatial impairments (Williams syndrome) did not show clear evidence of sensitivity to their children’s level of spatial language (Experiment 2). The results provide evidence for an embedded listener model in the domain of spatial language, and indicate conditions under which the ability to model listener knowledge may be more challenging. PMID:26717804
Ackermann; Mathiak
1999-11-01
Pure word deafness (auditory verbal agnosia) is characterized by an impairment of auditory comprehension, repetition of verbal material and writing to dictation, whereas spontaneous speech production and reading remain largely unaffected. Sometimes, this syndrome is preceded by complete deafness (cortical deafness) of varying duration. Perception of vowels and suprasegmental features of verbal utterances (e.g., intonation contours) seems to be less disrupted than the processing of consonants and, therefore, might mediate residual auditory functions. Often, lip reading and/or a slowed speaking rate can, within some limits, compensate for the speech comprehension deficits. Apart from a few exceptions, the available reports of pure word deafness documented a bilateral temporal lesion. In these instances, as a rule, identification of nonverbal (environmental) sounds, perception of music, temporal resolution of sequential auditory cues and/or spatial localization of acoustic events were compromised as well. The variable constellation of auditory signs and symptoms observed in central hearing disorders following bilateral temporal damage most probably reflects the multitude of functional maps at the level of the auditory cortices, each subserving the encoding of specific stimulus parameters, as documented in a variety of non-human species. Thus, verbal/nonverbal auditory agnosia may be considered a paradigm of distorted "auditory scene analysis" (Bregman 1990) affecting both primitive and schema-based perceptual processes. It cannot be excluded, however, that disconnection of the Wernicke area from auditory input (Geschwind 1965) and/or an impairment of the suggested "phonetic module" (Liberman 1996) contribute to the observed deficits as well. Conceivably, these latter mechanisms underlie the rare cases of pure word deafness following a lesion restricted to the dominant hemisphere. Only a few instances of a rather isolated disruption of the discrimination/identification of nonverbal sound sources, in the presence of uncompromised speech comprehension, have been reported so far (nonverbal auditory agnosia). As a rule, unilateral right-sided damage has been found to be the relevant lesion.
Preschoolers' real-time coordination of vocal and facial emotional information.
Berman, Jared M J; Chambers, Craig G; Graham, Susan A
2016-02-01
An eye-tracking methodology was used to examine the time course of 3- and 5-year-olds' ability to link speech bearing different acoustic cues to emotion (i.e., happy-sounding, neutral, and sad-sounding intonation) to photographs of faces reflecting different emotional expressions. Analyses of saccadic eye movement patterns indicated that, for both 3- and 5-year-olds, sad-sounding speech triggered gaze shifts to a matching (sad-looking) face from the earliest moments of speech processing. However, it was not until approximately 800ms into a happy-sounding utterance that preschoolers began to use the emotional cues from speech to identify a matching (happy-looking) face. Complementary analyses based on conscious/controlled behaviors (children's explicit points toward the faces) indicated that 5-year-olds, but not 3-year-olds, could successfully match happy-sounding and sad-sounding vocal affect to a corresponding emotional face. Together, the findings clarify developmental patterns in preschoolers' implicit versus explicit ability to coordinate emotional cues across modalities and highlight preschoolers' greater sensitivity to sad-sounding speech as the auditory signal unfolds in time. Copyright © 2015 Elsevier Inc. All rights reserved.
Speech to Text Translation for Malay Language
NASA Astrophysics Data System (ADS)
Al-khulaidi, Rami Ali; Akmeliawati, Rini
2017-11-01
A speech recognition system is a front-end and back-end process that receives an audio signal uttered by a speaker and converts it into a text transcription. Such systems can be used in several fields, including therapeutic technology, education, social robotics and computer entertainment. Control tasks, the intended application of the proposed system, demand fast performance and response because the recogniser must integrate with other control platforms, such as voice-controlled robots. This creates a need for flexible platforms that can easily be adapted to the functionality of their surroundings, unlike software such as MATLAB and Phoenix that requires recorded audio and repeated training for every entry. In this paper, a speech recognition system for the Malay language is implemented using Microsoft Visual Studio C#. Ninety Malay phrases were tested by ten speakers of both genders in different contexts. The results show that the overall accuracy, calculated from a confusion matrix, is a satisfactory 92.69%.
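The reported metric is simply overall accuracy read off a confusion matrix, as in the sketch below; the 3x3 matrix is illustrative and not the paper's results.

```python
# Minimal sketch: overall accuracy computed from a confusion matrix as
# correctly recognised phrases over all test phrases (illustrative counts).
import numpy as np

cm = np.array([[290,   6,   4],   # rows: true phrase class
               [  8, 282,  10],   # cols: recognised phrase class
               [  5,  12, 283]])

accuracy = np.trace(cm) / cm.sum()
print(f"overall accuracy = {accuracy:.2%}")
```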
Neural correlates of audiovisual speech processing in a second language.
Barrós-Loscertales, Alfonso; Ventura-Campos, Noelia; Visser, Maya; Alsius, Agnès; Pallier, Christophe; Avila Rivera, César; Soto-Faraco, Salvador
2013-09-01
Neuroimaging studies of audiovisual speech processing have exclusively addressed listeners' native language (L1). Yet, several behavioural studies now show that AV processing plays an important role in non-native (L2) speech perception. The current fMRI study measured brain activity during auditory, visual, audiovisual congruent and audiovisual incongruent utterances in L1 and L2. BOLD responses to congruent AV speech in the pSTS were stronger than in either unimodal condition in both L1 and L2. Yet no differences in AV processing were expressed according to the language background in this area. Instead, the regions in the bilateral occipital lobe had a stronger congruency effect on the BOLD response (congruent higher than incongruent) in L2 as compared to L1. According to these results, language background differences are predominantly expressed in these unimodal regions, whereas the pSTS is similarly involved in AV integration regardless of language dominance. Copyright © 2013 Elsevier Inc. All rights reserved.
Sound and speech detection and classification in a Health Smart Home.
Fleury, A; Noury, N; Vacher, M; Glasson, H; Seri, J F
2008-01-01
Improvements in medicine increase life expectancy worldwide and create a new bottleneck at the entrance of specialized and equipped institutions. To allow elderly people to stay at home, researchers work on ways to monitor them in their own environment with non-invasive sensors. To meet this goal, smart homes equipped with many sensors deliver information on the activities of the person and can help detect distress situations. In this paper, we present a global speech and sound recognition system that can be set up in a flat. We placed eight microphones in the Health Smart Home of Grenoble (a real living flat of 47 m²) and we automatically analyze and classify the different sounds recorded in the flat and the speech uttered (to detect normal or distress French sentences). We introduce the methods for the sound and speech recognition, the post-processing of the data and, finally, the experimental results obtained in real conditions in the flat.
The effects of complementary and alternative medicine on the speech of patients with depression
NASA Astrophysics Data System (ADS)
Fraas, Michael; Solloway, Michele
2004-05-01
It is well documented that patients suffering from depression exhibit articulatory timing deficits and speech that is monotonous and lacking in pitch variation. Traditional remediation of depression has left many patients with adverse side effects and ineffective outcomes. Recent studies indicate that many Americans are seeking complementary and alternative forms of medicine to supplement traditional therapy approaches. The current investigation aims to determine the efficacy of complementary and alternative medicine (CAM) for the remediation of speech deficits associated with depression. Subjects with depression and normal controls will participate in an 8-week treatment session using polarity therapy, a form of CAM. Subjects will be recorded producing a series of spontaneous and narrative speech samples. Acoustic analysis of mean fundamental frequency (F0), variation in F0 (standard deviation of F0), average rate of F0 change, and pause and utterance durations will be conducted. Differences pre- and post-CAM therapy between subjects with depression and normal controls will be discussed.
ERIC Educational Resources Information Center
Vihman, Marilyn May
The use of formulaic speech is seen as a learning strategy in children's first language (L1) acquisition to a limited extent, and to an even greater extent in their second language (L2) acquisition. While the first utterances of the child learning L1 are mostly one-word constructions, many of them are routine words or phrases that the child learns…
Reinthal, Ann Karas; Mansour, Linda Moeller; Greenwald, Glenna
2004-01-01
This case study examined the effectiveness of a programme designed to improve anticipatory postural control in an adolescent over years 2 and 3 post-traumatic brain injury (TBI). It was hypothesized that her difficulty in walking and talking simultaneously was caused by excessive co-activation of extremity, trunk, and oral musculature during upright activities. The participant was treated weekly with physical and speech therapy. Treatment focussed on improving anticipatory postural control during gross motor activities in conjunction with oral-motor function. Initially, the participant walked using a walker at a speed of 23 cm/s. Two years later, she could walk without a device at 53 cm/s. Initial laryngoscopic examination showed minimal movement of the velum or pharyngeal walls; full movement was present after treatment. Intelligibility improved from no intelligible single-word utterances to 85% intelligible utterances after 2 years. The results suggest that less compensatory rigidification of oral musculature was needed to maintain an upright position against gravity as postural control improved. An adolescent 1 year post-TBI was followed as she underwent additional rehabilitation focussed on improving anticipatory postural control. The functional goal of simultaneously talking while walking was achieved through this intervention.
Angelopoulou, Georgia; Kasselimis, Dimitrios; Makrydakis, George; Varkanitsa, Maria; Roussos, Petros; Goutsos, Dionysis; Evdokimidis, Ioannis; Potagas, Constantin
2018-06-01
Pauses may be studied as an aspect of the temporal organization of speech, as well as an index of internal cognitive processes, such as word access, selection and retrieval, monitoring, articulatory planning, and memory. Several studies have demonstrated specific distributional patterns of pauses in typical speech. However, evidence from patients with language impairment is sparse and restricted to small-scale studies. The aim of the present study is to investigate empty pause distribution and associations between pause variables and linguistic elements in aphasia. Eighteen patients with chronic aphasia following a left hemisphere stroke were recruited. The control group consisted of 19 healthy adults matched for age, gender, and years of formal schooling. Speech samples from both groups were transcribed, and silent pauses were annotated using ELAN. Our results indicate that in both groups, pause duration distribution follows a log-normal bimodal model with significantly different thresholds between the two populations, yet specific enough for each distribution to justify classification into two distinct groups of pauses for each population: short and long. Moreover, we found differences between the patient and control group, most prominently with regard to long pause duration and rate. Long pause indices were also associated with fundamental linguistic elements, such as mean length of utterance. Overall, we argue that post-stroke aphasia may induce quantitative but not qualitative alterations of pause patterns during speech, and further suggest that long pauses may serve as an index of internal cognitive processes supporting sentence planning. Our findings are discussed within the context of pause pattern quantification strategies as potential markers of cognitive changes in aphasia, further stressing the importance of such measures as an integral part of language assessment in clinical populations. Copyright © 2018 Elsevier Ltd. All rights reserved.
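One way to operationalise the bimodal log-normal description is to fit a two-component mixture to log pause durations and take the point where the components cross as the short/long threshold, as sketched below on simulated durations (not the study's data).

```python
# Hedged sketch: two-component Gaussian mixture on log pause durations,
# with the crossing point between components as the short/long threshold.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
pauses = np.concatenate([rng.lognormal(np.log(0.25), 0.4, 300),   # short pauses
                         rng.lognormal(np.log(1.20), 0.5, 120)])  # long pauses
log_p = np.log(pauses).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(log_p)
grid = np.linspace(log_p.min(), log_p.max(), 2000).reshape(-1, 1)
resp = gmm.predict_proba(grid)
threshold = np.exp(grid[np.argmin(np.abs(resp[:, 0] - 0.5))][0])
print(f"short/long pause threshold ~ {threshold:.2f} s")
```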
Distinct developmental profiles in typical speech acquisition
Campbell, Thomas F.; Shriberg, Lawrence D.; Green, Jordan R.; Abdi, Hervé; Rusiewicz, Heather Leavy; Venkatesh, Lakshmi; Moore, Christopher A.
2012-01-01
Three- to five-year-old children produce speech that is characterized by a high level of variability within and across individuals. This variability, which is manifest in speech movements, acoustics, and overt behaviors, can be input to subgroup discovery methods to identify cohesive subgroups of speakers or to reveal distinct developmental pathways or profiles. This investigation characterized three distinct groups of typically developing children and provided normative benchmarks for speech development. These speech development profiles, identified among 63 typically developing preschool-aged speakers (ages 36–59 mo), were derived from the children's performance on multiple measures. The profiles were obtained by submitting to a k-means cluster analysis 72 measures composing three levels of speech analysis: behavioral (e.g., task accuracy, percentage of consonants correct), acoustic (e.g., syllable duration, syllable stress), and kinematic (e.g., variability of movements of the upper lip, lower lip, and jaw). Two of the discovered group profiles were distinguished by measures of variability but not by phonemic accuracy; the third group of children was characterized by their relatively low phonemic accuracy but not by an increase in measures of variability. Analyses revealed that of the original 72 measures, 8 key measures were sufficient to best distinguish the 3 profile groups. PMID:22357794
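The subgroup-discovery step can be sketched as standardising the speaker-by-measure matrix and running k-means with three clusters; the random matrix below is only a placeholder for the 63 x 72 table of real measures.

```python
# Sketch of the clustering step: 63 speakers x 72 standardised measures,
# partitioned into three profile groups with k-means (placeholder data).
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
measures = rng.normal(size=(63, 72))             # stands in for the real measures

X = StandardScaler().fit_transform(measures)     # put all measures on one scale
profiles = KMeans(n_clusters=3, n_init=10, random_state=1).fit_predict(X)
print(np.bincount(profiles))                     # children per profile group
```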
ERIC Educational Resources Information Center
Rice, Mabel L.; Smolik, Filip; Perpich, Denise; Thompson, Travis; Rytting, Nathan; Blossom, Megan
2010-01-01
Purpose: The mean length of children's utterances is a valuable estimate of their early language acquisition. The available normative data lack documentation of language and nonverbal intelligence levels of the samples. This study reports age-referenced mean length of utterance (MLU) data from children with specific language impairment (SLI) and…
Robust audio-visual speech recognition under noisy audio-video conditions.
Stewart, Darryl; Seymour, Rowan; Pass, Adrian; Ming, Ji
2014-02-01
This paper presents the maximum weighted stream posterior (MWSP) model as a robust and efficient stream integration method for audio-visual speech recognition in environments where the audio or video streams may be subjected to unknown and time-varying corruption. A significant advantage of MWSP is that it does not require any specific measurements of the signal in either stream to calculate appropriate stream weights during recognition, and as such it is modality-independent. This also means that MWSP complements and can be used alongside many of the other approaches that have been proposed in the literature for this problem. For evaluation we used the large XM2VTS database for speaker-independent audio-visual speech recognition. The extensive tests include both clean and corrupted utterances with corruption added in either or both the video and audio streams using a variety of types (e.g., MPEG-4 video compression) and levels of noise. The experiments show that this approach gives excellent performance in comparison to another well-known dynamic stream weighting approach and also compared to any fixed-weighted integration approach, both in clean conditions and when noise is added to either stream. Furthermore, our experiments show that the MWSP approach dynamically selects suitable integration weights on a frame-by-frame basis according to the level of noise in the streams and also according to the naturally fluctuating relative reliability of the modalities even in clean conditions. The MWSP approach is shown to maintain robust recognition performance in all tested conditions, while requiring no prior knowledge about the type or level of noise.
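The sketch below illustrates the general idea of frame-level dynamic stream weighting, choosing, per frame, the audio/video weight that maximises the best combined class posterior; it is a simplified stand-in for, not a derivation of, the MWSP model.

```python
# Rough sketch of dynamic stream weighting (not the MWSP derivation itself):
# per frame, combine audio and video class log-likelihoods with the weight
# that maximises the best combined posterior, so the sharper (more reliable)
# stream dominates without any external noise measurement.
import numpy as np

def combine_frame(loglik_audio, loglik_video, weights=np.linspace(0, 1, 21)):
    best = None
    for w in weights:
        combined = w * loglik_audio + (1 - w) * loglik_video
        post = np.exp(combined - np.logaddexp.reduce(combined))  # softmax
        score = post.max()
        if best is None or score > best[0]:
            best = (score, w, post)
    return best[1], best[2]          # chosen audio weight and class posteriors

# Toy frame: audio favours class 2, the noisy video is nearly uninformative.
la = np.log(np.array([0.05, 0.10, 0.80, 0.05]))
lv = np.log(np.array([0.24, 0.26, 0.25, 0.25]))
w, post = combine_frame(la, lv)
print("weight on audio:", w, "posterior:", np.round(post, 3))
```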
Language discrimination without language: Experiments on tamarin monkeys
NASA Astrophysics Data System (ADS)
Tincoff, Ruth; Hauser, Marc; Spaepen, Geertrui; Tsao, Fritz; Mehler, Jacques
2002-05-01
Human newborns can discriminate spoken languages differing on prosodic characteristics such as the timing of rhythmic units [T. Nazzi et al., JEP:HPP 24, 756-766 (1998)]. Cotton-top tamarins have also demonstrated a similar ability to discriminate a morae- (Japanese) vs a stress-timed (Dutch) language [F. Ramus et al., Science 288, 349-351 (2000)]. The finding that tamarins succeed in this task when either natural or synthesized utterances are played in a forward direction, but fail on backward utterances which disrupt the rhythmic cues, suggests that sensitivity to language rhythm may rely on general processes of the primate auditory system. However, the rhythm hypothesis also predicts that tamarins would fail to discriminate languages from the same rhythm class, such as English and Dutch. To assess the robustness of this ability, tamarins were tested on a different-rhythm-class distinction, Polish vs Japanese, and a new same-rhythm-class distinction, English vs Dutch. The stimuli were natural forward utterances produced by multiple speakers. As predicted by the rhythm hypothesis, tamarins discriminated between Polish and Japanese, but not English and Dutch. These findings strengthen the claim that discriminating the rhythmic cues of language does not require mechanisms specialized for human speech. [Work supported by NSF.]
Referential communication abilities in children with 22q11.2 deletion syndrome.
Van Den Heuvel, Ellen; Reuterskiöld, Christina; Solot, Cynthia; Manders, Eric; Swillen, Ann; Zink, Inge
2017-10-01
This study describes the performance on a perspective- and role-taking task in 27 children, ages 6-13 years, with 22q11.2 deletion syndrome (22q11.2DS). A cross-cultural design comparing Dutch- and English-speaking children with 22q11.2DS explored the possibility of cultural differences. Chronologically age-matched and younger typically developing (TD) children matched for receptive vocabulary served as control groups to identify challenges in referential communication. The utterances of children with 22q11.2DS were characterised as short and simple in lexical and grammatical terms. However, from a language use perspective, their utterances were verbose, ambiguous and irrelevant given the pictured scenes. They tended to elaborate on visual details and conveyed off-topic, extraneous information when participating in a barrier-game procedure. Both types of aberrant utterances forced a listener to consistently infer the intended message. Moreover, children with 22q11.2DS demonstrated difficulty selecting correct speech acts in accordance with contextual cues during a role-taking task. Both English- and Dutch-speaking children with 22q11.2DS showed impoverished information transfer and an increased number of elaborations, suggesting a cross-cultural syndrome-specific feature.
Barthel, Mathias; Sauppe, Sebastian; Levinson, Stephen C.; Meyer, Antje S.
2016-01-01
In conversation, interlocutors rarely leave long gaps between turns, suggesting that next speakers begin to plan their turns while listening to the previous speaker. The present experiment used analyses of speech onset latencies and eye-movements in a task-oriented dialogue paradigm to investigate when speakers start planning their responses. German speakers heard a confederate describe sets of objects in utterances that either ended in a noun [e.g., Ich habe eine Tür und ein Fahrrad (“I have a door and a bicycle”)] or a verb form [e.g., Ich habe eine Tür und ein Fahrrad besorgt (“I have gotten a door and a bicycle”)], while the presence or absence of the final verb either was or was not predictable from the preceding sentence structure. In response, participants had to name any unnamed objects they could see in their own displays with utterances such as Ich habe ein Ei (“I have an egg”). The results show that speakers begin to plan their turns as soon as sufficient information is available to do so, irrespective of further incoming words. PMID:27990127
Speech task effects on acoustic measure of fundamental frequency in Cantonese-speaking children.
Ma, Estella P-M; Lam, Nina L-N
2015-12-01
Speaking fundamental frequency (F0) is a voice measure frequently used to document changes in vocal performance over time. Knowing the intra-subject variability of speaking F0 has implications for its clinical usefulness. The present study examined the speaking F0 elicited from three speech tasks in Cantonese-speaking children. The study also compared the variability of speaking F0 elicited from different speech tasks. Fifty-six vocally healthy Cantonese-speaking children (31 boys and 25 girls) aged between 7;0 and 10;11 (years;months) participated. For each child, speaking F0 was elicited using speech tasks at three linguistic levels (sustained vowel /a/ prolongation, reading aloud a sentence and a passage). Two types of variability, within-session (trial-to-trial) and across-session (test-retest) variability, were compared across speech tasks. Significant differences in mean speaking F0 values were found between speech tasks. The mean speaking F0 elicited from sustained vowel phonations was significantly higher than that elicited from the connected speech tasks. The variability of speaking F0 was higher in sustained vowel prolongation than in connected speech. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Wang, Hongcui; Kawahara, Tatsuya
CALL (Computer Assisted Language Learning) systems using ASR (Automatic Speech Recognition) for second language learning have received increasing interest recently. However, it remains a challenge to achieve high speech recognition performance, including accurate detection of erroneous utterances by non-native speakers. Conventionally, possible error patterns, based on linguistic knowledge, are added to the lexicon and language model, or the ASR grammar network. However, this approach quickly runs into a trade-off between error coverage and increased perplexity. To solve the problem, we propose a method based on a decision tree to learn effective prediction of errors made by non-native speakers. An experimental evaluation with a number of foreign students learning Japanese shows that the proposed method can effectively generate an ASR grammar network, given a target sentence, to achieve both better coverage of errors and smaller perplexity, resulting in significant improvement in ASR accuracy.
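A minimal sketch of the learning step, assuming hypothetical features such as target phone, position in word and learner L1: a decision tree predicts which targets are likely to be mispronounced, and its probabilities could then decide which error arcs to add to the grammar network.

```python
# Hedged sketch: a decision tree over hypothetical, one-hot-encoded features
# predicting whether a learner is likely to mispronounce a target phone.
from sklearn.tree import DecisionTreeClassifier
from sklearn.preprocessing import OneHotEncoder

# toy training rows: (target_phone, word_position, learner_L1) -> error (1/0)
X_raw = [["ts", "initial", "zh"], ["ts", "final", "zh"], ["r", "initial", "en"],
         ["r", "medial", "en"], ["f", "initial", "zh"], ["ts", "initial", "en"]]
y = [1, 0, 1, 1, 0, 0]

enc = OneHotEncoder(handle_unknown="ignore")
X = enc.fit_transform(X_raw)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
# Predicted error probabilities could then select which error arcs to add
# to the ASR grammar network for a new target sentence.
print(tree.predict_proba(enc.transform([["ts", "initial", "zh"]])))
```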
Preliminary comparison of infants speech with and without hearing loss
NASA Astrophysics Data System (ADS)
McGowan, Richard S.; Nittrouer, Susan; Chenausky, Karen
2005-04-01
The speech of ten children with hearing loss and ten children without hearing loss aged 12 months is examined. All the children with hearing loss were identified before six months of age, and all have parents who wish them to become oral communicators. The data are from twenty-minute sessions with the caregiver and child, with their normal prostheses in place, in semi-structured settings. These data are part of a larger test battery applied to both caregiver and child as part of a project comparing the development of children with hearing loss to those without hearing loss, known as the Early Development of Children with Hearing Loss. The speech comparisons are in terms of number of utterances, syllable shapes, and segment type. A subset of the data was given a detailed acoustic analysis, including formant frequencies and voice quality measures. [Work supported by NIDCD R01 006237 to Susan Nittrouer.]
Frey, Jennifer R; Kaiser, Ann P; Scherer, Nancy J
2018-02-01
The purpose of this study was to investigate the influences of child speech intelligibility and rate on caregivers' linguistic responses. This study compared the language use of children with cleft palate with or without cleft lip (CP±L) and their caregivers' responses. Descriptive analyses of children's language and caregivers' responses and a multilevel analysis of caregiver responsivity were conducted to determine whether there were differences in children's productive language and caregivers' responses to different types of child utterances. Play-based caregiver-child interactions were video recorded in a clinic setting. Thirty-eight children (19 toddlers with nonsyndromic repaired CP±L and 19 toddlers with typical language development) between 17 and 37 months old and their primary caregivers participated. Child and caregiver measures were obtained from transcribed and coded video recordings and included the rate, total number of words, and number of different words spoken by children and their caregivers, intelligibility of child utterances, and form of caregiver responses. Findings from this study suggest caregivers are highly responsive to toddlers' communication attempts, regardless of the intelligibility of those utterances. However, opportunities to respond were fewer for children with CP±L. Significant differences were observed in children's intelligibility and productive language and in caregivers' use of questions in response to unintelligible utterances of children with and without CP±L. This study provides information about differences in children with CP±L's language use and caregivers' responses to spoken language of toddlers with and without CP±L.
Intonation and dialog context as constraints for speech recognition.
Taylor, P; King, S; Isard, S; Wright, H
1998-01-01
This paper describes a way of using intonation and dialog context to improve the performance of an automatic speech recognition (ASR) system. Our experiments were run on the DCIEM Maptask corpus, a corpus of spontaneous task-oriented dialog speech. This corpus has been tagged according to a dialog analysis scheme that assigns each utterance to one of 12 "move types," such as "acknowledge," "query-yes/no" or "instruct." Most ASR systems use a bigram language model to constrain the possible sequences of words that might be recognized. Here we use a separate bigram language model for each move type. We show that when the "correct" move-specific language model is used for each utterance in the test set, the word error rate of the recognizer drops. Of course when the recognizer is run on previously unseen data, it cannot know in advance what move type the speaker has just produced. To determine the move type we use an intonation model combined with a dialog model that puts constraints on possible sequences of move types, as well as the speech recognizer likelihoods for the different move-specific models. In the full recognition system, the combination of automatic move type recognition with the move specific language models reduces the overall word error rate by a small but significant amount when compared with a baseline system that does not take intonation or dialog acts into account. Interestingly, the word error improvement is restricted to "initiating" move types, where word recognition is important. In "response" move types, where the important information is conveyed by the move type itself--for example, positive versus negative response--there is no word error improvement, but recognition of the response types themselves is good. The paper discusses the intonation model, the language models, and the dialog model in detail and describes the architecture in which they are combined.
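A minimal sketch of move-specific bigram language models on toy data: one model is trained per move type, and a hypothesis is then scored under the model of the predicted move type. The smoothing and training sentences are invented for illustration.

```python
# Toy move-specific bigram language models (illustrative, not the paper's LMs).
from collections import defaultdict

def train_bigram(sentences):
    counts, totals = defaultdict(int), defaultdict(int)
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        for a, b in zip(tokens, tokens[1:]):
            counts[(a, b)] += 1
            totals[a] += 1
    # crude add-one-style smoothing so unseen bigrams get a small probability
    return lambda a, b: (counts[(a, b)] + 1) / (totals[a] + 1000)

lm = {
    "acknowledge": train_bigram(["okay", "right okay", "uh-huh"]),
    "instruct":    train_bigram(["go left of the lake", "head north to the mill"]),
}

def score(move_type, hypothesis):
    tokens = ["<s>"] + hypothesis.split() + ["</s>"]
    p = 1.0
    for a, b in zip(tokens, tokens[1:]):
        p *= lm[move_type](a, b)
    return p

# The instruct-specific model prefers this hypothesis over the acknowledge model.
print(score("instruct", "go left of the mill") > score("acknowledge", "go left of the mill"))
```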
Crossmodal and incremental perception of audiovisual cues to emotional speech.
Barkhuysen, Pashiera; Krahmer, Emiel; Swerts, Marc
2010-01-01
In this article we report on two experiments about the perception of audiovisual cues to emotional speech. The article addresses two questions: (1) how do visual cues from a speaker's face to emotion relate to auditory cues, and (2) what is the recognition speed for various facial cues to emotion? Both experiments reported below are based on tests with video clips of emotional utterances collected via a variant of the well-known Velten method. More specifically, we recorded speakers who displayed positive or negative emotions, which were congruent or incongruent with the (emotional) lexical content of the uttered sentence. In order to test this, we conducted two experiments. The first experiment is a perception experiment in which Czech participants, who do not speak Dutch, rate the perceived emotional state of Dutch speakers in a bimodal (audiovisual) or a unimodal (audio- or vision-only) condition. It was found that incongruent emotional speech leads to significantly more extreme perceived emotion scores than congruent emotional speech, where the difference between congruent and incongruent emotional speech is larger for the negative than for the positive conditions. Interestingly, the largest overall differences between congruent and incongruent emotions were found for the audio-only condition, which suggests that posing an incongruent emotion has a particularly strong effect on the spoken realization of emotions. The second experiment uses a gating paradigm to test the recognition speed for various emotional expressions from a speaker's face. In this experiment participants were presented with the same clips as in Experiment I, but this time presented vision-only. The clips were shown in successive segments (gates) of increasing duration. Results show that participants are surprisingly accurate in their recognition of the various emotions, as they already reach high recognition scores in the first gate (after only 160 ms). Interestingly, the recognition scores rise faster for positive than for negative conditions. Finally, the gating results suggest that incongruent emotions are perceived as more intense than congruent emotions, as the former get more extreme recognition scores than the latter, already after a short period of exposure.
de Boer, J N; Heringa, S M; van Dellen, E; Wijnen, F N K; Sommer, I E C
2016-11-01
Auditory verbal hallucinations (AVH) in psychotic patients are associated with activation of right hemisphere language areas, although this hemisphere is non-dominant in most people. Language generated in the right hemisphere can be observed in aphasia patients with left hemisphere damage. It is called "automatic speech", characterized by low syntactic complexity and negative emotional valence. AVH in nonpsychotic individuals, by contrast, predominantly have a neutral or positive emotional content and may be less dependent on right hemisphere activity. We hypothesize that right hemisphere language characteristics can be observed in the language of AVH, differentiating psychotic from nonpsychotic individuals. 17 patients with a psychotic disorder and 19 nonpsychotic individuals were instructed to repeat their AVH verbatim directly upon hearing them. Responses were recorded, transcribed and analyzed for total words, mean length of utterance, proportion of grammatical utterances, proportion of negations, literal and thematic perseverations, abuses, type-token ratio, embeddings, verb complexity, noun-verb ratio, and open-closed class ratio. Linguistic features of AVH overall differed between groups F(13,24)=3.920, p=0.002; Pillai's Trace 0.680. AVH of psychotic patients compared with AVH of nonpsychotic individuals had a shorter mean length of utterance, lower verb complexity, and more verbal abuses and perseverations (all p<0.05). Other features were similar between groups. AVH of psychotic patients showed lower syntactic complexity and higher levels of repetition and abuses than AVH of nonpsychotic individuals. These differences are in line with a stronger involvement of the right hemisphere in the origination of AVH in patients than in nonpsychotic voice hearers. Copyright © 2016 Elsevier Inc. All rights reserved.
Vatakis, Argiro; Maragos, Petros; Rodomagoulakis, Isidoros; Spence, Charles
2012-01-01
We investigated how the physical differences associated with the articulation of speech affect the temporal aspects of audiovisual speech perception. Video clips of consonants and vowels uttered by three different speakers were presented. The video clips were analyzed using an auditory-visual signal saliency model in order to compare signal saliency and behavioral data. Participants made temporal order judgments (TOJs) regarding which speech-stream (auditory or visual) had been presented first. The sensitivity of participants' TOJs and the point of subjective simultaneity (PSS) were analyzed as a function of the place, manner of articulation, and voicing for consonants, and the height/backness of the tongue and lip-roundedness for vowels. We expected that in the case of the place of articulation and roundedness, where the visual-speech signal is more salient, temporal perception of speech would be modulated by the visual-speech signal. No such effect was expected for the manner of articulation or height. The results demonstrate that for place and manner of articulation, participants' temporal percept was affected (although not always significantly) by highly-salient speech-signals with the visual-signals requiring smaller visual-leads at the PSS. This was not the case when height was evaluated. These findings suggest that in the case of audiovisual speech perception, a highly salient visual-speech signal may lead to higher probabilities regarding the identity of the auditory-signal that modulate the temporal window of multisensory integration of the speech-stimulus. PMID:23060756
ERIC Educational Resources Information Center
Vanormelingen, Liesbeth; Gillis, Steven
2016-01-01
This article investigates the amount of input and the quality of mother-child interactions in mothers who differ in socio-economic status (SES): mid-to-high SES (mhSES) and low SES. The amount of input was measured as the number of utterances per hour, the total duration of speech per hour and the number of turns per hour. The quality of the…
Emotion recognition from speech: tools and challenges
NASA Astrophysics Data System (ADS)
Al-Talabani, Abdulbasit; Sellahewa, Harin; Jassim, Sabah A.
2015-05-01
Human emotion recognition from speech is studied frequently for its importance in many applications, e.g. human-computer interaction. There is wide diversity and little agreement about the basic emotion or emotion-related states on the one hand, and about where the emotion-related information lies in the speech signal on the other. These diversities motivate our investigations into extracting meta-features using PCA, or a non-adaptive random projection (RP), which significantly reduce the high-dimensional speech feature vectors that may contain a wide range of emotion-related information. Subsets of meta-features are fused to increase the performance of the recognition model, which adopts a score-based LDC classifier. We shall demonstrate that our scheme outperforms state-of-the-art results when tested on non-prompted databases or acted databases (i.e. when subjects act specific emotions while uttering a sentence). However, the huge gap between accuracy rates achieved on the different types of speech datasets raises questions about the way emotions modulate speech. In particular, we shall argue that emotion recognition from speech should not be dealt with as a classification problem. We shall demonstrate the presence of a spectrum of different emotions in the same speech portion, especially in the non-prompted datasets, which tend to be more "natural" than the acted datasets, where the subjects attempt to suppress all but one emotion.
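The meta-feature pipeline can be sketched with scikit-learn: reduce the large feature vectors with PCA or a Gaussian random projection and classify with a linear discriminant model standing in for the score-based LDC; the feature matrix and labels below are random placeholders.

```python
# Sketch under stated assumptions: PCA or random-projection meta-features
# followed by a linear discriminant classifier (placeholder data throughout).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.random_projection import GaussianRandomProjection
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X = rng.normal(size=(240, 1500))          # 240 utterances x 1500 raw features
y = rng.integers(0, 4, size=240)          # 4 emotion classes (placeholder)

for reducer in (PCA(n_components=40), GaussianRandomProjection(n_components=40)):
    Z = reducer.fit_transform(X)
    acc = cross_val_score(LinearDiscriminantAnalysis(), Z, y, cv=5).mean()
    print(type(reducer).__name__, f"mean CV accuracy = {acc:.2f}")
```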
Prediction and constraint in audiovisual speech perception
Peelle, Jonathan E.; Sommers, Mitchell S.
2015-01-01
During face-to-face conversational speech listeners must efficiently process a rapid and complex stream of multisensory information. Visual speech can serve as a critical complement to auditory information because it provides cues to both the timing of the incoming acoustic signal (the amplitude envelope, influencing attention and perceptual sensitivity) and its content (place and manner of articulation, constraining lexical selection). Here we review behavioral and neurophysiological evidence regarding listeners' use of visual speech information. Multisensory integration of audiovisual speech cues improves recognition accuracy, particularly for speech in noise. Even when speech is intelligible based solely on auditory information, adding visual information may reduce the cognitive demands placed on listeners through increasing precision of prediction. Electrophysiological studies demonstrate oscillatory cortical entrainment to speech in auditory cortex is enhanced when visual speech is present, increasing sensitivity to important acoustic cues. Neuroimaging studies also suggest increased activity in auditory cortex when congruent visual information is available, but additionally emphasize the involvement of heteromodal regions of posterior superior temporal sulcus as playing a role in integrative processing. We interpret these findings in a framework of temporally-focused lexical competition in which visual speech information affects auditory processing to increase sensitivity to auditory information through an early integration mechanism, and a late integration stage that incorporates specific information about a speaker's articulators to constrain the number of possible candidates in a spoken utterance. Ultimately it is words compatible with both auditory and visual information that most strongly determine successful speech perception during everyday listening. Thus, audiovisual speech perception is accomplished through multiple stages of integration, supported by distinct neuroanatomical mechanisms. PMID:25890390
Ganz, Jennifer B; Simpson, Richard L
2004-08-01
Few studies on augmentative and alternative communication (AAC) systems have addressed the potential for such systems to impact word utterances in children with autism spectrum disorders (ASD). The Picture Exchange Communication System (PECS) is an AAC system designed specifically to minimize difficulties with communication skills experienced by individuals with ASD. The current study examined the role of PECS in improving the number of words spoken, increasing the complexity and length of phrases, and decreasing the non-word vocalizations of three young children with ASD and developmental delays (DD) with related characteristics. Participants were taught Phases 1-4 of PECS (i.e., picture exchange, increased distance, picture discrimination, and sentence construction). The results indicated that PECS was mastered rapidly by the participants and word utterances increased in number of words and complexity of grammar.
ERIC Educational Resources Information Center
Maynard, Senko K.
The casual conversation of six pairs of Japanese and six pairs of American college students was analyzed for evidence of two related aspects of conversation management: the linguistic characteristics of utterance units and back-channel strategies. Utterance units are defined as those occurring between identifiable pauses or breaks in tempo.…
Lexical frequency effects on articulation: a comparison of picture naming and reading aloud
Mousikou, Petroula; Rastle, Kathleen
2015-01-01
The present study investigated whether lexical frequency, a variable that is known to affect the time taken to utter a verbal response, may also influence articulation. Pairs of words that differed in terms of their relative frequency, but were matched on their onset, vowel, and number of phonemes (e.g., map vs. mat, where the former is more frequent than the latter) were used in a picture naming and a reading aloud task. Low-frequency items yielded slower response latencies than high-frequency items in both tasks, with the frequency effect being significantly larger in picture naming compared to reading aloud. Also, initial-phoneme durations were longer for low-frequency items than for high-frequency items. The frequency effect on initial-phoneme durations was slightly more prominent in picture naming than in reading aloud, yet its size was very small, thus preventing us from concluding that lexical frequency exerts an influence on articulation. Additionally, initial-phoneme and whole-word durations were significantly longer in reading aloud compared to picture naming. We discuss our findings in the context of current theories of reading aloud and speech production, and the approaches they adopt in relation to the nature of information flow (staged vs. cascaded) between cognitive and articulatory levels of processing. PMID:26528223
Szagun, Gisela; Stumper, Barbara
2012-12-01
The authors investigated the influence of social environmental variables and age at implantation on language development in children with cochlear implants. Participants were 25 children with cochlear implants and their parents. Age at implantation ranged from 6 months to 42 months (mean age = 20.4 months, SD = 22.0 months). Linguistic progress was assessed at 12, 18, 24, and 30 months after implantation. At each data point, language measures were based on parental questionnaire and 45-min spontaneous speech samples. Children's language and parents' child-directed language were analyzed. On all language measures, children displayed considerable vocabulary and grammatical growth over time. Although there was no overall effect of age at implantation, younger and older children had different growth patterns. Children implanted by age 24 months made the most marked progress earlier on, whereas children implanted thereafter did so later on. Higher levels of maternal education were associated with faster linguistic progress; age at implantation was not. Properties of maternal language input, mean length of utterance, and expansions were associated with children's linguistic progress independently of age at implantation. In children implanted within the sensitive period for language learning, children's home language environment contributes more crucially to their linguistic progress than does age at implantation.
Almirall, Daniel; DiStefano, Charlotte; Chang, Ya-Chih; Shire, Stephanie; Kaiser, Ann; Lu, Xi; Nahum-Shani, Inbal; Landa, Rebecca; Mathy, Pamela; Kasari, Connie
2016-01-01
Objective There are limited data on the effects of adaptive social communication interventions with a speech-generating device in autism. This study is the first to compare growth in communication outcomes among three adaptive interventions in school-aged children with autism spectrum disorder (ASD) who are minimally verbal. Methods Sixty-one children, aged 5–8 years, participated in a sequential, multiple-assignment randomized trial (SMART). All children received a developmental communication intervention: joint attention, symbolic play, engagement and regulation (JASP) with enhanced milieu teaching (EMT). The SMART included three two-stage, 24-week adaptive interventions with different provisions of a speech-generating device (SGD) in the context of JASP+EMT. The first adaptive intervention, with no SGD, initially assigned JASP+EMT alone, then intensified JASP+EMT for slow responders. In the second adaptive intervention, slow responders to JASP+EMT were assigned JASP+EMT+SGD. The third adaptive intervention initially assigned JASP+EMT+SGD, then intensified JASP+EMT+SGD for slow responders. Analyses examined between-group differences in change in outcomes from baseline to week 36. Verbal outcomes included spontaneous communicative utterances and novel words. Non-linguistic communication outcomes included initiating joint attention and behavior regulation, and play. Results The adaptive intervention beginning with JASP+EMT+SGD was estimated as superior. There were significant (P<0.05) between-group differences in change in spontaneous communicative utterances and initiating joint attention. Conclusions School-aged children with ASD who are minimally verbal make significant gains in communication outcomes with an adaptive intervention beginning with JASP+EMT+SGD. Future research should explore mediators and moderators of the adaptive intervention effects and second-stage intervention options that further capitalize on early gains in treatment. PMID:26954267
Lateralized electrical brain activity reveals covert attention allocation during speaking.
Rommers, Joost; Meyer, Antje S; Praamstra, Peter
2017-01-27
Speakers usually begin to speak while only part of the utterance has been planned. Earlier work has shown that speech planning processes are reflected in speakers' eye movements as they describe visually presented objects. However, to-be-named objects can be processed to some extent before they have been fixated upon, presumably because attention can be allocated to objects covertly, without moving the eyes. The present study investigated whether EEG could track speakers' covert attention allocation as they produced short utterances to describe pairs of objects (e.g., "dog and chair"). The processing difficulty of each object was varied by presenting it in upright orientation (easy) or in upside down orientation (difficult). Background squares flickered at different frequencies in order to elicit steady-state visual evoked potentials (SSVEPs). The N2pc component, associated with the focusing of attention on an item, was detectable not only prior to speech onset, but also during speaking. The time course of the N2pc showed that attention shifted to each object in the order of mention prior to speech onset. Furthermore, greater processing difficulty increased the time speakers spent attending to each object. This demonstrates that the N2pc can track covert attention allocation in a naming task. In addition, an effect of processing difficulty at around 200-350ms after stimulus onset revealed early attention allocation to the second to-be-named object. The flickering backgrounds elicited SSVEPs, but SSVEP amplitude was not influenced by processing difficulty. These results help complete the picture of the coordination of visual information uptake and motor output during speaking. Copyright © 2016 Elsevier Ltd. All rights reserved.
Automatic measurement and representation of prosodic features
NASA Astrophysics Data System (ADS)
Ying, Goangshiuan Shawn
Effective measurement and representation of prosodic features of the acoustic signal for use in automatic speech recognition and understanding systems is the goal of this work. Prosodic features (stress, duration, and intonation) are variations of the acoustic signal whose domains are beyond the boundaries of each individual phonetic segment. Listeners perceive prosodic features through a complex combination of acoustic correlates such as intensity, duration, and fundamental frequency (F0). We have developed new tools to measure F0 and intensity features. We apply a probabilistic global error correction routine to an Average Magnitude Difference Function (AMDF) pitch detector. A new short-term frequency-domain Teager energy algorithm is used to measure the energy of a speech signal. We have conducted a series of experiments performing lexical stress detection on words in continuous English speech from two speech corpora. We have experimented with two different approaches, a segment-based approach and a rhythm unit-based approach, in lexical stress detection. The first approach uses pattern recognition with energy- and duration-based measurements as features to build Bayesian classifiers to detect the stress level of a vowel segment. In the second approach we define a rhythm unit and use only the F0-based measurement and a scoring system to determine the stressed segment in the rhythm unit. A duration-based segmentation routine was developed to break polysyllabic words into rhythm units. The long-term goal of this work is to develop a system that can effectively detect the stress pattern for each word in continuous speech utterances. Stress information will be integrated as a constraint for pruning the word hypotheses in a word recognition system based on hidden Markov models.
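To make the AMDF idea above concrete, here is a minimal sketch of an AMDF-based F0 estimate for a single voiced frame. The frame length, lag range, and sampling rate are illustrative assumptions, and the dissertation's probabilistic global error correction is not reproduced.

# Minimal AMDF pitch sketch for one frame (illustrative assumptions throughout).
import numpy as np

def amdf_f0(frame, sr, f0_min=60.0, f0_max=400.0):
    lag_min = int(sr / f0_max)
    lag_max = int(sr / f0_min)
    # average magnitude difference for each candidate lag
    amdf = np.array([np.mean(np.abs(frame[:-lag] - frame[lag:]))
                     for lag in range(lag_min, lag_max + 1)])
    best_lag = lag_min + int(np.argmin(amdf))   # deepest valley ~ pitch period
    return sr / best_lag

sr = 16000
t = np.arange(int(0.04 * sr)) / sr              # 40 ms synthetic voiced frame
frame = np.sin(2 * np.pi * 120 * t) + 0.3 * np.sin(2 * np.pi * 240 * t)
print(round(amdf_f0(frame, sr), 1))             # roughly 120 Hz for this toy frame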
Iconic hand gestures and the predictability of words in context in spontaneous speech.
Beattie, G; Shovelton, H
2000-11-01
This study presents a series of empirical investigations to test a theory of speech production proposed by Butterworth and Hadar (1989; revised in Hadar & Butterworth, 1997) that iconic gestures have a functional role in lexical retrieval in spontaneous speech. Analysis 1 demonstrated that words which were totally unpredictable (as measured by the Shannon guessing technique) were more likely to occur after pauses than after fluent speech, in line with earlier findings. Analysis 2 demonstrated that iconic gestures were associated with words of lower transitional probability than words not associated with gesture, even when grammatical category was controlled. This therefore provided new supporting evidence for Butterworth and Hadar's claims that gestures' lexical affiliates are indeed unpredictable lexical items. However, Analysis 3 found that iconic gestures were not occasioned by lexical accessing difficulties because although gestures tended to occur with words of significantly lower transitional probability, these lower transitional probability words tended to be uttered quite fluently. Overall, therefore, this study provided little evidence for Butterworth and Hadar's theoretical claim that the main function of the iconic hand gestures that accompany spontaneous speech is to assist in the process of lexical access. Instead, such gestures are reconceptualized in terms of communicative function.
A Brain for Speech. Evolutionary Continuity in Primate and Human Auditory-Vocal Processing
Aboitiz, Francisco
2018-01-01
In this review article, I propose a continuous evolution from the auditory-vocal apparatus and its mechanisms of neural control in non-human primates, to the peripheral organs and the neural control of human speech. Although there is an overall conservatism both in peripheral systems and in central neural circuits, a few changes were critical for the expansion of vocal plasticity and the elaboration of proto-speech in early humans. Two of the most relevant changes were the acquisition of direct cortical control of the vocal fold musculature and the consolidation of an auditory-vocal articulatory circuit, encompassing auditory areas in the temporoparietal junction and prefrontal and motor areas in the frontal cortex. This articulatory loop, also referred to as the phonological loop, enhanced vocal working memory capacity, enabling early humans to learn increasingly complex utterances. The auditory-vocal circuit became progressively coupled to multimodal systems conveying information about objects and events, which gradually led to the acquisition of modern speech. Gestural communication has accompanied the development of vocal communication since very early in human evolution, and although both systems co-evolved tightly in the beginning, at some point speech became the main channel of communication. PMID:29636657
Implementation of the Intelligent Voice System for Kazakh
NASA Astrophysics Data System (ADS)
Yessenbayev, Zh; Saparkhojayev, N.; Tibeyev, T.
2014-04-01
Modern speech technologies are highly advanced and widely used in day-to-day applications. However, this mostly concerns the languages of well-developed countries such as English, German, Japanese, Russian, etc. As for Kazakh, the situation is less prominent and research in this field is only starting to evolve. In this research and application-oriented project, we introduce an intelligent voice system for the fast deployment of call-centers and information desks supporting Kazakh speech. The demand for such a system is obvious given the country's large size and small population. Landline and cell phones become the only means of communication for distant villages and suburbs. The system features Kazakh speech recognition and synthesis modules as well as a web-GUI for efficient dialog management. For speech recognition we use the CMU Sphinx engine and for speech synthesis, MaryTTS. The web-GUI is implemented in Java, enabling operators to quickly create and manage the dialogs in a user-friendly graphical environment. The call routines are handled by Asterisk PBX and JBoss Application Server. The system supports such technologies and protocols as VoIP, VoiceXML, FastAGI, Java SpeechAPI and J2EE. For the speech recognition experiments we compiled and used the first Kazakh speech corpus, with utterances from 169 native speakers. The performance of the speech recognizer is 4.1% WER on isolated word recognition and 6.9% WER on clean continuous speech recognition tasks. The speech synthesis experiments include the training of male and female voices.
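The word error rates quoted above are, in standard practice, computed by Levenshtein alignment of the recognizer output against a reference transcript. The sketch below shows that conventional computation; it is not code from the project itself, and the sample strings are placeholders rather than real Kazakh transcripts.

# Standard WER via edit distance (sketch; placeholder strings only).
def wer(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between the first i reference words and first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[-1][-1] / max(len(ref), 1)

print(wer("one two three four", "one two four"))   # 0.25: one deletion over four reference words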
Wagner, Valentin; Jescheniak, Jörg D; Schriefers, Herbert
2010-03-01
Three picture-word interference experiments addressed the question of whether the scope of grammatical advance planning in sentence production corresponds to some fixed unit or rather is flexible. Subjects produced sentences of different formats under varying amounts of cognitive load. When speakers described 2-object displays with simple sentences of the form "the frog is next to the mug," the 2 nouns were found to be lexically-semantically activated to similar degrees at speech onset, as indexed by similarly sized interference effects from semantic distractors related to either the first or the second noun. When speakers used more complex sentences (including prenominal color adjectives; e.g., "the blue frog is next to the blue mug") much larger interference effects were observed for the first than the second noun, suggesting that the second noun was lexically-semantically activated before speech onset on only a subset of trials. With increased cognitive load, introduced by an additional conceptual decision task and variable utterance formats, the interference effect for the first noun was increased and the interference effect for the second noun disappeared, suggesting that the scope of advance planning had been narrowed. By contrast, if cognitive load was induced by a secondary working memory task to be performed during speech planning, the interference effect for both nouns was increased, suggesting that the scope of advance planning had not been affected. In all, the data suggest that the scope of advance planning during grammatical encoding in sentence production is flexible, rather than structurally fixed.
Cortical activity during cued picture naming predicts individual differences in stuttering frequency
Mock, Jeffrey R.; Foundas, Anne L.; Golob, Edward J.
2016-01-01
Objective Developmental stuttering is characterized by fluent speech punctuated by stuttering events, the frequency of which varies among individuals and contexts. Most stuttering events occur at the beginning of an utterance, suggesting neural dynamics associated with stuttering may be evident during speech preparation. Methods This study used EEG to measure cortical activity during speech preparation in men who stutter, and compared the EEG measures to individual differences in stuttering rate as well as to a fluent control group. Each trial contained a cue followed by an acoustic probe at one of two onset times (early or late), and then a picture. There were two conditions: a speech condition where cues induced speech preparation of the picture’s name and a control condition that minimized speech preparation. Results Across conditions stuttering frequency correlated to cue-related EEG beta power and auditory ERP slow waves from early onset acoustic probes. Conclusions The findings reveal two new cortical markers of stuttering frequency that were present in both conditions, manifest at different times, are elicited by different stimuli (visual cue, auditory probe), and have different EEG responses (beta power, ERP slow wave). Significance The cue-target paradigm evoked brain responses that correlated to pre-experimental stuttering rate. PMID:27472545
Mock, Jeffrey R; Foundas, Anne L; Golob, Edward J
2016-09-01
Developmental stuttering is characterized by fluent speech punctuated by stuttering events, the frequency of which varies among individuals and contexts. Most stuttering events occur at the beginning of an utterance, suggesting neural dynamics associated with stuttering may be evident during speech preparation. This study used EEG to measure cortical activity during speech preparation in men who stutter, and compared the EEG measures to individual differences in stuttering rate as well as to a fluent control group. Each trial contained a cue followed by an acoustic probe at one of two onset times (early or late), and then a picture. There were two conditions: a speech condition where cues induced speech preparation of the picture's name and a control condition that minimized speech preparation. Across conditions stuttering frequency correlated to cue-related EEG beta power and auditory ERP slow waves from early onset acoustic probes. The findings reveal two new cortical markers of stuttering frequency that were present in both conditions, manifest at different times, are elicited by different stimuli (visual cue, auditory probe), and have different EEG responses (beta power, ERP slow wave). The cue-target paradigm evoked brain responses that correlated to pre-experimental stuttering rate. Copyright © 2016 International Federation of Clinical Neurophysiology. Published by Elsevier Ireland Ltd. All rights reserved.
Jackson, Eric S; Tiede, Mark; Beal, Deryk; Whalen, D H
2016-12-01
This study examined the impact of social-cognitive stress on sentence-level speech variability, determinism, and stability in adults who stutter (AWS) and adults who do not stutter (AWNS). We demonstrated that complementing the spatiotemporal index (STI) with recurrence quantification analysis (RQA) provides a novel approach to both assessing and interpreting speech variability in stuttering. Twenty AWS and 21 AWNS repeated sentences in audience and nonaudience conditions while their lip movements were tracked. Across-sentence variability was assessed via the STI; within-sentence determinism and stability were assessed via RQA. Compared with the AWNS, the AWS produced speech that was more variable across sentences and more deterministic and stable within sentences. Audience presence contributed to greater within-sentence determinism and stability in the AWS. A subset of AWS who were more susceptible to experiencing anxiety exhibited reduced across-sentence variability in the audience condition compared with the nonaudience condition. This study extends the assessment of speech variability in AWS and AWNS into the social-cognitive domain and demonstrates that the characterization of speech within sentences using RQA is complementary to the across-sentence STI measure. AWS seem to adopt a more restrictive, less flexible speaking approach in response to social-cognitive stress, which is presumably a strategy for maintaining observably fluent speech.
Tiede, Mark; Beal, Deryk; Whalen, D. H.
2016-01-01
Purpose This study examined the impact of social–cognitive stress on sentence-level speech variability, determinism, and stability in adults who stutter (AWS) and adults who do not stutter (AWNS). We demonstrated that complementing the spatiotemporal index (STI) with recurrence quantification analysis (RQA) provides a novel approach to both assessing and interpreting speech variability in stuttering. Method Twenty AWS and 21 AWNS repeated sentences in audience and nonaudience conditions while their lip movements were tracked. Across-sentence variability was assessed via the STI; within-sentence determinism and stability were assessed via RQA. Results Compared with the AWNS, the AWS produced speech that was more variable across sentences and more deterministic and stable within sentences. Audience presence contributed to greater within-sentence determinism and stability in the AWS. A subset of AWS who were more susceptible to experiencing anxiety exhibited reduced across-sentence variability in the audience condition compared with the nonaudience condition. Conclusions This study extends the assessment of speech variability in AWS and AWNS into the social–cognitive domain and demonstrates that the characterization of speech within sentences using RQA is complementary to the across-sentence STI measure. AWS seem to adopt a more restrictive, less flexible speaking approach in response to social–cognitive stress, which is presumably a strategy for maintaining observably fluent speech. PMID:27936276
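As a rough illustration of the recurrence measures used above, the sketch below computes recurrence rate and percent determinism for a one-dimensional signal. It is a deliberately simplified version: a real RQA of lip-movement trajectories would use delay embedding, exclude the main diagonal, and tune the radius and minimum line length, none of which is attempted here.

# Simplified RQA sketch (no embedding, main diagonal included).
import numpy as np

def rqa(x, radius, min_line=2):
    d = np.abs(x[:, None] - x[None, :])          # pairwise distances
    rec = d <= radius                            # recurrence matrix
    n = x.size
    recurrence_rate = rec.sum() / n ** 2
    det_points = 0
    for k in range(-(n - 1), n):                 # walk every diagonal line
        run = 0
        for hit in np.diagonal(rec, offset=k):
            if hit:
                run += 1
            else:
                if run >= min_line:
                    det_points += run
                run = 0
        if run >= min_line:
            det_points += run
    determinism = det_points / rec.sum() if rec.sum() else 0.0
    return recurrence_rate, determinism

periodic = np.sin(np.linspace(0, 20 * np.pi, 400))
noisy = np.random.default_rng(1).normal(size=400)
print(rqa(periodic, radius=0.1))                 # high determinism for the periodic series
print(rqa(noisy, radius=0.1))                    # lower determinism for white noise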
Generating Natural Language Under Pragmatic Constraints.
1987-03-01
central issue, Carter's loss, concentrating on more pleasant aspects. But what would happen in an extreme case? What if you, a Carter supporter, are... In [Cohen 78], Cohen studied the effect of the hearer's knowledge on the selection of the appropriate speech act (say, REQUEST vs INFORM OF WANT)... utterances is studied in [Clark & Carlson 81] and [Clark & Murphy 82]; [Gibbs 79] and [Gibbs 81] discuss the effects of context on the processing of indirect...
ERIC Educational Resources Information Center
Garrett, Alan W.
2006-01-01
As Jesse H. Newlon prepared to speak at Teachers College on July 10, 1940, he apparently did not appreciate the impact his words would make. He had not prepared a complete text of his remarks, as was his habit for important speeches, speaking instead from a three-page outline. His ultimate title, "The Teaching Profession and the World Crisis," was…
Li, Feipeng; Trevino, Andrea; Menon, Anjali; Allen, Jont B
2012-10-01
In a previous study on plosives, the 3-Dimensional Deep Search (3DDS) method for the exploration of the necessary and sufficient cues for speech perception was introduced (Li et al., 2010, J. Acoust. Soc. Am. 127(4), 2599-2610). Here, this method is used to isolate the spectral cue regions for perception of the American English fricatives /ʃ, ʒ, s, z, f, v, θ, ð/ in time, frequency, and intensity. The fricatives are analyzed in the context of consonant-vowel utterances, using the vowel /ɑ/. The necessary cues were found to be contained in the frication noise for /ʃ, ʒ, s, z, f, v/. 3DDS analysis isolated the cue regions of /s, z/ between 3.6 and 8 [kHz] and /ʃ, ʒ/ between 1.4 and 4.2 [kHz]. Some utterances were found to contain acoustic components that were unnecessary for correct perception, but caused listeners to hear non-target consonants when the primary cue region was removed; such acoustic components are labeled "conflicting cue regions." The amplitude modulation of the high-frequency frication region by the fundamental F0 was found to be a sufficient cue for voicing. Overall, the 3DDS method allows one to analyze the effects of natural speech components without initial assumptions about where perceptual cues lie in time-frequency space or which elements of production they correspond to.
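For readers who want to inspect the reported cue bands themselves, the sketch below simply band-limits a token to the 3.6-8 kHz region identified for /s, z/ using a Butterworth filter. This is an ordinary band-pass filter applied to a synthetic stand-in signal, not the 3DDS procedure.

# Band-limit a token to the /s, z/ cue region (illustrative only).
import numpy as np
from scipy.signal import butter, sosfiltfilt

sr = 22050
rng = np.random.default_rng(0)
token = rng.normal(size=int(0.3 * sr))           # stand-in for a recorded /sa/ token

low, high = 3600, 8000                           # cue region reported above
sos = butter(6, [low, high], btype="bandpass", fs=sr, output="sos")
frication_band = sosfiltfilt(sos, token)         # zero-phase filtering
print(frication_band.shape)                      # same length as the input token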
Demjén, Zsófia; Semino, Elena
2015-06-01
The book Henry's Demons (2011) recounts the events surrounding Henry Cockburn's diagnosis of schizophrenia from the alternating perspectives of Henry himself and his father Patrick. In this paper, we present a detailed linguistic analysis of Henry's first-person accounts of experiences that could be described as auditory verbal hallucinations. We first provide a typology of Henry's voices, taking into account who or what is presented as speaking, what kinds of utterances they produce and any salient stylistic features of these utterances. We then discuss the linguistically distinctive ways in which Henry represents these voices in his narrative. We focus on the use of Direct Speech as opposed to other forms of speech presentation, the use of the sensory verbs hear and feel and the use of 'non-factive' expressions such as I thought and as if. We show how different linguistic representations may suggest phenomenological differences between the experience of hallucinatory voices and the perception of voices that other people can also hear. We, therefore, propose that linguistic analysis is ideally placed to provide in-depth accounts of the phenomenology of voice hearing and point out the implications of this approach for clinical practice and mental healthcare. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
Lang, S; Leistner, S; Sandrieser, P; Kröger, B J
2009-05-01
Early vocal development of German-speaking cochlear implant recipients has rarely been assessed so far. Therefore, the purpose of this study was to describe the early vocal development following successful implantation. A case study was designed to assess the temporal progression of early vocal development in a young cochlear implant recipient who was bilaterally implanted at the age of 8;3 months. Data were collected during one year by recording parent-child interactions on a monthly basis. The first recording was made before the onset of the signal-processors, the 12 following recordings were made during the first year of implant use. The child's vocalizations were classified according to the vocalization categories and developmental levels from the Stark Assessment of Early Vocal Development–Revised (SAEVD-R). This assessment tool was translated into German in this study and used with German-speaking children for the first time. It allows a coding of prelinguistic utterances via auditory perceptual analysis. The results show an overall decrease of early vocalizations and an increase of speech-like vowels and consonants. In the first six months no apparent progress took place; the child produced almost exclusively vocalizations from Levels 1-3. In the second half of the year an increase of canonical utterances (Level 4) and advanced forms (Level 5) was observed. However, vocalizations beyond the canonical babbling phase, especially vocants and closants as well as their combinations, continued to be dominant throughout the first year of implant use. The progress of development of the child investigated in this study is comparable to that of other children implanted at a young age who had also been assessed with the SAEVD-R. In comparison to normal-hearing children, the implanted child's development seemed to progress slightly faster. Interrater- and intrarater-reliability using the SAEVD-R were measured for two independent observers and for a first and second coding procedure and proved to be acceptable to good. The use of the SAEVD-R for an implanted German-speaking child allowed the investigation of prelinguistic vocal development before the onset of words. The fact that early vocalizations remain the dominant form throughout the first year of hearing experience emphasizes the importance of documenting and analysing prelinguistic vocal development in order to monitor the progression of speech acquisition.
Developmental changes in sensitivity to vocal paralanguage
Friend, Margaret
2017-01-01
Developmental changes in children’s sensitivity to the role of acoustic variation in the speech stream in conveying speaker affect (vocal paralanguage) were examined. Four-, 7- and 10-year-olds heard utterances in three formats: low-pass filtered, reiterant, and normal speech. The availability of lexical and paralinguistic information varied across these three formats in a way that required children to base their judgments of speaker affect on different configurations of cues in each format. Across ages, the best performance was obtained when a rich array of acoustic cues was present and when there was no competing lexical information. Four-year-olds performed at chance when judgments had to be based solely on speech prosody in the filtered format and they were unable to selectively attend to paralanguage when discrepant lexical cues were present in normal speech. Seven-year-olds were significantly more sensitive to the paralinguistic role of speech prosody in filtered speech than were 4-year-olds and there was a trend toward greater attention to paralanguage when lexical and paralinguistic cues were inconsistent in normal speech. An integration of the ability to utilize prosodic cues to speaker affect with attention to paralanguage in cases of lexical/paralinguistic discrepancy was observed for 10-year-olds. The results are discussed in terms of the development of a perceptual bias emerging out of selective attention to language. PMID:28713218
Kotz, Sonja A; Dengler, Reinhard; Wittfoth, Matthias
2015-02-01
Emotional speech comprises complex multimodal verbal and non-verbal information that allows us to infer others' emotional states or thoughts in social interactions. While the neural correlates of verbal and non-verbal aspects and their interaction in emotional speech have been identified, there is very little evidence on how we perceive and resolve incongruity in emotional speech, and whether such incongruity extends to current concepts of task-specific prediction errors as a consequence of unexpected action outcomes ('negative surprise'). Here, we explored this possibility while participants listened to congruent and incongruent angry, happy or neutral utterances and categorized the expressed emotions by their verbal (semantic) content. Results reveal valence-specific incongruity effects: negative verbal content expressed in a happy tone of voice increased activation in the dorso-medial prefrontal cortex (dmPFC), extending its role from conflict moderation to appraisal of valence-specific conflict in emotional speech. Conversely, the caudate head bilaterally responded selectively to positive verbal content expressed in an angry tone of voice, broadening previous accounts of the caudate head in linguistic control to moderating valence-specific control in emotional speech. Together, these results suggest that control structures of the human brain (the dmPFC and subcompartments of the basal ganglia) impact emotional speech differentially when conflict arises. © The Author (2014). Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.
How do you say 'hello'? Personality impressions from brief novel voices.
McAleer, Phil; Todorov, Alexander; Belin, Pascal
2014-01-01
On hearing a novel voice, listeners readily form personality impressions of that speaker. Accurate or not, these impressions are known to affect subsequent interactions; yet the underlying psychological and acoustical bases remain poorly understood. Furthermore, studies have hitherto focussed on extended speech as opposed to analysing the instantaneous impressions we obtain from first experience. In this paper, through a mass online rating experiment, 320 participants rated 64 sub-second vocal utterances of the word 'hello' on one of 10 personality traits. We show that: (1) personality judgements of brief utterances from unfamiliar speakers are consistent across listeners; (2) a two-dimensional 'social voice space' with axes mapping Valence (Trust, Likeability) and Dominance, each driven by differing combinations of vocal acoustics, adequately summarises ratings in both male and female voices; and (3) a positive combination of Valence and Dominance results in increased perceived male vocal Attractiveness, whereas perceived female vocal Attractiveness is largely controlled by increasing Valence. Results are discussed in relation to the rapid evaluation of personality and, in turn, the intent of others, as being driven by survival mechanisms via approach or avoidance behaviours. These findings provide empirical bases for predicting personality impressions from acoustical analyses of short utterances and for generating desired personality impressions in artificial voices.
Prediction and constraint in audiovisual speech perception.
Peelle, Jonathan E; Sommers, Mitchell S
2015-07-01
During face-to-face conversational speech listeners must efficiently process a rapid and complex stream of multisensory information. Visual speech can serve as a critical complement to auditory information because it provides cues to both the timing of the incoming acoustic signal (the amplitude envelope, influencing attention and perceptual sensitivity) and its content (place and manner of articulation, constraining lexical selection). Here we review behavioral and neurophysiological evidence regarding listeners' use of visual speech information. Multisensory integration of audiovisual speech cues improves recognition accuracy, particularly for speech in noise. Even when speech is intelligible based solely on auditory information, adding visual information may reduce the cognitive demands placed on listeners through increasing the precision of prediction. Electrophysiological studies demonstrate that oscillatory cortical entrainment to speech in auditory cortex is enhanced when visual speech is present, increasing sensitivity to important acoustic cues. Neuroimaging studies also suggest increased activity in auditory cortex when congruent visual information is available, but additionally emphasize the involvement of heteromodal regions of posterior superior temporal sulcus as playing a role in integrative processing. We interpret these findings in a framework of temporally-focused lexical competition in which visual speech information affects auditory processing to increase sensitivity to acoustic information through an early integration mechanism, and a late integration stage that incorporates specific information about a speaker's articulators to constrain the number of possible candidates in a spoken utterance. Ultimately it is words compatible with both auditory and visual information that most strongly determine successful speech perception during everyday listening. Thus, audiovisual speech perception is accomplished through multiple stages of integration, supported by distinct neuroanatomical mechanisms. Copyright © 2015 Elsevier Ltd. All rights reserved.
The minimal unit of phonological encoding: prosodic or lexical word.
Wheeldon, Linda R; Lahiri, Aditi
2002-09-01
Wheeldon and Lahiri (Journal of Memory and Language 37 (1997) 356) used a prepared speech production task (Sternberg, S., Monsell, S., Knoll, R. L., & Wright, C. E. (1978). The latency and duration of rapid movement sequences: comparisons of speech and typewriting. In G. E. Stelmach (Ed.), Information processing in motor control and learning (pp. 117-152). New York: Academic Press; Sternberg, S., Wright, C. E., Knoll, R. L., & Monsell, S. (1980). Motor programs in rapid speech: additional evidence. In R. A. Cole (Ed.), The perception and production of fluent speech (pp. 507-534). Hillsdale, NJ: Erlbaum) to demonstrate that the latency to articulate a sentence is a function of the number of phonological words it comprises. Latencies for the sentence [Ik zoek het] [water] 'I seek the water' were shorter than latencies for sentences like [Ik zoek] [vers] [water] 'I seek fresh water'. We extend this research by examining the prepared production of utterances containing phonological words that are less than a lexical word in length. Dutch compounds (e.g. ooglid 'eyelid') form a single morphosyntactic word and a phonological word, which in turn includes two phonological words. We compare their prepared production latencies to those of syntactic phrases consisting of an adjective and a noun (e.g. oud lid 'old member'), which comprise two morphosyntactic and two phonological words, and to morphologically simple words (e.g. orgel 'organ'), which comprise one morphosyntactic and one phonological word. Our findings demonstrate that the effect is limited to phrasal-level phonological words, suggesting that production models need to make a distinction between lexical and phrasal phonology.
Nonlinear Frequency Compression in Hearing Aids: Impact on Speech and Language Development
Bentler, Ruth; Walker, Elizabeth; McCreery, Ryan; Arenas, Richard M.; Roush, Patricia
2015-01-01
Objectives The research questions of this study were: (1) Are children using nonlinear frequency compression (NLFC) in their hearing aids getting better access to the speech signal than children using conventional processing schemes? The authors hypothesized that children whose hearing aids provided wider input bandwidth would have more access to the speech signal, as measured by an adaptation of the Speech Intelligibility Index, and (2) are speech and language skills different for children who have been fit with the two different technologies; if so, in what areas? The authors hypothesized that if the children were getting increased access to the speech signal as a result of their NLFC hearing aids (question 1), it would be possible to see improved performance in areas of speech production, morphosyntax, and speech perception compared with the group with conventional processing. Design Participants included 66 children with hearing loss recruited as part of a larger multisite National Institutes of Health–funded study, Outcomes for Children with Hearing Loss, designed to explore the developmental outcomes of children with mild to severe hearing loss. For the larger study, data on communication, academic and psychosocial skills were gathered in an accelerated longitudinal design, with entry into the study between 6 months and 7 years of age. Subjects in this report consisted of 3-, 4-, and 5-year-old children recruited at the North Carolina test site. All had at least 6 months of current hearing aid usage with their NLFC or conventional amplification. Demographic characteristics were compared at the three age levels as well as audibility and speech/language outcomes; speech-perception scores were compared for the 5-year-old groups. Results Results indicate that the audibility provided did not differ between the technology options. As a result, there was no difference between groups on speech or language outcome measures at 4 or 5 years of age, and no impact on speech perception (measured at 5 years of age). The difference in Comprehensive Assessment of Spoken Language and mean length of utterance scores for the 3-year-old group favoring the group with conventional amplification may be a consequence of confounding factors such as increased incidence of prematurity in the group using NLFC. Conclusions Children fit with NLFC had similar audibility, as measured by a modified Speech Intelligibility Index, compared with a matched group of children using conventional technology. In turn, there were no differences in their speech and language abilities. PMID:24892229
Nonlinear frequency compression in hearing aids: impact on speech and language development.
Bentler, Ruth; Walker, Elizabeth; McCreery, Ryan; Arenas, Richard M; Roush, Patricia
2014-01-01
The research questions of this study were: (1) Are children using nonlinear frequency compression (NLFC) in their hearing aids getting better access to the speech signal than children using conventional processing schemes? The authors hypothesized that children whose hearing aids provided wider input bandwidth would have more access to the speech signal, as measured by an adaptation of the Speech Intelligibility Index, and (2) are speech and language skills different for children who have been fit with the two different technologies; if so, in what areas? The authors hypothesized that if the children were getting increased access to the speech signal as a result of their NLFC hearing aids (question 1), it would be possible to see improved performance in areas of speech production, morphosyntax, and speech perception compared with the group with conventional processing. Participants included 66 children with hearing loss recruited as part of a larger multisite National Institutes of Health-funded study, Outcomes for Children with Hearing Loss, designed to explore the developmental outcomes of children with mild to severe hearing loss. For the larger study, data on communication, academic and psychosocial skills were gathered in an accelerated longitudinal design, with entry into the study between 6 months and 7 years of age. Subjects in this report consisted of 3-, 4-, and 5-year-old children recruited at the North Carolina test site. All had at least 6 months of current hearing aid usage with their NLFC or conventional amplification. Demographic characteristics were compared at the three age levels as well as audibility and speech/language outcomes; speech-perception scores were compared for the 5-year-old groups. Results indicate that the audibility provided did not differ between the technology options. As a result, there was no difference between groups on speech or language outcome measures at 4 or 5 years of age, and no impact on speech perception (measured at 5 years of age). The difference in Comprehensive Assessment of Spoken Language and mean length of utterance scores for the 3-year-old group favoring the group with conventional amplification may be a consequence of confounding factors such as increased incidence of prematurity in the group using NLFC. Children fit with NLFC had similar audibility, as measured by a modified Speech Intelligibility Index, compared with a matched group of children using conventional technology. In turn, there were no differences in their speech and language abilities.
Condouris, Karen; Meyer, Echo; Tager-Flusberg, Helen
2005-01-01
This study investigated the relationship between scores on standardized tests (Clinical Evaluation of Language Fundamentals [CELF], Peabody Picture Vocabulary Test–Third Edition [PPVT-III], and Expressive Vocabulary Test) and measures of spontaneous speech (mean length of utterance [MLU], Index of Productive Syntax, and number of different word roots [NDWR]) derived from natural language samples obtained from 44 children with autism between the ages of 4 and 14 years. The children with autism were impaired across both groups of measures. The two groups of measures were significantly correlated, and specific relationships were found between lexical–semantic measures (NDWR, vocabulary tests, and the CELF lexical–semantic subtests) and grammatical measures (MLU and CELF grammar subtests), suggesting that both standardized and spontaneous speech measures tap the same underlying linguistic abilities in children with autism. These findings have important implications for clinicians and researchers who depend on these types of language measures for diagnostic purposes, assessment, and investigations of language impairments in autism. PMID:12971823
Relationship between perceived politeness and spectral characteristics of voice
NASA Astrophysics Data System (ADS)
Ito, Mika
2005-04-01
This study investigates the role of voice quality in perceiving politeness under conditions of varying relative social status among Japanese male speakers. The work focuses on four important methodological issues: experimental control of sociolinguistic aspects, eliciting natural spontaneous speech, obtaining recording quality suitable for voice quality analysis, and assessment of glottal characteristics through the use of non-invasive direct measurements of the speech spectrum. To obtain natural, unscripted utterances, the speech data were collected with a Map Task. This methodology allowed us to study the effect of manipulating relative social status among participants in the same community. We then computed the relative amplitudes of harmonics and formant peaks in spectra obtained from the Map Task recordings. Finally, an experiment was conducted to observe the alignment between acoustic measures and the perceived politeness of the voice samples. The results suggest that listeners' perceptions of politeness are determined by spectral characteristics of speakers, in particular, spectral tilts obtained by computing the difference in amplitude between the first harmonic and the third formant.
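The spectral measure mentioned above (the amplitude difference between the first harmonic and the region of the third formant) can be approximated as in the sketch below. F0 and F3 are supplied by hand on a synthetic frame because robust pitch and formant tracking are beyond a short example; this illustrates the kind of measure, not the study's analysis code.

# Rough H1-minus-A3 spectral tilt sketch (F0 and F3 assumed known).
import numpy as np

def h1_minus_a3(frame, sr, f0, f3, search_hz=50.0):
    windowed = frame * np.hanning(frame.size)
    spec_db = 20 * np.log10(np.abs(np.fft.rfft(windowed)) + 1e-12)
    freqs = np.fft.rfftfreq(frame.size, d=1.0 / sr)

    def peak_db(center):
        band = (freqs >= center - search_hz) & (freqs <= center + search_hz)
        return spec_db[band].max()

    return peak_db(f0) - peak_db(f3)        # larger value = steeper spectral tilt

sr = 16000
t = np.arange(1024) / sr
frame = np.sin(2 * np.pi * 110 * t) + 0.1 * np.sin(2 * np.pi * 2500 * t)
print(round(h1_minus_a3(frame, sr, f0=110.0, f3=2500.0), 1))   # roughly 20 dB for this toy frame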
Gowda, Dhananjaya; Airaksinen, Manu; Alku, Paavo
2017-09-01
Recently, a quasi-closed phase (QCP) analysis of speech signals for accurate glottal inverse filtering was proposed. However, the QCP analysis, which belongs to the family of temporally weighted linear prediction (WLP) methods, uses the conventional forward type of sample prediction. This may not be the best choice, especially in computing WLP models with a hard-limiting weighting function. A sample-selective minimization of the prediction error in WLP reduces the effective number of samples available within a given window frame. To counter this problem, a modified quasi-closed phase forward-backward (QCP-FB) analysis is proposed, wherein each sample is predicted based on its past as well as future samples, thereby utilizing the available number of samples more effectively. Formant detection and estimation experiments on synthetic vowels generated using a physical modeling approach, as well as on natural speech utterances, show that the proposed QCP-FB method yields statistically significant improvements over the conventional linear prediction and QCP methods.
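To show how a temporal weighting function enters linear prediction in general, here is a minimal weighted linear prediction sketch in which each sample's squared prediction error is scaled by a weight before the normal equations are solved. It uses plain forward prediction on a toy signal and is not the paper's QCP-FB method; the Hann weighting is an arbitrary stand-in for a task-specific weighting function.

# Minimal weighted linear prediction (WLP) sketch, covariance-style.
import numpy as np

def wlp(x, order, w):
    # minimize sum_n w[n] * (x[n] - sum_k a[k] * x[n-k])^2
    n = np.arange(order, x.size)
    past = np.stack([x[n - k] for k in range(1, order + 1)], axis=1)  # predictor samples
    weights = w[n]
    R = past.T @ (weights[:, None] * past)       # weighted normal equations
    r = past.T @ (weights * x[n])
    return np.linalg.solve(R, r)                 # prediction coefficients a[1..order]

sr = 8000
t = np.arange(400) / sr
x = np.sin(2 * np.pi * 500 * t) + 0.01 * np.random.default_rng(0).normal(size=400)
w = np.hanning(400)                              # any non-negative temporal weighting
print(np.round(wlp(x, order=4, w=w), 3))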
Patterns of Adult-Child Linguistic Interaction in Integrated Day Care Groups.
Girolametto, Luigi; Hoaken, Lisa; Weitzman, Elaine; Lieshout, Riet van
2000-04-01
This study investigated the language input of eight childcare providers to children with developmental disabilities, including language delay, who were integrated into community day care centers. Structural and discourse features of the adults' language input were compared across two groups (integrated, typical) and two naturalistic day care contexts (book reading, play dough activity). The eight children with developmental disabilities and language delay were between 33 and 50 months of age; 32 normally developing peers ranged in age from 32 to 53 months. Adult-child interactions were transcribed and coded to yield estimates of structural indices (number of utterances, rate, mean length of utterances, ratio of different words to total words used (TTR)) and discourse features (directive, interactive, language-modelling) of their language input. The language input addressed to the children with developmental disabilities was directive and not finely tuned to their expressive language levels. In turn, these children interacted infrequently with the adult or with the other children. Contextual comparisons indicated that the play dough activity promoted adult-child interaction that was less directive and more interaction-promoting than book reading, and that children interacted more frequently in the play dough activity. Implications for speech-language pathologists include the need for collaborative consultation in integrated settings, modification of adult-child play contexts to promote interaction, and training childcare providers to use language input that promotes communication development.
Relationship Between Speech Intelligibility and Speech Comprehension in Babble Noise.
Fontan, Lionel; Tardieu, Julien; Gaillard, Pascal; Woisard, Virginie; Ruiz, Robert
2015-06-01
The authors investigated the relationship between the intelligibility and comprehension of speech presented in babble noise. Forty participants listened to French imperative sentences (commands for moving objects) in a multitalker babble background for which intensity was experimentally controlled. Participants were instructed to transcribe what they heard and obey the commands in an interactive environment set up for this purpose. The former test provided intelligibility scores and the latter provided comprehension scores. Collected data revealed a globally weak correlation between intelligibility and comprehension scores (r = .35, p < .001). The discrepancy tended to grow as noise level increased. An analysis of standard deviations showed that variability in comprehension scores increased linearly with noise level, whereas higher variability in intelligibility scores was found for moderate noise level conditions. These results support the hypothesis that intelligibility scores are poor predictors of listeners' comprehension in real communication situations. Intelligibility and comprehension scores appear to provide different insights, the first measure being centered on speech signal transfer and the second on communicative performance. Both theoretical and practical implications for the use of speech intelligibility tests as indicators of speakers' performances are discussed.
Centanni, Tracy M.; Chen, Fuyi; Booker, Anne M.; Engineer, Crystal T.; Sloan, Andrew M.; Rennaker, Robert L.; LoTurco, Joseph J.; Kilgard, Michael P.
2014-01-01
In utero RNAi of the dyslexia-associated gene Kiaa0319 in rats (KIA-) degrades cortical responses to speech sounds and increases trial-by-trial variability in onset latency. We tested the hypothesis that KIA- rats would be impaired at speech sound discrimination. KIA- rats needed twice as much training in quiet conditions to perform at control levels and remained impaired at several speech tasks. Focused training using truncated speech sounds was able to normalize speech discrimination in quiet and background noise conditions. Training also normalized trial-by-trial neural variability and temporal phase locking. Cortical activity from speech trained KIA- rats was sufficient to accurately discriminate between similar consonant sounds. These results provide the first direct evidence that assumed reduced expression of the dyslexia-associated gene KIAA0319 can cause phoneme processing impairments similar to those seen in dyslexia and that intensive behavioral therapy can eliminate these impairments. PMID:24871331
The Temporal Prediction of Stress in Speech and Its Relation to Musical Beat Perception.
Beier, Eleonora J; Ferreira, Fernanda
2018-01-01
While rhythmic expectancies are thought to be at the base of beat perception in music, the extent to which stress patterns in speech are similarly represented and predicted during on-line language comprehension is debated. The temporal prediction of stress may be advantageous to speech processing, as stress patterns aid segmentation and mark new information in utterances. However, while linguistic stress patterns may be organized into hierarchical metrical structures similarly to musical meter, they do not typically present the same degree of periodicity. We review the theoretical background for the idea that stress patterns are predicted and address the following questions: First, what is the evidence that listeners can predict the temporal location of stress based on preceding rhythm? If they can, is it thanks to neural entrainment mechanisms similar to those utilized for musical beat perception? And lastly, what linguistic factors other than rhythm may account for the prediction of stress in natural speech? We conclude that while expectancies based on the periodic presentation of stresses are at play in some of the current literature, other processes are likely to affect the prediction of stress in more naturalistic, less isochronous speech. Specifically, aspects of prosody other than amplitude changes (e.g., intonation) as well as lexical, syntactic and information structural constraints on the realization of stress may all contribute to the probabilistic expectation of stress in speech.
[Verbal and gestural communication in interpersonal interaction with Alzheimer's disease patients].
Schiaratura, Loris Tamara; Di Pastena, Angela; Askevis-Leherpeux, Françoise; Clément, Sylvain
2015-03-01
Communication can be defined as a verbal and non-verbal exchange of thoughts and emotions. While the verbal communication deficit in Alzheimer's disease is well documented, very little is known about gestural communication, especially in interpersonal situations. This study examines the production of gestures and its relations with verbal aspects of communication. Three patients suffering from moderately severe Alzheimer's disease were compared to three healthy adults. Each participant was given a series of pictures and asked to explain which one she preferred and why. The interpersonal interaction was video recorded. Analyses concerned verbal production (quantity and quality) and gestures. Gestures were either non-representational (i.e., gestures of small amplitude punctuating speech or accentuating some parts of the utterance) or representational (i.e., referring to the object of the speech). Representational gestures were coded as iconic (depicting concrete aspects), metaphoric (depicting abstract meaning) or deictic (pointing toward an object). In comparison with healthy participants, patients revealed a decrease in the quantity and quality of speech. Nevertheless, their production of gestures was always present. This pattern is in line with the conception that gestures and speech depend on different communication systems and is inconsistent with the assumption of a parallel dissolution of gesture and speech. Moreover, analyzing the articulation between verbal and gestural dimensions suggests that representational gestures may compensate for speech deficits. This underlines the important role of gestures in maintaining interpersonal communication.
Peeters, David; Chu, Mingyuan; Holler, Judith; Hagoort, Peter; Özyürek, Aslı
2015-12-01
In everyday human communication, we often express our communicative intentions by manually pointing out referents in the material world around us to an addressee, often in tight synchronization with referential speech. This study investigated whether and how the kinematic form of index finger pointing gestures is shaped by the gesturer's communicative intentions and how this is modulated by the presence of concurrently produced speech. Furthermore, we explored the neural mechanisms underpinning the planning of communicative pointing gestures and speech. Two experiments were carried out in which participants pointed at referents for an addressee while the informativeness of their gestures and speech was varied. Kinematic and electrophysiological data were recorded online. It was found that participants prolonged the duration of the stroke and poststroke hold phase of their gesture to be more communicative, in particular when the gesture was carrying the main informational burden in their multimodal utterance. Frontal and P300 effects in the ERPs suggested the importance of intentional and modality-independent attentional mechanisms during the planning phase of informative pointing gestures. These findings contribute to a better understanding of the complex interplay between action, attention, intention, and language in the production of pointing gestures, a communicative act core to human interaction.
Comparison of Classification Methods for Detecting Emotion from Mandarin Speech
NASA Astrophysics Data System (ADS)
Pao, Tsang-Long; Chen, Yu-Te; Yeh, Jun-Heng
It is said that technology comes out of humanity. What is humanity? The very definition of humanity is emotion. Emotion is the basis for all human expression and the underlying theme behind everything that is done, said, thought or imagined. If computers are able to perceive and respond to human emotion, human-computer interaction will become more natural. Several classifiers are adopted for automatically assigning an emotion category, such as anger, happiness or sadness, to a speech utterance. These classifiers were designed independently and tested on various emotional speech corpora, making it difficult to compare and evaluate their performance. In this paper, we first compared several popular classification methods and evaluated their performance by applying them to a Mandarin speech corpus consisting of five basic emotions, including anger, happiness, boredom, sadness and neutral. The extracted feature streams contain MFCC, LPCC, and LPC. The experimental results show that the proposed WD-MKNN classifier achieves an accuracy of 81.4% for the 5-class emotion recognition and outperforms other classification techniques, including KNN, MKNN, DW-KNN, LDA, QDA, GMM, HMM, SVM, and BPNN. Then, to verify the advantage of the proposed method, we compared these classifiers by applying them to another Mandarin expressive speech corpus consisting of two emotions. The experimental results still show that the proposed WD-MKNN outperforms the others.
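As a generic baseline sketch of the feature-plus-classifier setup described above (not the paper's WD-MKNN classifier or its corpus), the code below computes mean MFCC vectors per utterance and classifies them with plain k-nearest neighbours. The synthetic signals and the "calm"/"excited" labels are placeholders for real emotional speech recordings.

# MFCC statistics + plain KNN on toy signals (placeholder labels).
import numpy as np
import librosa
from sklearn.neighbors import KNeighborsClassifier

def mean_mfcc(signal, sr, n_mfcc=13):
    return librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc).mean(axis=1)

rng = np.random.default_rng(0)
sr = 16000

def toy_utterance(f0):
    t = np.arange(sr) / sr                       # 1 s of synthetic "speech"
    return np.sin(2 * np.pi * f0 * t) + 0.05 * rng.normal(size=sr)

X = np.array([mean_mfcc(toy_utterance(f0), sr) for f0 in (110, 115, 220, 225)])
y = np.array(["calm", "calm", "excited", "excited"])   # placeholder emotion labels

knn = KNeighborsClassifier(n_neighbors=1).fit(X, y)
print(knn.predict([mean_mfcc(toy_utterance(112), sr)]))  # should print ['calm']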
Brain correlates of stuttering and syllable production. A PET performance-correlation analysis.
Fox, P T; Ingham, R J; Ingham, J C; Zamarripa, F; Xiong, J H; Lancaster, J L
2000-10-01
To distinguish the neural systems of normal speech from those of stuttering, PET images of brain blood flow were probed (correlated voxel-wise) with per-trial speech-behaviour scores obtained during PET imaging. Two cohorts were studied: 10 right-handed men who stuttered and 10 right-handed, age- and sex-matched non-stuttering controls. Ninety PET blood flow images were obtained in each cohort (nine per subject as three trials of each of three conditions) from which r-value statistical parametric images (SPI{r}) were computed. Brain correlates of stutter rate and syllable rate showed striking differences in both laterality and sign (i.e. positive or negative correlations). Stutter-rate correlates, both positive and negative, were strongly lateralized to the right cerebral and left cerebellar hemispheres. Syllable correlates in both cohorts were bilateral, with a bias towards the left cerebral and right cerebellar hemispheres, in keeping with the left-cerebral dominance for language and motor skills typical of right-handed subjects. For both stutters and syllables, the brain regions that were correlated positively were those of speech production: the mouth representation in the primary motor cortex; the supplementary motor area; the inferior lateral premotor cortex (Broca's area); the anterior insula; and the cerebellum. The principal difference between syllable-rate and stutter-rate positive correlates was hemispheric laterality. A notable exception to this rule was that cerebellar positive correlates for syllable rate were far more extensive in the stuttering cohort than in the control cohort, which suggests a specific role for the cerebellum in enabling fluent utterances in persons who stutter. Stutters were negatively correlated with right-cerebral regions (superior and middle temporal gyrus) associated with auditory perception and processing, regions which were positively correlated with syllables in both the stuttering and control cohorts. These findings support long-held theories that the brain correlates of stuttering are the speech-motor regions of the non-dominant (right) cerebral hemisphere, and extend this theory to include the non-dominant (left) cerebellar hemisphere. The present findings also indicate a specific role of the cerebellum in the fluent utterances of persons who stutter. Support is also offered for theories that implicate auditory processing problems in stuttering.
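The voxel-wise probing of PET images with per-trial behaviour scores can be illustrated with a minimal sketch. Data shapes, preprocessing (registration, global normalization), and variable names below are assumptions; it simply computes a Pearson r value at each voxel, which is the essence of an r-value statistical parametric image.

```python
# Minimal sketch of a performance-correlation (SPI{r}) analysis: correlate each
# voxel's blood-flow values across trials with a per-trial behaviour score
# (e.g., stutter rate or syllable rate). Data shapes are assumptions.
import numpy as np

def spi_r(images, scores):
    """images: (n_trials, x, y, z) registered, normalized PET volumes;
    scores: (n_trials,) per-trial behaviour scores.
    Returns an (x, y, z) map of Pearson r values."""
    n = images.shape[0]
    imgs = images.reshape(n, -1).astype(float)
    s = (scores - scores.mean()) / scores.std()
    v = (imgs - imgs.mean(axis=0)) / (imgs.std(axis=0) + 1e-12)
    r = (v * s[:, None]).sum(axis=0) / n          # Pearson r per voxel
    return r.reshape(images.shape[1:])

# Example with synthetic data: 90 trials of a small toy "volume".
rng = np.random.default_rng(0)
imgs = rng.normal(size=(90, 4, 4, 4))
scores = rng.normal(size=90)
r_map = spi_r(imgs, scores)
print(r_map.shape, float(np.abs(r_map).max()))
```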
Recognizing speech in a novel accent: the motor theory of speech perception reframed.
Moulin-Frier, Clément; Arbib, Michael A
2013-08-01
The motor theory of speech perception holds that we perceive the speech of another in terms of a motor representation of that speech. However, when we have learned to recognize a foreign accent, it seems plausible that recognition of a word rarely involves reconstruction of the speech gestures of the speaker rather than the listener. To better assess the motor theory and this observation, we proceed in three stages. Part 1 places the motor theory of speech perception in a larger framework based on our earlier models of the adaptive formation of mirror neurons for grasping, and for viewing extensions of that mirror system as part of a larger system for neuro-linguistic processing, augmented by the present consideration of recognizing speech in a novel accent. Part 2 then offers a novel computational model of how a listener comes to understand the speech of someone speaking the listener's native language with a foreign accent. The core tenet of the model is that the listener uses hypotheses about the word the speaker is currently uttering to update probabilities linking the sound produced by the speaker to phonemes in the native language repertoire of the listener. This, on average, improves the recognition of later words. This model is neutral regarding the nature of the representations it uses (motor vs. auditory). It serves as a reference point for the discussion in Part 3, which proposes a dual-stream neuro-linguistic architecture to revisit claims for and against the motor theory of speech perception and the relevance of mirror neurons, and extracts some implications for the reframing of the motor theory.
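The core tenet of Part 2 can be illustrated with a toy sketch: the listener's hypothesis about the word being uttered is used to update probabilities linking accented sounds to native phonemes. The alignment scheme, smoothing, and example items below are illustrative assumptions, not the authors' implementation.

```python
# Toy sketch of the model's core tenet: given a hypothesized word, update the
# probabilities that map the accented sounds actually heard onto native
# phoneme categories. Words, sounds, and the update rule are illustrative only.
from collections import defaultdict

class AccentAdapter:
    def __init__(self, alpha=1.0):
        # counts[s][p]: how often accented sound s was aligned with phoneme p
        self.counts = defaultdict(lambda: defaultdict(float))
        self.alpha = alpha  # add-alpha smoothing

    def update(self, heard_sounds, hypothesized_phonemes):
        """Align the heard sound sequence with the phonemes of the word the
        listener believes was said, and strengthen those sound->phoneme links."""
        for s, p in zip(heard_sounds, hypothesized_phonemes):
            self.counts[s][p] += 1.0

    def prob(self, sound, phoneme, phoneme_inventory):
        c = self.counts[sound]
        denom = sum(c.values()) + self.alpha * len(phoneme_inventory)
        return (c.get(phoneme, 0.0) + self.alpha) / denom

inventory = ["i", "I", "e", "E"]
adapter = AccentAdapter()
# The listener hypothesizes the word "ship" /S I p/ although the accented
# vowel sounded like [i]; the link [i] -> /I/ is strengthened.
adapter.update(["S", "i", "p"], ["S", "I", "p"])
print(adapter.prob("i", "I", inventory))   # now above the uniform baseline
```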
Contextual influences on children's use of vocal affect cues during referential interpretation.
Berman, Jared M J; Graham, Susan A; Chambers, Craig G
2013-01-01
In three experiments, we investigated 5-year-olds' sensitivity to speaker vocal affect during referential interpretation in cases where the indeterminacy is or is not resolved by speech information. In Experiment 1, analyses of eye gaze patterns and pointing behaviours indicated that 5-year-olds used vocal affect cues at the point where an ambiguous description was encountered. In Experiments 2 and 3, we used unambiguous situations to investigate how the referential context influences the ability to use affect cues earlier in the utterance. Here, we found a differential use of speaker vocal affect whereby 5-year-olds' referential hypotheses were influenced by negative vocal affect cues in advance of the noun, but not by positive affect cues. Together, our findings reveal how 5-year-olds use a speaker's vocal affect to identify potential referents in different contextual situations and also suggest that children may be more attuned to negative vocal affect than positive vocal affect, particularly early in an utterance.
[Instrumental, directive, and affective communication in hospital leaflets].
Vasconcellos-Silva, Paulo Roberto; Uribe Rivera, Francisco Javier; Castiel, Luis David
2003-01-01
This study focuses on the typical semantic systems extracted from hospital staff communicative resources which attempt to validate information as an "object" to be transferred to patients. We describe the models of textual communication in 58 patient information leaflets from five hospital units in Brazil, gathered from 1996 to 2002. Three categories were identified, based on the theory of speech acts (Austin, Searle, and Habermas): 1) cognitive-instrumental utterances: descriptions by means of technical terms validated by self-referred, incomplete, or inaccessible argumentation, with an implicit educational function; 2) technical-directive utterances: self-referred (to the context of the source domains), with a shifting of everyday acts to a technical terrain with a disciplinary function and impersonal features; and 3) expressive modulations: need for inter-subjective connections to strengthen bonds of trust and a tendency to use childish arguments. We conclude that the three categories displayed: fragmentary sources; assumption of univocal messages and invariable use of information (idealized motivations and interests, apart from individualized perspectives); and assumption of universal interests as generators of knowledge.
Selected Topics from LVCSR Research for Asian Languages at Tokyo Tech
NASA Astrophysics Data System (ADS)
Furui, Sadaoki
This paper presents our recent work in regard to building Large Vocabulary Continuous Speech Recognition (LVCSR) systems for the Thai, Indonesian, and Chinese languages. For Thai, since there is no word boundary in the written form, we have proposed a new method for automatically creating word-like units from a text corpus, and applied topic and speaking style adaptation to the language model to recognize spoken-style utterances. For Indonesian, we have applied proper noun-specific adaptation to acoustic modeling, and rule-based English-to-Indonesian phoneme mapping to solve the problem of large variation in proper noun and English word pronunciation in a spoken-query information retrieval system. In spoken Chinese, long organization names are frequently abbreviated, and abbreviated utterances cannot be recognized if the abbreviations are not included in the dictionary. We have proposed a new method for automatically generating Chinese abbreviations, and by expanding the vocabulary using the generated abbreviations, we have significantly improved the performance of spoken query-based search.
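For the Thai case, the basic difficulty is that the written stream has no word boundaries. The sketch below is not the paper's corpus-driven method for creating word-like units; it shows only a greedy longest-match baseline over a hypothetical lexicon to make the segmentation problem concrete.

```python
# Illustrative baseline only (not the paper's method): greedy left-to-right
# longest-match segmentation of an unsegmented character stream into tokens
# that a language model could use. The lexicon here is hypothetical.
def longest_match_segment(text, lexicon, max_len=12):
    units, i = [], 0
    while i < len(text):
        for L in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + L] in lexicon or L == 1:
                units.append(text[i:i + L])   # fall back to single characters
                i += L
                break
    return units

lexicon = {"spoken", "query", "information", "retrieval"}
print(longest_match_segment("spokenqueryinformationretrieval", lexicon))
# ['spoken', 'query', 'information', 'retrieval']
```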
Hearing and seeing meaning in speech and gesture: insights from brain and behaviour
Özyürek, Aslı
2014-01-01
As we speak, we use not only the arbitrary form–meaning mappings of the speech channel but also motivated form–meaning correspondences, i.e. iconic gestures that accompany speech (e.g. inverted V-shaped hand wiggling across gesture space to demonstrate walking). This article reviews what we know about processing of semantic information from speech and iconic gestures in spoken languages during comprehension of such composite utterances. Several studies have shown that comprehension of iconic gestures involves brain activations known to be involved in semantic processing of speech: i.e. modulation of the electrophysiological recording component N400, which is sensitive to the ease of semantic integration of a word to previous context, and recruitment of the left-lateralized frontal–posterior temporal network (left inferior frontal gyrus (IFG), medial temporal gyrus (MTG) and superior temporal gyrus/sulcus (STG/S)). Furthermore, we integrate the information coming from both channels recruiting brain areas such as left IFG, posterior superior temporal sulcus (STS)/MTG and even motor cortex. Finally, this integration is flexible: the temporal synchrony between the iconic gesture and the speech segment, as well as the perceived communicative intent of the speaker, modulate the integration process. Whether these findings are special to gestures or are shared with actions or other visual accompaniments to speech (e.g. lips) or other visual symbols such as pictures are discussed, as well as the implications for a multimodal view of language. PMID:25092664
Trudeau, Natacha; Sutton, Ann; Dagenais, Emmanuelle; de Broeck, Sophie; Morford, Jill
2007-10-01
This study investigated the impact of syntactic complexity and task demands on construction of utterances using picture communication symbols by participants from 3 age groups with no communication disorders. Participants were 30 children (7;0 [years;months] to 8;11), 30 teenagers (12;0 to 13;11), and 30 adults (18 years and above). All participants constructed graphic symbol utterances to describe photographs presented with spoken French stimuli. Stimuli included simple and complex (object relative and subject relative) utterances describing the photographs, which were presented either 1 at a time (neutral condition) or in an array of 4 (contrast condition). Simple utterances led to more uniform response patterns than complex utterances. Among complex utterances, subject relative sentences appeared more difficult to convey. Increasing the need for message clarity (i.e., contrast condition) elicited changes in the production of graphic symbol sequences for complex propositions. The effects of syntactic complexity and task demands were more pronounced for children. Graphic symbol utterance construction appears to involve more than simply transferring spoken language skills. One possible explanation is that this type of task requires higher levels of metalinguistic ability. Clinical implications and directions for further research are discussed.
Speaker-sensitive emotion recognition via ranking: Studies on acted and spontaneous speech☆
Cao, Houwei; Verma, Ragini; Nenkova, Ani
2014-01-01
We introduce a ranking approach for emotion recognition which naturally incorporates information about the general expressivity of speakers. We demonstrate that our approach leads to substantial gains in accuracy compared to conventional approaches. We train ranking SVMs for individual emotions, treating the data from each speaker as a separate query, and combine the predictions from all rankers to perform multi-class prediction. The ranking method provides two natural benefits. It captures speaker specific information even in speaker-independent training/testing conditions. It also incorporates the intuition that each utterance can express a mix of possible emotion and that considering the degree to which each emotion is expressed can be productively exploited to identify the dominant emotion. We compare the performance of the rankers and their combination to standard SVM classification approaches on two publicly available datasets of acted emotional speech, Berlin and LDC, as well as on spontaneous emotional data from the FAU Aibo dataset. On acted data, ranking approaches exhibit significantly better performance compared to SVM classification both in distinguishing a specific emotion from all others and in multi-class prediction. On the spontaneous data, which contains mostly neutral utterances with a relatively small portion of less intense emotional utterances, ranking-based classifiers again achieve much higher precision in identifying emotional utterances than conventional SVM classifiers. In addition, we discuss the complementarity of conventional SVM and ranking-based classifiers. On all three datasets we find dramatically higher accuracy for the test items on whose prediction the two methods agree compared to the accuracy of individual methods. Furthermore on the spontaneous data the ranking and standard classification are complementary and we obtain marked improvement when we combine the two classifiers by late-stage fusion. PMID:25422534
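The per-emotion ranking idea can be sketched with a pairwise-transform approximation of a ranking SVM: within each speaker (treated as a query), utterances of the target emotion should score above the rest. This is an illustrative simplification of the authors' setup, and the data arrays are hypothetical placeholders.

```python
# Illustrative pairwise-transform approximation of per-emotion ranking
# (not the authors' exact ranking-SVM toolchain).
import numpy as np
from sklearn.svm import LinearSVC

def train_emotion_ranker(X, y, speakers, emotion):
    """Train a linear scorer so that, within a speaker, utterances of `emotion`
    score higher than other utterances (pairwise-difference trick)."""
    speakers = np.asarray(speakers)
    diffs, signs = [], []
    for spk in np.unique(speakers):
        idx = np.where(speakers == spk)[0]
        pos = [i for i in idx if y[i] == emotion]
        neg = [i for i in idx if y[i] != emotion]
        for i in pos:
            for j in neg:
                diffs.append(X[i] - X[j]); signs.append(1)
                diffs.append(X[j] - X[i]); signs.append(-1)
    svm = LinearSVC(C=1.0, fit_intercept=False)
    svm.fit(np.array(diffs), np.array(signs))
    return svm.coef_.ravel()                      # ranking weight vector

def predict(X, weight_vectors, emotions):
    # Combine the per-emotion rankers: highest score wins.
    scores = np.stack([X @ w for w in weight_vectors], axis=1)
    return [emotions[k] for k in scores.argmax(axis=1)]
```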
Frontal lobe epileptic seizures are accompanied by elevated pitch during verbal communication.
Speck, Iva; Echternach, Matthias; Sammler, Daniela; Schulze-Bonhage, Andreas
2018-03-01
The objective of our study was to assess alterations in speech as a possible localizing sign in frontal lobe epilepsy. Ictal speech was analyzed in 18 patients with frontal lobe epilepsy (FLE) during seizures and in the interictal period. Matched identical words were analyzed regarding alterations in fundamental frequency (ƒo) as an approximation of pitch. In patients with FLE, the ƒo of ictal utterances was significantly higher than the ƒo of interictal recordings (p = 0.016). Ictal ƒo increases occurred in FLE of both right and left seizure origin. In contrast, a matched temporal lobe epilepsy (TLE) group showed less pronounced increases in ƒo, and only in patients with right-sided seizure foci. This study shows, for the first time, significant voice alterations in ictal speech in a cohort of patients with FLE. This may contribute to the localization of the epileptic focus. Interestingly, increases in ƒo were found in frontal lobe seizures originating in either hemisphere, suggesting bilateral involvement in the planning of speech production, in contrast to a more right-sided lateralization of pitch perception in prosodic processing. Wiley Periodicals, Inc. © 2018 International League Against Epilepsy.
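The ƒo comparison can be sketched as follows: estimate fundamental frequency for matched word tokens and compare ictal versus interictal values with a paired test. The autocorrelation pitch estimator and the Wilcoxon test below are assumptions, not necessarily the tools used in the study.

```python
# Rough sketch of the fo comparison: estimate fundamental frequency of matched
# word tokens with a simple autocorrelation method, then compare ictal vs.
# interictal tokens with a paired test. Pitch tracker and test are assumptions.
import numpy as np
from scipy.stats import wilcoxon

def estimate_f0(frame, sr, fmin=75.0, fmax=400.0):
    """Autocorrelation-based fo estimate for a short voiced frame."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + int(np.argmax(ac[lo:hi]))
    return sr / lag

sr = 16000
t = np.arange(int(0.04 * sr)) / sr
print(round(estimate_f0(np.sin(2 * np.pi * 220 * t), sr)))  # close to 220 Hz

# ictal_f0 and interictal_f0 would hold per-word fo values for matched words:
# stat, p = wilcoxon(ictal_f0, interictal_f0)
```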
NASA Technical Reports Server (NTRS)
Chan, Jeffrey W.; Simpson, Carol A.
1990-01-01
Active Noise Reduction (ANR) is a new technology which can reduce the level of aircraft cockpit noise that reaches the pilot's ear while simultaneously improving the signal to noise ratio for voice communications and other information bearing sound signals in the cockpit. A miniature, ear-cup mounted ANR system was tested to determine whether speech intelligibility is better for helicopter pilots using ANR compared to a control condition of ANR turned off. Two signal to noise ratios (S/N), representative of actual cockpit conditions, were used for the ratio of the speech to cockpit noise sound pressure levels. Speech intelligibility was significantly better with ANR compared to no ANR for both S/N conditions. Variability of speech intelligibility among pilots was also significantly less with ANR. When the stock helmet was used with ANR turned off, the average PB Word speech intelligibility score was below the Normally Acceptable level. In comparison, it was above that level with ANR on in both S/N levels.
ERIC Educational Resources Information Center
Jackson, Eric S.; Tiede, Mark; Beal, Deryk; Whalen, D. H.
2016-01-01
Purpose: This study examined the impact of social-cognitive stress on sentence-level speech variability, determinism, and stability in adults who stutter (AWS) and adults who do not stutter (AWNS). We demonstrated that complementing the spatiotemporal index (STI) with recurrence quantification analysis (RQA) provides a novel approach to both…
Zhang, Caicai; Peng, Gang; Wang, William S-Y
2012-08-01
Context is important for recovering language information from talker-induced variability in acoustic signals. In tone perception, previous studies reported similar effects of speech and nonspeech contexts in Mandarin, supporting a general perceptual mechanism underlying tone normalization. However, no supportive evidence was obtained in Cantonese, also a tone language. Moreover, no study has compared speech and nonspeech contexts in the multi-talker condition, which is essential for exploring the normalization mechanism of inter-talker variability in speaking F0. The other question is whether a talker's full F0 range and mean F0 equally facilitate normalization. To answer these questions, this study examines the effects of four context conditions (speech/nonspeech × F0 contour/mean F0) in the multi-talker condition in Cantonese. Results show that raising and lowering the F0 of speech contexts change the perception of identical stimuli from mid level tone to low and high level tone, whereas nonspeech contexts only mildly increase the identification preference. It supports the speech-specific mechanism of tone normalization. Moreover, speech context with flattened F0 trajectory, which neutralizes cues of a talker's full F0 range, fails to facilitate normalization in some conditions, implying that a talker's mean F0 is less efficient for minimizing talker-induced lexical ambiguity in tone perception.
Through a glass darkly: some insights on change talk via magnetoencephalography.
Houck, Jon M; Moyers, Theresa B; Tesche, Claudia D
2013-06-01
Motivational interviewing (MI) is a directive, client-centered therapeutic method employed in the treatment of substance abuse, with strong evidence of effectiveness. To date, the sole mechanism of action in MI with any consistent empirical support is "change talk" (CT), which is generally defined as client within-session speech in support of a behavior change. "Sustain talk" (ST) incorporates speech in support of the status quo. MI maintains that during treatment, clients essentially talk themselves into change. Multiple studies have now supported this theory, linking within-session speech to substance use outcomes. Although a causal chain has been established linking therapist behavior, client CT, and substance use outcome, the neural substrate of CT has been largely uncharted. We addressed this gap by measuring neural responses to clients' own CT using magnetoencephalography (MEG), a noninvasive neuroimaging technique with excellent spatial and temporal resolution. Following a recorded MI session, MEG was used to measure brain activity while participants heard multiple repetitions of their CT and ST utterances from that session, intermingled and presented in a random order. Results suggest that CT processing occurs in a right-hemisphere network that includes the inferior frontal gyrus, insula, and superior temporal cortex. These results support a representation of CT at the neural level, consistent with the role of these structures in self-perception. This suggests that during treatment sessions, clinicians who are able to evoke this special kind of language are tapping into neural circuitry that may be essential to behavior change. 2013 APA, all rights reserved
Utterance Detection by Intraoral Acceleration Sensor
NASA Astrophysics Data System (ADS)
Saiki, Tsunemasa; Takizawa, Yukako; Hashizume, Tsutomu; Higuchi, Kohei; Fujita, Takayuki; Maenaka, Kazusuke
In order to establish home health monitoring systems for elderly people, including the prevention of mental illness, we investigated the acceleration of the teeth during utterance, on the assumption that an acceleration sensor can be implanted into an artificial denture in the near future. In the experiment, an acceleration sensor was fixed in front of the central incisors on the lower jaw by using a denture adhesive, and female and male subjects spoke five Japanese vowels. We then measured the teeth accelerations along three axes (front-to-back, right-to-left, and top-to-bottom) and conducted frequency analyses. The results showed that high power spectral densities of the teeth accelerations were observed in a low frequency range of 2-10 Hz (for both the female and the male) and in a high frequency range of 200-300 Hz (the female) or 100-150 Hz (the male). The low and high frequency components indicate movements of the lower jaw and voice sounds transmitted by bone conduction, respectively. The frequency components were especially prominent in the top-to-bottom axis of the central incisor, so utterance can be efficiently detected using the acceleration along this axis. We also found that three conditions, normal speech, lip synchronizing, and humming, can be recognized by frequency analysis of the acceleration along the top-to-bottom axis of the central incisor.
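The band-wise reasoning above translates directly into a power-spectral-density computation. The sketch below assumes a sampling rate, uses Welch's method, and applies hypothetical thresholds to separate jaw-movement power (2-10 Hz) from bone-conducted voice power (roughly 100-300 Hz); it is an illustration of the analysis, not the authors' code.

```python
# Sketch of the frequency analysis used to flag utterance from the
# top-to-bottom incisor acceleration. Thresholds and fs are assumptions.
import numpy as np
from scipy.signal import welch

def band_power(acc_z, fs):
    f, pxx = welch(acc_z, fs=fs, nperseg=min(len(acc_z), 1024))
    jaw = pxx[(f >= 2) & (f <= 10)].sum()      # lower-jaw movement band
    voice = pxx[(f >= 100) & (f <= 300)].sum() # bone-conducted voice band
    return jaw, voice

def classify_segment(acc_z, fs, jaw_thr, voice_thr):
    """Normal speech: jaw + voice; lip sync: jaw only; humming: voice only."""
    jaw, voice = band_power(acc_z, fs)
    if jaw > jaw_thr and voice > voice_thr:
        return "speech"
    if jaw > jaw_thr:
        return "lip_sync"
    if voice > voice_thr:
        return "humming"
    return "silence"
```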
Auditory perception in the child.
Nicolay-Pirmolin, M
2003-01-01
The development of auditory perception in the infant starts in utero and continues up to the age of 9-10 years. We examine the various stages, the various acoustic parameters, and the segmental level. Three stages are important: from 7 months onwards, a first perceptual reorganization; between 7 and 12 months, a second perceptual reorganization; and from 10 to 24 months, segmentation of the spoken word. We also note the evolution between 2 and 6 years and between 6 and 9 years, 9 years being the critical age at which children switch from global to analytic processing of utterances. We then examine musical perception and note that, at the prelinguistic level, the same perceptual units handle verbal sequences and musical sequences. The stages of musical perception parallel those for speech. Bigand posed the question: "should we see in these hierarchies, and in their importance to perception, the manifestation of an overall cognitive constraint restricting the handling of long sequences of acoustic events (including language) and why not even for all processes dealing with symbolic information".
Evolution: Language Use and the Evolution of Languages
NASA Astrophysics Data System (ADS)
Croft, William
Language change can be understood as an evolutionary process. Language change occurs at two different timescales, corresponding to the two steps of the evolutionary process. The first timescale is very short, namely, the production of an utterance: this is where linguistic structures are replicated and language variation is generated. The second timescale is (or can be) very long, namely, the propagation of linguistic variants in the speech community: this is where certain variants are selected over others. At both timescales, the evolutionary process is driven by social interaction and the role language plays in it. An understanding of social interaction at the micro-level—face-to-face interactions—and at the macro-level—the structure of speech communities—gives us the basis for understanding the generation and propagation of language structures, and understanding the nature of language itself.
Dysprosody nonassociated with neurological diseases--a case report.
Pinto, José Antonio; Corso, Renato José; Guilherme, Ana Cláudia Rocha; Pinho, Sílvia Rebelo; Nóbrega, Monica de Oliveira
2004-03-01
Dysprosody, also known as pseudo-foreign dialect, is the rarest neurological speech disorder. It is characterized by alterations in intensity, in the timing of utterance segments, and in the rhythm, cadence, and intonation of words. The term refers to changes in the duration, fundamental frequency, and intensity of tonic and atonic syllables of the sentences spoken, which deprive an individual's particular speech of its characteristics. The cause of this disorder is usually associated with neurological pathologies such as cerebrovascular accidents, cranioencephalic traumatisms, and brain tumors. The authors report a case of dysprosody attended to at the Núcleo de Otorrinolaringologia e Cirurgia de Cabeça e Pescoço de São Paulo (NOSP): a female patient with bilateral grade III Reinke's edema and normal neurological examinations who began presenting characteristics of the German dialect following larynx microsurgery.
ERIC Educational Resources Information Center
Kuriscak, Lisa
2015-01-01
This study focuses on variation within a group of learners of Spanish (N = 253) who produced requests and complaints via a written discourse completion task. It examines the effects of learner and situational variables on production--the effect of proficiency and addressee-gender on speech-act choice and the effect of perception of imposition on…
Predicting phonetic transcription agreement: Insights from research in infant vocalizations
RAMSDELL, HEATHER L.; OLLER, D. KIMBROUGH; ETHINGTON, CORINNA A.
2010-01-01
The purpose of this study is to provide new perspectives on correlates of phonetic transcription agreement. Our research focuses on phonetic transcription and coding of infant vocalizations. The findings are presumed to be broadly applicable to other difficult cases of transcription, such as found in severe disorders of speech, which similarly result in low reliability for a variety of reasons. We evaluated the predictiveness of two factors not previously documented in the literature as influencing transcription agreement: canonicity and coder confidence. Transcribers coded samples of infant vocalizations, judging both canonicity and confidence. Correlation results showed that canonicity and confidence were strongly related to agreement levels, and regression results showed that canonicity and confidence both contributed significantly to explanation of variance. Specifically, the results suggest that canonicity plays a major role in transcription agreement when utterances involve supraglottal articulation, with coder confidence offering additional power in predicting transcription agreement. PMID:17882695
ERIC Educational Resources Information Center
Teten, Amy F.; DeVeney, Shari L.; Friehe, Mary J.
2016-01-01
Purpose: The purpose of this survey was to determine the self-perceived competence levels in voice disorders of practicing school-based speech-language pathologists (SLPs) and identify correlated variables. Method: Participants were 153 master's level, school-based SLPs with a Nebraska teaching certificate and/or licensure who completed a survey,…
Schizophrenia and the structure of language: the linguist's view.
Covington, Michael A; He, Congzhou; Brown, Cati; Naçi, Lorina; McClain, Jonathan T; Fjordbak, Bess Sirmon; Semple, James; Brown, John
2005-09-01
Patients with schizophrenia often display unusual language impairments. This is a wide ranging critical review of the literature on language in schizophrenia since the 19th century. We survey schizophrenic language level by level, from phonetics through phonology, morphology, syntax, semantics, and pragmatics. There are at least two kinds of impairment (perhaps not fully distinct): thought disorder, or failure to maintain a discourse plan, and schizophasia, comprising various dysphasia-like impairments such as clanging, neologism, and unintelligible utterances. Thought disorder appears to be primarily a disruption of executive function and pragmatics, perhaps with impairment of the syntax-semantics interface; schizophasia involves disruption at other levels. Phonetics is also often abnormal (manifesting as flat intonation or unusual voice quality), but phonological structure, morphology, and syntax are normal or nearly so (some syntactic impairments have been demonstrated). Access to the lexicon is clearly impaired, manifesting as stilted speech, word approximation, and neologism. Clanging (glossomania) is straightforwardly explainable as distraction by self-monitoring. Recent research has begun to relate schizophrenia, which is partly genetic, to the genetic endowment that makes human language possible.
Neural representations and mechanisms for the performance of simple speech sequences
Bohland, Jason W.; Bullock, Daniel; Guenther, Frank H.
2010-01-01
Speakers plan the phonological content of their utterances prior to their release as speech motor acts. Using a finite alphabet of learned phonemes and a relatively small number of syllable structures, speakers are able to rapidly plan and produce arbitrary syllable sequences that fall within the rules of their language. The class of computational models of sequence planning and performance termed competitive queuing (CQ) models have followed Lashley (1951) in assuming that inherently parallel neural representations underlie serial action, and this idea is increasingly supported by experimental evidence. In this paper we develop a neural model that extends the existing DIVA model of speech production in two complementary ways. The new model includes paired structure and content subsystems (cf. MacNeilage, 1998) that provide parallel representations of a forthcoming speech plan, as well as mechanisms for interfacing these phonological planning representations with learned sensorimotor programs to enable stepping through multi-syllabic speech plans. On the basis of previous reports, the model’s components are hypothesized to be localized to specific cortical and subcortical structures, including the left inferior frontal sulcus, the medial premotor cortex, the basal ganglia and thalamus. The new model, called GODIVA (Gradient Order DIVA), thus fills a void in current speech research by providing formal mechanistic hypotheses about both phonological and phonetic processes that are grounded by neuroanatomy and physiology. This framework also generates predictions that can be tested in future neuroimaging and clinical case studies. PMID:19583476
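The competitive-queuing idea referred to above is simple to illustrate: a parallel activation gradient over planned items is read out serially by repeatedly selecting and then suppressing the most active item. The sketch below is a generic CQ illustration, not the GODIVA model itself.

```python
# Minimal competitive-queuing (CQ) sketch: a parallel activation gradient over
# planned syllables is converted to a serial performance by winner-take-all
# choice followed by suppression of the chosen item. Generic illustration only.
import numpy as np

def cq_perform(plan_items, gradient):
    """plan_items: planned syllables; gradient: parallel activation levels
    (higher = earlier). Returns the serially performed sequence."""
    act = np.array(gradient, dtype=float)
    order = []
    while np.any(act > 0):
        k = int(np.argmax(act))      # choice layer: winner-take-all
        order.append(plan_items[k])
        act[k] = 0.0                 # suppress the performed item
    return order

print(cq_perform(["go", "di", "va"], [0.9, 0.6, 0.3]))  # ['go', 'di', 'va']
```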
Facial expressions and the evolution of the speech rhythm.
Ghazanfar, Asif A; Takahashi, Daniel Y
2014-06-01
In primates, different vocalizations are produced, at least in part, by making different facial expressions. Not surprisingly, humans, apes, and monkeys all recognize the correspondence between vocalizations and the facial postures associated with them. However, one major dissimilarity between monkey vocalizations and human speech is that, in the latter, the acoustic output and associated movements of the mouth are both rhythmic (in the 3- to 8-Hz range) and tightly correlated, whereas monkey vocalizations have a similar acoustic rhythmicity but lack the concomitant rhythmic facial motion. This raises the question of how we evolved from a presumptive ancestral acoustic-only vocal rhythm to the one that is audiovisual with improved perceptual sensitivity. According to one hypothesis, this bisensory speech rhythm evolved through the rhythmic facial expressions of ancestral primates. If this hypothesis has any validity, we expect that the extant nonhuman primates produce at least some facial expressions with a speech-like rhythm in the 3- to 8-Hz frequency range. Lip smacking, an affiliative signal observed in many genera of primates, satisfies this criterion. We review a series of studies using developmental, x-ray cineradiographic, EMG, and perceptual approaches with macaque monkeys producing lip smacks to further investigate this hypothesis. We then explore its putative neural basis and remark on important differences between lip smacking and speech production. Overall, the data support the hypothesis that lip smacking may have been an ancestral expression that was linked to vocal output to produce the original rhythmic audiovisual speech-like utterances in the human lineage.
MCA-NMF: Multimodal Concept Acquisition with Non-Negative Matrix Factorization
Mangin, Olivier; Filliat, David; ten Bosch, Louis; Oudeyer, Pierre-Yves
2015-01-01
In this paper we introduce MCA-NMF, a computational model of the acquisition of multimodal concepts by an agent grounded in its environment. More precisely our model finds patterns in multimodal sensor input that characterize associations across modalities (speech utterances, images and motion). We propose this computational model as an answer to the question of how some class of concepts can be learnt. In addition, the model provides a way of defining such a class of plausibly learnable concepts. We detail why the multimodal nature of perception is essential to reduce the ambiguity of learnt concepts as well as to communicate about them through speech. We then present a set of experiments that demonstrate the learning of such concepts from real non-symbolic data consisting of speech sounds, images, and motions. Finally we consider structure in perceptual signals and demonstrate that a detailed knowledge of this structure, named compositional understanding can emerge from, instead of being a prerequisite of, global understanding. An open-source implementation of the MCA-NMF learner as well as scripts and associated experimental data to reproduce the experiments are publicly available. PMID:26489021
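The factorization step at the heart of MCA-NMF can be sketched with scikit-learn's NMF applied to concatenated non-negative feature histograms from each modality. The feature dimensions and random data below are placeholders; the point is only that one shared factorization spans the modalities.

```python
# Minimal sketch of the non-negative matrix factorization step: concatenate
# non-negative feature histograms from each modality (speech, image, motion)
# per sample and factorize, so that the learnt components can capture
# cross-modal associations. Feature dimensions and data are arbitrary.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(1)
n_samples = 50
speech = rng.random((n_samples, 30))    # e.g., acoustic-event histograms
image = rng.random((n_samples, 20))     # e.g., visual-word histograms
motion = rng.random((n_samples, 10))    # e.g., motion-primitive histograms

V = np.hstack([speech, image, motion])  # samples x concatenated modalities
model = NMF(n_components=8, init="nndsvda", max_iter=500)
W = model.fit_transform(V)              # per-sample concept activations
H = model.components_                   # concept-to-feature dictionary
print(W.shape, H.shape)                 # (50, 8) (8, 60)
```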
Saldert, Charlotta; Bauer, Malin
2017-01-01
It is known that Parkinson’s disease is often accompanied by a motor speech disorder, which results in impaired communication. However, people with Parkinson’s disease may also have impaired word retrieval (anomia) and other communicative problems, which have a negative impact on their ability to participate in conversations with family as well as healthcare staff. The aim of the present study was to explore effects of impaired speech and language on communication and how this is managed by people with Parkinson’s disease and their spouses. Using a qualitative method based on Conversation Analysis, in-depth analyses were performed on natural conversational interaction in five dyads including elderly men who were at different stages of Parkinson’s disease. The findings showed that the motor speech disorder in combination with word retrieval difficulties and adaptations, such as using communication strategies, may result in atypical utterances that are difficult for communication partners to understand. The coexistence of several communication problems compounds the difficulties faced in conversations and individuals with Parkinson’s disease are often dependent on cooperation with their communication partner to make themselves understood. PMID:28946714
Dykstra, Allyson D; Adams, Scott G; Jog, Mandar
2015-01-01
To examine the relationship between speech intensity and self-ratings of communicative effectiveness in speakers with Parkinson's disease (PD) and hypophonia. An additional purpose was to evaluate whether self-ratings of communicative effectiveness made by participants with PD differed from ratings made by primary communication partners. Thirty participants with PD and 15 healthy older adults completed the Communication Effectiveness Survey. Thirty primary communication partners rated the communicative effectiveness of their partner with PD. Speech intensity was calculated for participants with PD and control participants based on conversational utterances. Results revealed significant differences between groups in conversational speech intensity (p=.001). Participants with PD self-rated communicative effectiveness significantly lower than control participants (p=.000). Correlational analyses revealed a small but non-significant relationship between speech intensity and communicative effectiveness for participants with PD (r=0.298, p=.110) and control participants (r=0.327, p=.234). Self-ratings of communicative effectiveness made by participants with PD were not significantly different from ratings made by primary communication partners (p=.20). Obtaining information on communicative effectiveness may help to broaden outcome measurement and may aid in the provision of educational strategies. Findings also suggest that communicative effectiveness may be a separate and distinct construct that cannot necessarily be predicted from the severity of hypophonia. Copyright © 2015 Elsevier Inc. All rights reserved.
Processing of prosodic changes in natural speech stimuli in school-age children.
Lindström, R; Lepistö, T; Makkonen, T; Kujala, T
2012-12-01
Speech prosody conveys information about important aspects of communication: the meaning of the sentence and the emotional state or intention of the speaker. The present study addressed processing of emotional prosodic changes in natural speech stimuli in school-age children (mean age 10 years) by recording the electroencephalogram, facial electromyography, and behavioral responses. The stimulus was a semantically neutral Finnish word uttered with four different emotional connotations: neutral, commanding, sad, and scornful. In the behavioral sound-discrimination task the reaction times were fastest for the commanding stimulus and longest for the scornful stimulus, and faster for the neutral than for the sad stimulus. EEG and EMG responses were measured during non-attentive oddball paradigm. Prosodic changes elicited a negative-going, fronto-centrally distributed neural response peaking at about 500 ms from the onset of the stimulus, followed by a fronto-central positive deflection, peaking at about 740 ms. For the commanding stimulus also a rapid negative deflection peaking at about 290 ms from stimulus onset was elicited. No reliable stimulus type specific rapid facial reactions were found. The results show that prosodic changes in natural speech stimuli activate pre-attentive neural change-detection mechanisms in school-age children. However, the results do not support the suggestion of automaticity of emotion specific facial muscle responses to non-attended emotional speech stimuli in children. Copyright © 2012 Elsevier B.V. All rights reserved.
Nouns slow down speech across structurally and culturally diverse languages
Danielsen, Swintha; Hartmann, Iren; Pakendorf, Brigitte; Witzlack-Makarevich, Alena; de Jong, Nivja H.
2018-01-01
By force of nature, every bit of spoken language is produced at a particular speed. However, this speed is not constant—speakers regularly speed up and slow down. Variation in speech rate is influenced by a complex combination of factors, including the frequency and predictability of words, their information status, and their position within an utterance. Here, we use speech rate as an index of word-planning effort and focus on the time window during which speakers prepare the production of words from the two major lexical classes, nouns and verbs. We show that, when naturalistic speech is sampled from languages all over the world, there is a robust cross-linguistic tendency for slower speech before nouns compared with verbs, both in terms of slower articulation and more pauses. We attribute this slowdown effect to the increased amount of planning that nouns require compared with verbs. Unlike verbs, nouns can typically only be used when they represent new or unexpected information; otherwise, they have to be replaced by pronouns or be omitted. These conditions on noun use appear to outweigh potential advantages stemming from differences in internal complexity between nouns and verbs. Our findings suggest that, beneath the staggering diversity of grammatical structures and cultural settings, there are robust universals of language processing that are intimately tied to how speakers manage referential information when they communicate with one another. PMID:29760059
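The rate measure can be sketched from a time-aligned, part-of-speech-tagged transcript: for each noun or verb, look at the word immediately before it and record whether there is a pause and how fast that word was articulated. The record format, pause threshold, and the crude letters-per-second proxy below are assumptions.

```python
# Sketch of the pre-word rate measure: compare pausing and articulation speed
# in the window immediately before nouns vs. before verbs. Aligned data format,
# pause threshold, and the letters-per-second proxy are assumptions.
from statistics import mean

words = [  # (word, pos, start_s, end_s) -- hypothetical aligned transcript
    ("she", "PRON", 0.00, 0.18), ("saw", "VERB", 0.22, 0.45),
    ("the", "DET", 0.45, 0.55), ("turtle", "NOUN", 0.95, 1.40),
]

def pre_word_stats(words, target_pos, pause_thr=0.15):
    pauses, rates = [], []
    for prev, cur in zip(words, words[1:]):
        if cur[1] != target_pos:
            continue
        gap = cur[2] - prev[3]
        pauses.append(gap > pause_thr)          # pause before the target word?
        rates.append(len(prev[0]) / (prev[3] - prev[2]))  # crude speed proxy
    return mean(pauses), mean(rates)

print("before NOUN:", pre_word_stats(words, "NOUN"))
print("before VERB:", pre_word_stats(words, "VERB"))
```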
Productivity of lexical categories in French-speaking children with cochlear implants.
Le Normand, M-T; Ouellet, C; Cohen, H
2003-11-01
The productivity of lexical categories was studied longitudinally in a sample of 17 young hearing-impaired French-speaking children with cochlear implants. Age of implantation ranged from 22 months to 76 months. Spontaneous speech samples were collected at six-month intervals over a period of 36 months, starting at the one-word stage. Four general measures of their linguistic production (number of utterances, verbal fluency, vocabulary, and grammatical production) as well as 36 specific lexical categories, according to the CHILDES codes, were computed in terms of tokens, i.e., total number of words. Cochlear-implanted children (CI) were compared to a French database of normally hearing children aged 2-4 compiled by the first author. Follow-up results indicate that, at the two-year post-implantation follow-up, noun, and verb morphology was significantly impaired. At the three-year follow-up, the cochlear-implanted group had recovered on adjectives, determiners and nouns, main verbs, and auxiliaries. The two groups differed significantly in processing locative adverbs, prepositions, pronouns, and verbs (infinitive verb, modal, and modal lexical), but individual variability within the cochlear-implanted group was substantial. Results are discussed in terms of recovery and developmental trends and variability in the acquisition of lexical categories by French children two years and three years post-implantation.
Yorkston, Kathryn M; Baylor, Carolyn; Amtmann, Dagmar
2014-01-01
Individuals with multiple sclerosis (MS) are at risk for communication problems that may restrict their ability to participate in important life roles such as maintenance of relationships, work, or household management. The aim of this project is to examine selected demographic and symptom-related variables that may contribute to participation restrictions. This examination is intended to aid clinicians in predicting who might be at risk for such restrictions and which variables may be targeted in interventions. Community-dwelling adults with MS (n=216) completed a survey either online or using paper forms. The survey included the 46-item version of the Communicative Participation Item Bank, demographics (age, sex, living situation, employment status, education, and time since onset of diagnosis of MS), and self-reported symptom-related variables (physical activity, emotional problems, fatigue, pain, speech severity, and cognitive/communication skills). In order to identify predictors of restrictions in communicative participation, these variables were entered into a backwards stepwise multiple linear regression analysis. Five variables (cognitive/communication skills, speech severity, speech usage, physical activity, and education) were statistically significant predictors of communicative participation. In order to examine the relationship between communicative participation and social role variables, bivariate Spearman correlations were conducted. Results suggest only a fair to moderate relationship between communicative participation and measures of social roles. Communicative participation is a complex construct associated with a number of self-reported variables. Clinicians should be alert to risk factors for reduced communicative participation, including reduced cognitive and speech skills, lower levels of speech usage, limitations in physical activities, and higher levels of education. The reader will be able to: (a) describe the factors that may restrict participation in individuals with multiple sclerosis; (b) list measures of social functioning that may be pertinent in adults with multiple sclerosis; (c) discuss factors that can be used to predict communicative participation in multiple sclerosis. Copyright © 2014 Elsevier Inc. All rights reserved.
Voice, (inter-)subjectivity, and real time recurrent interaction
Cummins, Fred
2014-01-01
Received approaches to a unified phenomenon called “language” are firmly committed to a Cartesian view of distinct unobservable minds. Questioning this commitment leads us to recognize that the boundaries conventionally separating the linguistic from the non-linguistic can appear arbitrary, omitting much that is regularly present during vocal communication. The thesis is put forward that uttering, or voicing, is a much older phenomenon than the formal structures studied by the linguist, and that the voice has found elaborations and codifications in other domains too, such as in systems of ritual and rite. Voice, it is suggested, necessarily gives rise to a temporally bound subjectivity, whether it is in inner speech (Descartes' “cogito”), in conversation, or in the synchronized utterances of collective speech found in prayer, protest, and sports arenas world wide. The notion of a fleeting subjective pole tied to dynamically entwined participants who exert reciprocal influence upon each other in real time provides an insightful way to understand notions of common ground, or socially shared cognition. It suggests that the remarkable capacity to construct a shared world that is so characteristic of Homo sapiens may be grounded in this ability to become dynamically entangled as seen, e.g., in the centrality of joint attention in human interaction. Empirical evidence of dynamic entanglement in joint speaking is found in behavioral and neuroimaging studies. A convergent theoretical vocabulary is now available in the concept of participatory sense-making, leading to the development of a rich scientific agenda liberated from a stifling metaphysics that obscures, rather than illuminates, the means by which we come to inhabit a shared world. PMID:25101028
Rastle, Kathleen; Croot, Karen P; Harrington, Jonathan M; Coltheart, Max
2005-10-01
The research described in this article had 2 aims: to permit greater precision in the conduct of naming experiments and to contribute to a characterization of the motor execution stage of speech production. The authors report an exhaustive inventory of consonantal and postconsonantal influences on delayed naming latency and onset acoustic duration, derived from a hand-labeled corpus of single-syllable consonant-vowel utterances. Five talkers produced 6 repetitions each of a set of 168 prepared monosyllables, a set that comprised each of the consonantal onsets of English in 3 vowel contexts. Strong and significant effects associated with phonetic characteristics of initial and noninitial phonemes were observed on both delayed naming latency and onset acoustic duration. Results are discussed in terms of the biomechanical properties of the articulatory system that may give rise to these effects and in terms of their methodological implications for naming experiments.
Reading your own lips: common-coding theory and visual speech perception.
Tye-Murray, Nancy; Spehar, Brent P; Myerson, Joel; Hale, Sandra; Sommers, Mitchell S
2013-02-01
Common-coding theory posits that (1) perceiving an action activates the same representations of motor plans that are activated by actually performing that action, and (2) because of individual differences in the ways that actions are performed, observing recordings of one's own previous behavior activates motor plans to an even greater degree than does observing someone else's behavior. We hypothesized that if observing oneself activates motor plans to a greater degree than does observing others, and if these activated plans contribute to perception, then people should be able to lipread silent video clips of their own previous utterances more accurately than they can lipread video clips of other talkers. As predicted, two groups of participants were able to lipread video clips of themselves, recorded more than two weeks earlier, significantly more accurately than video clips of others. These results suggest that visual input activates speech motor activity that links to word representations in the mental lexicon.
Polur, Prasad D; Miller, Gerald E
2006-10-01
Computer speech recognition for individuals with dysarthria, such as cerebral palsy patients, requires a robust technique that can handle conditions of very high variability and limited training data. In this study, the application of a 10-state ergodic hidden Markov model (HMM)/artificial neural network (ANN) hybrid structure to a dysarthric speech (isolated word) recognition system, intended to act as an assistive tool, was investigated. A small vocabulary spoken by three cerebral palsy subjects was chosen. The effect of such a structure on the recognition rate of the system was investigated by comparing it with an ergodic hidden Markov model as a control tool. This was done in order to determine whether this modified technique contributed to enhanced recognition of dysarthric speech. The speech was sampled at 11 kHz. Mel frequency cepstral coefficients were extracted using 15-ms frames and served as training input to the hybrid model setup. The subsequent results demonstrated that the hybrid model structure was quite robust in its ability to handle the large variability and non-conformity of dysarthric speech. The level of variability in input dysarthric speech patterns sometimes limits the reliability of the system. However, its application as a rehabilitation/control tool to assist dysarthric, motor-impaired individuals holds sufficient promise.
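A baseline in the spirit of the control condition can be sketched as one ergodic Gaussian HMM per vocabulary word over MFCC frames, with recognition by maximum log-likelihood. This assumes the librosa and hmmlearn packages and does not reproduce the paper's HMM/ANN hybrid stage.

```python
# Sketch of an isolated-word recognizer: one 10-state ergodic Gaussian HMM per
# vocabulary word over MFCC frames, choosing the word whose model yields the
# highest log-likelihood. Assumes hmmlearn and librosa are available.
import numpy as np
import librosa
from hmmlearn.hmm import GaussianHMM

def mfcc_frames(wav_path, sr=11025):
    y, sr = librosa.load(wav_path, sr=sr)
    # ~15 ms hop between analysis frames, as in the study
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                                n_fft=256, hop_length=165).T

def train_word_models(training_sets):
    """training_sets: {word: [list of MFCC frame arrays]}"""
    models = {}
    for word, examples in training_sets.items():
        X = np.vstack(examples)
        lengths = [len(e) for e in examples]
        m = GaussianHMM(n_components=10, covariance_type="diag", n_iter=25)
        m.fit(X, lengths)            # fully connected (ergodic) transitions
        models[word] = m
    return models

def recognize(frames, models):
    return max(models, key=lambda w: models[w].score(frames))
```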
Walton, Katherine M; Ingersoll, Brooke R
2015-05-01
Adult responsiveness is related to language development both in young typically developing children and in children with autism spectrum disorders, such that parents who use more responsive language with their children have children who develop better language skills over time. This study used a micro-analytic technique to examine how two facets of maternal utterances, relationship to child focus of attention and degree of demandingness, influenced the immediate use of appropriate expressive language of preschool-aged children with autism spectrum disorders (n = 28) and toddlers with typical development (n = 16) within a naturalistic mother-child play session. Mothers' use of follow-in demanding language was most likely to elicit appropriate expressive speech in both children with autism spectrum disorders and children with typical development. For children with autism spectrum disorders, but not children with typical development, mothers' use of orienting cues conferred an additional benefit for expressive speech production. These findings are consistent with the naturalistic behavioral intervention philosophy and suggest that following a child's lead while prompting for language is likely to elicit speech production in children with autism spectrum disorders and children with typical development. Furthermore, using orienting cues may help children with autism spectrum disorders to verbally respond. © The Author(s) 2014.
Toward a more ecologically valid measure of speech understanding in background noise.
Jerger, J; Greenwald, R; Wambacq, I; Seipel, A; Moncrieff, D
2000-05-01
In an attempt to develop a more ecologically valid measure of speech understanding in a background of competing speech, we constructed a quasidichotic procedure based on the monitoring of continuous speech from loudspeakers placed directly to the listener's right and left sides. The listener responded to the presence of incongruous or anomalous words imbedded within the context of two children's fairy tales. Attention was directed either to the right or to the left side in blocks of 25 utterances. Within each block, there were target (anomalous) and nontarget (nonanomalous) words. Responses to target words were analyzed separately for attend-right and attend-left conditions. Our purpose was twofold: (1) to evaluate the feasibility of such an approach for obtaining electrophysiologic performance measures in the sound field and (2) to gather normative interaural symmetry data for the new technique in young adults with normal hearing. Event-related potentials to target and nontarget words at 30 electrode sites were obtained in 20 right-handed young adults with normal hearing. Waveforms and associated topographic maps were characterized by a slight negativity in the region of 400 msec (N400) and robust positivity in the region of 900 msec (P900). Norms for interaural symmetry of the P900 event-related potential in young adults were derived.
Advancements in robust algorithm formulation for speaker identification of whispered speech
NASA Astrophysics Data System (ADS)
Fan, Xing
Whispered speech is an alternative speech production mode from neutral speech, which is used by talkers intentionally in natural conversational scenarios to protect privacy and to avoid certain content from being overheard or made public. Due to the profound differences between whispered and neutral speech in production mechanism and the absence of whispered adaptation data, the performance of speaker identification systems trained with neutral speech degrades significantly. This dissertation therefore focuses on developing a robust closed-set speaker recognition system for whispered speech by using no or limited whispered adaptation data from non-target speakers. This dissertation proposes the concept of "High"/"Low" performance whispered data for the purpose of speaker identification. A variety of acoustic properties are identified that contribute to the quality of whispered data. An acoustic analysis is also conducted to compare the phoneme/speaker dependency of the differences between whispered and neutral data in the feature domain. The observations from this acoustic analysis are new in this area and also serve as guidance for developing robust speaker identification systems for whispered speech. This dissertation further proposes two systems for speaker identification of whispered speech. One system focuses on front-end processing. A two-dimensional feature space is proposed to search for "Low"-performance whispered utterances, and separate feature mapping functions are applied to vowels and consonants respectively in order to retain the speaker's information shared between whispered and neutral speech. The other system focuses on speech-mode-independent model training. The proposed method generates pseudo-whispered features from neutral features by using the statistical information contained in a whispered Universal Background Model (UBM) trained on extra whispered data collected from non-target speakers. Four modeling methods are proposed for the transformation estimation used to generate the pseudo-whispered features. Both systems demonstrate a significant improvement over the baseline system on the evaluation data. This dissertation has therefore contributed a scientific understanding of the differences between whispered and neutral speech, as well as improved front-end processing and modeling methods for speaker identification of whispered speech. Such advancements will ultimately improve the robustness of speech processing systems.
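The pseudo-whispered-feature idea can be conveyed with a heavily simplified stand-in: learn a global affine map from neutral to whispered feature frames using non-target speakers' data, then apply it to a target speaker's neutral features. Frame pairing and the least-squares mapping below are assumptions; the dissertation's UBM-based estimation is not reproduced.

```python
# Highly simplified stand-in for the "pseudo-whispered feature" idea: learn a
# global affine map from neutral to whispered MFCC frames on non-target
# speakers' data, then apply it to a target speaker's neutral features to
# synthesize whisper-like training material. Frame alignment is assumed.
import numpy as np

def fit_affine_map(neutral, whispered):
    """neutral, whispered: (n_frames, dim) paired feature matrices."""
    X = np.hstack([neutral, np.ones((len(neutral), 1))])   # add bias column
    A, *_ = np.linalg.lstsq(X, whispered, rcond=None)
    return A                                               # (dim + 1, dim)

def to_pseudo_whisper(neutral, A):
    X = np.hstack([neutral, np.ones((len(neutral), 1))])
    return X @ A

rng = np.random.default_rng(2)
neu = rng.normal(size=(1000, 13))
whi = neu @ rng.normal(size=(13, 13)) * 0.5 + 0.1          # toy "whisper" shift
A = fit_affine_map(neu, whi)
print(to_pseudo_whisper(neu, A).shape)                     # (1000, 13)
```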
Enlargement of the supraglottal cavity and its relation to stop consonant voicing.
Westbury, J R
1983-04-01
Measurements were made of sagittal-plane movements of the larynx, soft palate, and portions of the tongue, from a high-speed cinefluorographic film of utterances produced by one adult male speaker of American English. These measures were then used to approximate the temporal variations in supraglottal cavity volume during the closures of voiced and voiceless stop consonants. All data were subsequently related to a synchronous acoustic recording of the utterances. Instances of /p,t,k/ were always accompanied by silent closures, and sometimes accompanied by decreases in supraglottal volume. In contrast, instances of /b,d,g/ were always accompanied by both significant intervals of vocal fold vibration during closure and relatively large increases in supraglottal volume. However, the magnitudes of volume increments during the voiced stops, and the means by which those increments were achieved, differed considerably across place of articulation and phonetic environment. These results are discussed in the context of a well-known model of the breath-stream control mechanism, and their relevance for a general theory of speech motor control is considered.
Towards Artificial Speech Therapy: A Neural System for Impaired Speech Segmentation.
Iliya, Sunday; Neri, Ferrante
2016-09-01
This paper presents a neural system-based technique for segmenting short impaired speech utterances into silent, unvoiced, and voiced sections. Moreover, the proposed technique identifies those points of the (voiced) speech where the spectrum becomes steady. The resulting technique thus aims at detecting the limited section of the speech that contains the information about the potential impairment. This section is of interest to the speech therapist as it corresponds to the possibly incorrect movements of speech organs (lower lip and tongue with respect to the vocal tract). Two segmentation models to detect and identify the various sections of the disordered (impaired) speech signals have been developed and compared. The first makes use of a combination of four artificial neural networks. The second is based on a support vector machine (SVM). The SVM has been trained by means of an ad hoc nested algorithm whose outer layer is a metaheuristic while the inner layer is a convex optimization algorithm. Several metaheuristics have been tested and compared, leading to the conclusion that some variants of the compact differential evolution (CDE) algorithm appear to be well-suited to address this problem. Numerical results show that the SVM model with a radial basis function kernel is capable of effective detection of the portion of speech that is of interest to a therapist. The best performance has been achieved when the system is trained by the nested algorithm whose outer layer is hybrid-population-based/CDE. A population-based approach displays the best performance for the isolation of silence/noise sections and the detection of unvoiced sections. On the other hand, a compact approach appears to be clearly well-suited to detect the beginning of the steady state of the voiced signal. Both of the proposed segmentation models outperformed two modern segmentation techniques based on Gaussian mixture models and deep learning.
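The paper's nested metaheuristic training is not reproduced here, but the core classification step it describes, labeling frames of an utterance as silent, unvoiced, or voiced with an RBF-kernel SVM, can be sketched as follows. The frame features (log energy and zero-crossing rate), the hyperparameters, and the labeling scheme are assumptions for illustration, not the authors' configuration.

```python
# Illustrative silent/unvoiced/voiced frame classification with an RBF-kernel SVM.
# Frame features (log energy, zero-crossing rate) are an assumed simplification.
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

def frame_features(signal, sr, frame_ms=25, hop_ms=10):
    """Split a waveform into frames and compute log energy and zero-crossing rate."""
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    feats = []
    for start in range(0, len(signal) - frame, hop):
        x = signal[start:start + frame]
        log_energy = np.log(np.sum(x ** 2) + 1e-10)
        zcr = np.mean(np.abs(np.diff(np.sign(x)))) / 2.0
        feats.append([log_energy, zcr])
    return np.array(feats)

# Assumed labels: 0 = silent/noise, 1 = unvoiced, 2 = voiced
def train_segmenter(train_feats, train_labels):
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0, gamma="scale"))
    clf.fit(train_feats, train_labels)
    return clf

def segment(clf, signal, sr):
    """Classify each frame of a new utterance."""
    return clf.predict(frame_features(signal, sr))
```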
Direct recordings from the auditory cortex in a cochlear implant user.
Nourski, Kirill V; Etler, Christine P; Brugge, John F; Oya, Hiroyuki; Kawasaki, Hiroto; Reale, Richard A; Abbas, Paul J; Brown, Carolyn J; Howard, Matthew A
2013-06-01
Electrical stimulation of the auditory nerve with a cochlear implant (CI) is the method of choice for treatment of severe-to-profound hearing loss. Understanding how the human auditory cortex responds to CI stimulation is important for advances in stimulation paradigms and rehabilitation strategies. In this study, auditory cortical responses to CI stimulation were recorded intracranially in a neurosurgical patient to examine directly the functional organization of the auditory cortex and compare the findings with those obtained in normal-hearing subjects. The subject was a bilateral CI user with a 20-year history of deafness and refractory epilepsy. As part of the epilepsy treatment, a subdural grid electrode was implanted over the left temporal lobe. Pure tones, click trains, sinusoidal amplitude-modulated noise, and speech were presented via the auxiliary input of the right CI speech processor. Additional experiments were conducted with bilateral CI stimulation. Auditory event-related changes in cortical activity, characterized by the averaged evoked potential and event-related band power, were localized to posterolateral superior temporal gyrus. Responses were stable across recording sessions and were abolished under general anesthesia. Response latency decreased and magnitude increased with increasing stimulus level. More apical intracochlear stimulation yielded the largest responses. Cortical evoked potentials were phase-locked to the temporal modulations of periodic stimuli and speech utterances. Bilateral electrical stimulation resulted in minimal artifact contamination. This study demonstrates the feasibility of intracranial electrophysiological recordings of responses to CI stimulation in a human subject, shows that cortical response properties may be similar to those obtained in normal-hearing individuals, and provides a basis for future comparisons with extracranial recordings.
Variability in Phonetics. York Papers in Linguistics, No. 6.
ERIC Educational Resources Information Center
Tatham, M. A. A.
Variability is a term used to cover several types of phenomena in language sound patterns and in phonetic realization of those patterns. Variability refers to the fact that every repetition of an utterance is different, in amplitude, rate of delivery, formant frequencies, fundamental frequency or minor phase relationship changes across the sound…
NASA Astrophysics Data System (ADS)
Noble, Tracy
This study is an exploration of the role of physical activity in making sense of the physical world. Recent work on embodied cognition has helped to break down the barrier between the body and cognition, providing the inspiration for this work. In this study, I asked ten elementary-school students to explain to me how a toy parachute works. The methods used were adapted from those used to study the role of the body in cognition in science education, child development, and psychology. This study focused on the processes of learning rather than on measuring learning outcomes. Multiple levels of analysis were pursued in a mixed-method research design. The first level was individual analyses of two students' utterances and body motions. These analyses provided initial hypotheses about the interaction of speech and body motion in students' developing understandings. The second level was group analyses of all ten students' data, in search of patterns and relationships between body motion and speech production across all the student-participants. Finally, a third level of analysis was used to explore all cases in which students produced analogies while they discussed how the parachute works. The multiple levels of analysis used in this study allowed for raising and answering some questions, and allowed for the characterization of both individual differences and group commonalities. The findings of this study show that there are several significant patterns of interaction between body motion and speech that demonstrate a role for the body in cognition. The use of sensory feedback from physical interactions with objects to create new explanations, and the use of interactions with objects to create blended spaces to support the construction of analogies are two of these patterns. Future work is needed to determine the generalizability of these patterns to other individuals and other learning contexts. However, the existence of these patterns lends concrete support to the ideas of embodied cognition and demonstrates how students can use their own embodied experience to understand the world.
1983-08-16
34. " .. ,,,,.-j.Aid-is.. ;,,i . -i.t . "’" ’, V ,1 5- 4. 3- kHz 2-’ r 1 r s ’.:’ BOGEY 5D 0 S BOGEY 12D Figure 10. Spectrograms of two versions of the word...MF5852801B 0001 Reviewed by Approved and Released by Ashton Graybiel, M.D. Captain W. M. Houk , MC, USN Chief Scientific Advisor Commanding Officer 16 August...incorporating knowledge about these changes into speech recognition systems. i A J- I. . S , .4, ... ..’-° -- -iii l - - .- - i- . .. " •- - i ,f , i
NASA Astrophysics Data System (ADS)
Kuroki, Hayato; Ino, Shuichi; Nakano, Satoko; Hori, Kotaro; Ifukube, Tohru
The authors of this paper have been studying a real-time speech-to-caption system that uses speech recognition technology with a “repeat-speaking” method. In this system, a “repeat-speaker” listens to a lecturer's voice and speaks the lecturer's utterances back into a speech recognition computer. This system achieved a caption accuracy of about 97% for Japanese-to-Japanese conversion, with a conversion time from voice to caption of about 4 seconds for English-to-English conversion at some international conferences, although achieving this level of performance was costly. In human communication, speech understanding depends not only on verbal information but also on non-verbal information such as the speaker's gestures and face and mouth movements. The authors therefore proposed briefly storing the information in a computer so that the captions and images of the speaker's face movements can be presented in a way that yields higher comprehension. In this paper, we investigate the relationship between display sequence and display timing for captions that contain speech recognition errors and for the speaker's face movement images. The results show that the sequence “display the caption before the speaker's face image” improves comprehension of the captions. The sequence “display both simultaneously” shows an improvement only a few percent higher than that for the question sentence, and the sequence “display the speaker's face image before the caption” shows almost no change. In addition, the sequence “display the caption 1 second before the speaker's face image” shows the most significant improvement of all the conditions.
Acoustic Constraints and Musical Consequences: Exploring Composers' Use of Cues for Musical Emotion
Schutz, Michael
2017-01-01
Emotional communication in music is based in part on the use of pitch and timing, two cues effective in emotional speech. Corpus analyses of natural speech illustrate that happy utterances tend to be higher and faster than sad. Although manipulations altering melodies show that passages changed to be higher and faster sound happier, corpus analyses of unaltered music paralleling those of natural speech have proven challenging. This partly reflects the importance of modality (i.e., major/minor), a powerful musical cue whose use is decidedly imbalanced in Western music. This imbalance poses challenges for creating musical corpora analogous to existing speech corpora for purposes of analyzing emotion. However, a novel examination of music by Bach and Chopin balanced in modality illustrates that, consistent with predictions from speech, their major key (nominally “happy”) pieces are approximately a major second higher and 29% faster than their minor key pieces (Poon and Schutz, 2015). Although this provides useful evidence for parallels in use of emotional cues between these domains, it raises questions about how composers “trade off” cue differentiation in music, suggesting interesting new potential research directions. This Focused Review places those results in a broader context, highlighting their connections with previous work on the natural use of cues for musical emotion. Together, these observational findings based on unaltered music—widely recognized for its artistic significance—complement previous experimental work systematically manipulating specific parameters. In doing so, they also provide a useful musical counterpart to fruitful studies of the acoustic cues for emotion found in natural speech. PMID:29249997
Koike, Narihiko; Ii, Satoshi; Yoshinaga, Tsukasa; Nozaki, Kazunori; Wada, Shigeo
2017-11-07
This paper presents a novel inverse estimation approach for the active contraction stresses of tongue muscles during speech. The proposed method is based on variational data assimilation using a mechanical tongue model and 3D tongue surface shapes for speech production. The mechanical tongue model considers nonlinear hyperelasticity, finite deformation, actual geometry from computed tomography (CT) images, and anisotropic active contraction by muscle fibers, the orientations of which are ideally determined using anatomical drawings. The tongue deformation is obtained by solving a stationary force-equilibrium equation using a finite element method. An inverse problem is established to find the combination of muscle contraction stresses that minimizes the Euclidean distance of the tongue surfaces between the mechanical analysis and CT results of speech production, where a signed-distance function represents the tongue surface. Our approach is validated through an ideal numerical example and extended to the real-world case of two Japanese vowels, /ʉ/ and /ɯ/. The results capture the target shape completely and provide an excellent estimation of the active contraction stresses in the ideal case, and exhibit similar tendencies as in previous observations and simulations for the actual vowel cases. The present approach can reveal the relative relationship among the muscle contraction stresses in similar utterances with different tongue shapes, and enables the investigation of the coordination of tongue muscles during speech using only the deformed tongue shape obtained from medical images. This will enhance our understanding of speech motor control. Copyright © 2017 Elsevier Ltd. All rights reserved.
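The authors' finite-element forward model and signed-distance objective are not reproduced here; the sketch below only illustrates the general inverse-estimation structure they describe (minimize the mismatch between a predicted and a target tongue surface over a vector of muscle contraction stresses). The linear toy forward model, the least-squares objective, and all numbers are assumptions.

```python
# Schematic inverse estimation of muscle contraction stresses (toy example, not
# the authors' finite-element pipeline). A forward model maps a stress vector
# to predicted surface points; we minimize the distance to target surface points.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n_muscles, n_surface_points = 4, 50

# Assumed toy forward model: surface displacement responds linearly to stresses.
basis = rng.normal(size=(n_muscles, n_surface_points, 3))   # per-muscle displacement modes
rest_surface = rng.normal(size=(n_surface_points, 3))

def forward(stresses):
    """Predict deformed surface points for a given vector of contraction stresses."""
    return rest_surface + np.tensordot(stresses, basis, axes=1)

# Synthetic "observed" target surface produced by known stresses (stands in for CT data).
true_stresses = np.array([0.8, 0.1, 0.5, 0.3])
target_surface = forward(true_stresses)

def objective(stresses):
    """Mean squared distance between predicted and target surfaces
    (stands in for the signed-distance-function mismatch in the paper)."""
    diff = forward(stresses) - target_surface
    return np.mean(np.sum(diff ** 2, axis=1))

result = minimize(objective, x0=np.zeros(n_muscles),
                  bounds=[(0.0, None)] * n_muscles, method="L-BFGS-B")
print("estimated stresses:", np.round(result.x, 3))   # should recover true_stresses
```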
The effect of emotion on articulation rate in persistence and recovery of childhood stuttering.
Erdemir, Aysu; Walden, Tedra A; Jefferson, Caswell M; Choi, Dahye; Jones, Robin M
2018-06-01
This study investigated the possible association of emotional processes and articulation rate in pre-school age children who stutter and persist (persisting), children who stutter and recover (recovered), and children who do not stutter (nonstuttering). The participants were ten persisting, ten recovered, and ten nonstuttering children between the ages of 3 and 5 years, who were classified as persisting, recovered, or nonstuttering approximately 2-2.5 years after the experimental testing took place. The children were exposed to three emotionally-arousing video clips (baseline, positive and negative) and produced a narrative based on a text-free storybook following each video clip. From the audio-recordings of these narratives, individual utterances were transcribed and articulation rates were calculated. Results indicated that persisting children exhibited significantly slower articulation rates following the negative emotion condition, unlike recovered and nonstuttering children whose articulation rates were not affected by either of the two emotion-inducing conditions. Moreover, all stuttering children displayed faster rates during fluent compared to stuttered speech; however, the recovered children were significantly faster than the persisting children during fluent speech. Negative emotion plays a detrimental role in the speech-motor control processes of children who persist, whereas children who eventually recover seem to exhibit a relatively more stable and mature speech-motor system. This suggests that complex interactions between speech-motor and emotional processes are at play in stuttering recovery and persistence, and articulation rates following negative emotion or during stuttered versus fluent speech might be considered potential factors to prospectively predict persistence and recovery from stuttering. Copyright © 2017 Elsevier Inc. All rights reserved.
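As a rough illustration of the dependent measure only (not the authors' transcription protocol), articulation rate is commonly computed as syllables per second of actual speaking time, excluding pauses. A minimal sketch under those assumptions:

```python
# Minimal articulation-rate computation (illustrative; syllable counts and pause
# durations are assumed to come from a transcription step not shown here).
def articulation_rate(n_syllables, utterance_duration_s, total_pause_s=0.0):
    """Syllables per second of speaking time (utterance duration minus pauses)."""
    speaking_time = utterance_duration_s - total_pause_s
    if speaking_time <= 0:
        raise ValueError("speaking time must be positive")
    return n_syllables / speaking_time

# Example: 12 syllables in a 4.0 s utterance containing 0.8 s of pauses
print(articulation_rate(12, 4.0, 0.8))   # 3.75 syllables/s
```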
Relation between measures of speech-in-noise performance and measures of efferent activity
NASA Astrophysics Data System (ADS)
Smith, Brad; Harkrider, Ashley; Burchfield, Samuel; Nabelek, Anna
2003-04-01
Individual differences in auditory perceptual abilities in noise are well documented but the factors causing such variability are unclear. The purpose of this study was to determine if individual differences in responses measured from the auditory efferent system were correlated to individual variations in speech-in-noise performance. The relation between behavioral performance on three speech-in-noise tasks and two objective measures of the efferent auditory system were examined in thirty normal-hearing, young adults. Two of the speech-in-noise tasks measured an acceptable noise level, the maximum level of speech-babble noise that a subject is willing to accept while listening to a story. For these, the acceptable noise level was evaluated using both an ipsilateral (story and noise in same ear) and a contralateral (story and noise in opposite ears) paradigm. The third speech-in-noise task evaluated speech recognition using monosyllabic words presented in competing speech babble. Auditory efferent activity was assessed by examining the resulting suppression of click-evoked otoacoustic emissions following the introduction of a contralateral, broad-band stimulus and the activity of the ipsilateral and contralateral acoustic reflex arc was evaluated using tones and broad-band noise. Results will be discussed relative to current theories of speech in noise performance and auditory inhibitory processes.
Effect of interaction type on the characteristics of pet-directed speech in female dog owners.
Jeannin, Sarah; Gilbert, Caroline; Leboucher, Gérard
2017-05-01
Recent studies focusing on the interspecific communicative interactions between humans and dogs show that owners use a special speech register when addressing their dog. This register, called pet-directed speech (PDS), has prosodic and syntactic features similar to those of infant-directed speech (IDS). While IDS prosody is known to vary according to the context of the communication with babies, we still know little about the way owners adjust acoustic and verbal PDS features according to the type of interaction with their dog. The aim of the study was therefore to explore whether the characteristics of women's speech depend on the nature of interaction with their dog. We recorded 34 adult women interacting with their dog in four conditions: before a brief separation, after reuniting, during play and while giving commands. Our results show that before separation women used a low pitch, few modulations, high intensity variations and very few affective sentences. In contrast, the reunion interactions were characterized by a very high pitch, few imperatives and a high frequency of affectionate nicknames. During play, women used mainly questions and attention-getting devices. Finally, when commanding, women mainly used imperatives as well as attention-getting devices. Thus, like mothers using IDS, female owners adapt the verbal as well as the non-verbal characteristics of their PDS to the nature of the interaction with their dog, suggesting that the intended function of these vocal utterances is to provide dogs with information about their intentions and emotions.
Tjaden, Kris; Wilding, Greg
2011-01-01
This study examined the extent to which articulatory rate reduction and increased loudness, manipulations that have been shown to impact intelligibility in previously published studies, were associated with adjustments in utterance-level measures of fundamental frequency (F0) variability for speakers with dysarthria and healthy controls. More generally, the current study sought to compare and contrast how a slower-than-normal rate and increased vocal loudness affect a variety of utterance-level F0 characteristics for speakers with dysarthria and healthy controls. Eleven speakers with Parkinson's disease, 15 speakers with multiple sclerosis, and 14 healthy control speakers were audio recorded while reading a passage in habitual, loud, and slow conditions. Magnitude production was used to elicit variations in rate and loudness. Acoustic measures of duration, intensity, and F0 were obtained. For all speaker groups, a slower-than-normal articulatory rate and increased vocal loudness had distinct effects on F0 relative to the habitual condition, including a tendency for measures of F0 variation to be greater in the loud condition and reduced in the slow condition. These results suggest implications for the treatment of dysarthria. Copyright © 2010 S. Karger AG, Basel.
How much is a word? Predicting ease of articulation planning from apraxic speech error patterns.
Ziegler, Wolfram; Aichert, Ingrid
2015-08-01
According to intuitive concepts, 'ease of articulation' is influenced by factors like word length or the presence of consonant clusters in an utterance. Imaging studies of speech motor control use these factors to systematically tax the speech motor system. Evidence from apraxia of speech, a disorder thought to result from speech motor planning impairment after lesions to speech motor centers in the left hemisphere, supports the relevance of these and other factors in disordered speech planning and the genesis of apraxic speech errors. Yet, there is no unified account of the structural properties rendering a word easy or difficult to pronounce. The aim of this study was to model the motor planning demands of word articulation with a nonlinear regression model trained to predict the likelihood of accurate word production in apraxia of speech. We used a tree-structure model in which vocal tract gestures are embedded in hierarchically nested prosodic domains to derive a recursive set of terms for the computation of the likelihood of accurate word production. The model was trained with accuracy data from a set of 136 words averaged over 66 samples from apraxic speakers. In a second step, the model coefficients were used to predict a test dataset of accuracy values for 96 new words, averaged over 120 samples produced by a different group of apraxic speakers. Accurate modeling of the first dataset was achieved in the training study (adjusted R² = .71). In the cross-validation, the test dataset was also predicted with high accuracy (adjusted R² = .67). The model shape, as reflected by the coefficient estimates, was consistent with current phonetic theories and with clinical evidence. In accordance with phonetic and psycholinguistic work, a strong influence of word stress on articulation errors was found. The proposed model provides a unified and transparent account of the motor planning requirements of word articulation. Copyright © 2015 Elsevier Ltd. All rights reserved.
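The authors' recursive gesture/prosodic-domain model is not reproduced here. As a loose illustration of the general approach only (fitting a nonlinear regression that predicts averaged word-accuracy proportions from structural word properties), one might fit a logistic-shaped model to hypothetical predictors such as syllable count and cluster count; predictors and data below are placeholders, not the study's.

```python
# Illustrative nonlinear regression of word production accuracy on structural
# word properties (NOT the authors' nested prosodic-domain model).
import numpy as np
from scipy.optimize import curve_fit

def accuracy_model(X, b0, b_syll, b_clust):
    """Logistic-shaped accuracy as a function of syllable and cluster counts."""
    n_syll, n_clust = X
    return 1.0 / (1.0 + np.exp(-(b0 + b_syll * n_syll + b_clust * n_clust)))

# Hypothetical training data: per-word syllable count, consonant-cluster count,
# and observed proportion of accurate productions averaged over speakers.
n_syll = np.array([1, 1, 2, 2, 3, 3, 4, 4])
n_clust = np.array([0, 1, 0, 2, 1, 3, 2, 4])
accuracy = np.array([0.95, 0.88, 0.90, 0.70, 0.75, 0.52, 0.60, 0.35])

params, _ = curve_fit(accuracy_model, (n_syll, n_clust), accuracy, p0=[2.0, -0.3, -0.3])
predicted = accuracy_model((n_syll, n_clust), *params)
ss_res = np.sum((accuracy - predicted) ** 2)
ss_tot = np.sum((accuracy - accuracy.mean()) ** 2)
print("R^2 on training data:", 1 - ss_res / ss_tot)
```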
An online support site for preparation of oral presentations in science and engineering
NASA Astrophysics Data System (ADS)
Kunioshi, Nílson; Noguchi, Judy; Hayashi, Hiroko; Tojo, Kazuko
2012-12-01
Oral communication skills are essential for engineers today and, as they are included in accreditation criteria of educational programmes, their teaching and evaluation deserve attention. However, concrete aspects as to what should be taught and evaluated in relation to oral communication skills have not been sufficiently established. In this paper, a method to aid the efficient teaching of oral presentation skills is proposed, from the presentation structure level to word and sentence level choices, through the use of JECPRESE, The Japanese-English Corpus of Presentations in Science and Engineering. As of June 2012, the corpus is composed of transcriptions of 74 presentations delivered in Japanese by students graduating from the Master's programme of various engineering departments and 31 presentations delivered in English, 16 by experienced researchers at an international conference on chemistry, and 15 by undergraduate engineering students of a mid-sized American university. The utterances were classified according to the specific moves (sections of the speech that express specific speaker intent) appearing in the presentations and frequently used words/expressions to express these moves were identified.
ERIC Educational Resources Information Center
Hilger, Allison I.; Loucks, Torrey M. J.; Quinto-Pozos, David; Dye, Matthew W. G.
2015-01-01
A study was conducted to examine production variability in American Sign Language (ASL) in order to gain insight into the development of motor control in a language produced in another modality. Production variability was characterized through the spatiotemporal index (STI), which represents production stability in whole utterances and is a…
Speech entrainment enables patients with Broca’s aphasia to produce fluent speech
Hubbard, H. Isabel; Hudspeth, Sarah Grace; Holland, Audrey L.; Bonilha, Leonardo; Fromm, Davida; Rorden, Chris
2012-01-01
A distinguishing feature of Broca’s aphasia is non-fluent halting speech typically involving one to three words per utterance. Yet, despite such profound impairments, some patients can mimic audio-visual speech stimuli enabling them to produce fluent speech in real time. We call this effect ‘speech entrainment’ and reveal its neural mechanism as well as explore its usefulness as a treatment for speech production in Broca’s aphasia. In Experiment 1, 13 patients with Broca’s aphasia were tested in three conditions: (i) speech entrainment with audio-visual feedback where they attempted to mimic a speaker whose mouth was seen on an iPod screen; (ii) speech entrainment with audio-only feedback where patients mimicked heard speech; and (iii) spontaneous speech where patients spoke freely about assigned topics. The patients produced a greater variety of words using audio-visual feedback compared with audio-only feedback and spontaneous speech. No difference was found between audio-only feedback and spontaneous speech. In Experiment 2, 10 of the 13 patients included in Experiment 1 and 20 control subjects underwent functional magnetic resonance imaging to determine the neural mechanism that supports speech entrainment. Group results with patients and controls revealed greater bilateral cortical activation for speech produced during speech entrainment compared with spontaneous speech at the junction of the anterior insula and Brodmann area 47, in Brodmann area 37, and unilaterally in the left middle temporal gyrus and the dorsal portion of Broca’s area. Probabilistic white matter tracts constructed for these regions in the normal subjects revealed a structural network connected via the corpus callosum and ventral fibres through the extreme capsule. Unilateral areas were connected via the arcuate fasciculus. In Experiment 3, all patients included in Experiment 1 participated in a 6-week treatment phase using speech entrainment to improve speech production. Behavioural and functional magnetic resonance imaging data were collected before and after the treatment phase. Patients were able to produce a greater variety of words with and without speech entrainment at 1 and 6 weeks after training. Treatment-related decrease in cortical activation associated with speech entrainment was found in areas of the left posterior-inferior parietal lobe. We conclude that speech entrainment allows patients with Broca’s aphasia to double their speech output compared with spontaneous speech. Neuroimaging results suggest that speech entrainment allows patients to produce fluent speech by providing an external gating mechanism that yokes a ventral language network that encodes conceptual aspects of speech. Preliminary results suggest that training with speech entrainment improves speech production in Broca’s aphasia providing a potential therapeutic method for a disorder that has been shown to be particularly resistant to treatment. PMID:23250889
Kakouros, Sofoklis; Räsänen, Okko
2016-09-01
Numerous studies have examined the acoustic correlates of sentential stress and its underlying linguistic functionality. However, the mechanism that connects stress cues to the listener's attentional processing has remained unclear. Also, the learnability versus innateness of stress perception has not been widely discussed. In this work, we introduce a novel perspective to the study of sentential stress and put forward the hypothesis that perceived sentence stress in speech is related to the unpredictability of prosodic features, thereby capturing the attention of the listener. As predictability is based on the statistical structure of the speech input, the hypothesis also suggests that stress perception is a result of general statistical learning mechanisms. To study this idea, computational simulations are performed where temporal prosodic trajectories are modeled with an n-gram model. Probabilities of the feature trajectories are subsequently evaluated on a set of novel utterances and compared to human perception of stress. The results show that the low-probability regions of F0 and energy trajectories are strongly correlated with stress perception, giving support to the idea that attention and unpredictability of sensory stimulus are mutually connected. Copyright © 2015 Cognitive Science Society, Inc.
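The abstract does not specify the features, model order, or quantization used; as a hedged illustration of the general mechanism it describes (low n-gram probability of a prosodic trajectory flagging likely stress), one could model a quantized F0 track with a smoothed bigram and mark the most surprising frames. The binning scheme, model order, and threshold below are assumptions.

```python
# Illustrative bigram model over a quantized F0 trajectory: frames whose
# transition probability is low are flagged as candidate stress locations.
import numpy as np

def quantize(f0_track, n_bins=8):
    """Map an F0 trajectory (Hz, unvoiced gaps already interpolated) to bins."""
    edges = np.quantile(f0_track, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.digitize(f0_track, edges)

def train_bigram(quantized_tracks, n_bins=8, alpha=1.0):
    """Add-alpha smoothed bigram transition probabilities over quantized F0."""
    counts = np.full((n_bins, n_bins), alpha)
    for track in quantized_tracks:
        for prev, cur in zip(track[:-1], track[1:]):
            counts[prev, cur] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def surprisal(bigram, track):
    """Per-frame negative log probability; high values mean unpredictable prosody."""
    return np.array([-np.log(bigram[p, c]) for p, c in zip(track[:-1], track[1:])])

# Usage sketch: flag frames in the top 10% of surprisal as predicted stress.
# f0_train_tracks and f0_test would come from a pitch tracker (not shown).
# bigram = train_bigram([quantize(t) for t in f0_train_tracks])
# s = surprisal(bigram, quantize(f0_test))
# predicted_stress = s > np.quantile(s, 0.9)
```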
Age-related changes in the anticipatory coarticulation in the speech of young children
NASA Astrophysics Data System (ADS)
Parson, Mathew; Lloyd, Amanda; Stoddard, Kelly; Nissen, Shawn L.
2003-10-01
This paper investigates the possible patterns of anticipatory coarticulation in the speech of young children. Speech samples were elicited from three groups of children between 3 and 6 years of age and one comparison group of adults. The utterances were recorded online in a quiet room environment using high-quality microphones and direct analog-to-digital conversion to computer disk. Formant frequency measures (F1, F2, and F3) were extracted from a centralized and unstressed vowel (schwa) spoken prior to two different sets of productions. The first set of productions consisted of the target vowel followed by a series of real words containing an initial CV(C) syllable (voiceless obstruent-monophthongal vowel) in a range of phonetic contexts, while the second set consisted of a series of nonword productions with a relatively constrained phonetic context. An analysis of variance was utilized to determine if the formant frequencies varied systematically as a function of age, gender, and phonetic context. Results will also be discussed in association with spectral moment measures extracted from the obstruent segment immediately following the target vowel. [Work supported by research funding from Brigham Young University.]
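The abstract does not say which analysis tool was used to measure the formants; as a hedged sketch of one common method (LPC analysis of a vowel frame followed by root-solving), under the assumptions of a single pre-emphasized, windowed frame and a fixed LPC order:

```python
# Illustrative LPC-based formant estimation for a single vowel frame (one common
# method; the study's actual analysis software is not specified in the abstract).
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc_coefficients(frame, order):
    """Autocorrelation-method LPC via the Toeplitz normal equations."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:len(frame) + order]
    a = solve_toeplitz((r[:-1], r[:-1]), r[1:])   # predictor coefficients
    return np.concatenate(([1.0], -a))            # A(z) = 1 - sum(a_k z^-k)

def estimate_formants(frame, sr, order=12, n_formants=3):
    """Return the lowest formant frequencies (Hz) from the LPC polynomial roots."""
    pre_emph = np.append(frame[0], frame[1:] - 0.97 * frame[:-1])
    windowed = pre_emph * np.hamming(len(pre_emph))
    roots = np.roots(lpc_coefficients(windowed, order))
    roots = roots[np.imag(roots) > 0.01]          # keep upper-half-plane roots only
    freqs = np.sort(np.angle(roots) * sr / (2 * np.pi))
    freqs = freqs[freqs > 90]                     # drop near-DC roots
    return freqs[:n_formants]

# Usage sketch: a 25 ms schwa frame sampled at 16 kHz would be expected to yield
# roughly F1 ~ 500 Hz, F2 ~ 1500 Hz, F3 ~ 2500 Hz for an adult male speaker.
```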
Laughter as an approach to vocal evolution: The bipedal theory.
Provine, Robert R
2017-02-01
Laughter is a simple, stereotyped, innate, human play vocalization that is ideal for the study of vocal evolution. The basic approach of describing the act of laughter and when we do it has revealed a variety of phenomena of social, linguistic, and neurological significance. Findings include the acoustic structure of laughter, the minimal voluntary control of laughter, the punctuation effect (which describes the placement of laughter in conversation and indicates the dominance of speech over laughter), and the role of laughter in human matching and mating. Especially notable is the use of laughter to discover why humans can speak and other apes cannot. Quadrupeds, including our primate ancestors, have a 1:1 relation between breathing and stride because their thorax must absorb forelimb impacts during running. The direct link between breathing and locomotion limits vocalizations to short, simple utterances, such as the characteristic panting chimpanzee laugh (one sound per inward or outward breath). The evolution of bipedal locomotion freed the respiration system of its support function during running, permitting greater breath control and the selection for human-type laughter (a parsed exhalation), and subsequently the virtuosic, sustained, expiratory vocalization of speech. This is the basis of the bipedal theory of speech evolution.