Science 101: How Does Speech-Recognition Software Work?
ERIC Educational Resources Information Center
Robertson, Bill
2016-01-01
This column provides background science information for elementary teachers. Many innovations with computer software begin with analysis of how humans do a task. This article takes a look at how humans recognize spoken words and explains the origins of speech-recognition software.
ERIC Educational Resources Information Center
Wood, Sarah G.; Moxley, Jerad H.; Tighe, Elizabeth L.; Wagner, Richard K.
2018-01-01
Text-to-speech and related read-aloud tools are being widely implemented in an attempt to assist students' reading comprehension skills. Read-aloud software, including text-to-speech, is used to translate written text into spoken text, enabling one to listen to written text while reading along. It is not clear how effective text-to-speech is at…
Long term rehabilitation of a total glossectomy patient.
Bachher, Gurmit Kaur; Dholam, Kanchan P
2010-09-01
Malignant tumours of the oral cavity that require resection of the tongue result in severe deficiencies in speech and deglutition. Speech misarticulation leads to loss of speech intelligibility, which can prevent or limit communication. Prosthodontic rehabilitation involves fabrication of a Palatal Augmentation Prosthesis (PAP) following partial glossectomy and a mandibular tongue prosthesis after total glossectomy [1]. Speech analysis of a total glossectomy patient rehabilitated with a tongue prosthesis was done with the help of Dr. Speech Software Version 4 (Tiger DRS, Inc., Seattle) twelve years after treatment. Speech therapy sessions along with the prosthesis helped him to correct the dental sounds by using the lower lip and upper dentures (labio-dentals). Speech intelligibility, intonation pattern, speech articulation and overall loudness were all noticeably improved.
Speech recognition technology: an outlook for human-to-machine interaction.
Erdel, T; Crooks, S
2000-01-01
Speech recognition, as an enabling technology in healthcare-systems computing, is a topic that has been discussed for quite some time, but is just now coming to fruition. Traditionally, speech-recognition software has been constrained by hardware, but improved processors and increased memory capacities are starting to remove some of these limitations. With these barriers removed, companies that create software for the healthcare setting have the opportunity to write more successful applications. Among the criticisms of speech-recognition applications are the high rates of error and steep training curves. However, even in the face of such negative perceptions, there remain significant opportunities for speech recognition to allow healthcare providers and, more specifically, physicians, to work more efficiently and ultimately spend more time with their patients and less time completing necessary documentation. This article will identify opportunities for inclusion of speech-recognition technology in the healthcare setting and examine major categories of speech-recognition software--continuous speech recognition, command and control, and text-to-speech. We will discuss the advantages and disadvantages of each area, the limitations of the software today, and how future trends might affect them.
ERIC Educational Resources Information Center
Olsen, Daniel J.
2014-01-01
While speech analysis technology has become an integral part of phonetic research, and to some degree is used in language instruction at the most advanced levels, it appears to be mostly absent from the beginning levels of language instruction. In part, the lack of incorporation into the language classroom can be attributed to both the lack of…
Tóth, László; Hoffmann, Ildikó; Gosztolya, Gábor; Vincze, Veronika; Szatlóczki, Gréta; Bánréti, Zoltán; Pákáski, Magdolna; Kálmán, János
2018-01-01
Background: Even today the reliable diagnosis of the prodromal stages of Alzheimer’s disease (AD) remains a great challenge. Our research focuses on the earliest detectable indicators of cognitive decline in mild cognitive impairment (MCI). Since the presence of language impairment has been reported even in the mild stage of AD, the aim of this study is to develop a sensitive neuropsychological screening method based on the analysis of spontaneous speech production while performing a memory task. In the future, this can form the basis of an Internet-based interactive screening software for the recognition of MCI. Methods: Participants were 38 healthy controls and 48 clinically diagnosed MCI patients. Spontaneous speech was provoked by asking the patients to recall the content of 2 short black and white films (one direct, one delayed), and by answering one question. Acoustic parameters (hesitation ratio, speech tempo, length and number of silent and filled pauses, length of utterance) were extracted from the recorded speech signals, first manually (using the Praat software), and then automatically, with an automatic speech recognition (ASR) based tool. First, the extracted parameters were statistically analyzed. Then we applied machine learning algorithms to see whether the MCI and the control group could be discriminated automatically based on the acoustic features. Results: The statistical analysis showed significant differences for most of the acoustic parameters (speech tempo, articulation rate, silent pause, hesitation ratio, length of utterance, pause-per-utterance ratio). The most significant differences between the two groups were found in the speech tempo in the delayed recall task, and in the number of pauses for the question-answering task. The fully automated version of the analysis process - that is, using the ASR-based features in combination with machine learning - was able to separate the two classes with an F1-score of 78.8%. Conclusion: The temporal analysis of spontaneous speech can be exploited in implementing a new, automatic detection-based tool for screening MCI for the community. PMID:29165085
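To make the pipeline concrete, here is a minimal Python sketch of the general recipe the abstract describes: derive simple temporal features (a hesitation ratio and silent-pause statistics) from an energy envelope, then train a classifier on them. The silence threshold, file names, and choice of classifier are illustrative assumptions, not the authors' actual implementation.

```python
import numpy as np
from scipy.io import wavfile
from sklearn.ensemble import RandomForestClassifier

def temporal_features(path, frame_ms=25, silence_db=-35.0):
    """Hesitation ratio, pause count, and mean pause length from one WAV file."""
    rate, sig = wavfile.read(path)
    sig = sig.astype(np.float64)
    if sig.ndim > 1:                      # mix stereo to mono
        sig = sig.mean(axis=1)
    sig /= (np.abs(sig).max() + 1e-12)
    frame = int(rate * frame_ms / 1000)
    n = len(sig) // frame
    frames = sig[:n * frame].reshape(n, frame)
    rms_db = 20 * np.log10(np.sqrt((frames ** 2).mean(axis=1)) + 1e-12)
    silent = rms_db < silence_db          # crude frame-level silence detector
    hesitation_ratio = float(silent.mean())
    starts = np.flatnonzero(np.diff(silent.astype(int)) == 1)
    n_pauses = len(starts) + (1 if silent[0] else 0)
    mean_pause_ms = silent.sum() * frame_ms / max(n_pauses, 1)
    return [hesitation_ratio, n_pauses, mean_pause_ms]

# Intended wiring (file lists and labels are hypothetical):
# X = np.array([temporal_features(p) for p in wav_paths])
# y = np.array(labels)                    # 1 = MCI, 0 = control
# clf = RandomForestClassifier().fit(X, y)
```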
Choosing and Using Text-to-Speech Software
ERIC Educational Resources Information Center
Peters, Tom; Bell, Lori
2007-01-01
This article describes a computer-based technology for generating speech called text-to-speech (TTS). This software is ready for widespread use by libraries, other organizations, and individual users. It offers the affordable ability to turn just about any electronic text that is not image-based into an artificially spoken communication. The…
Chang, Yen-Liang; Hung, Chao-Ho; Chen, Po-Yueh; Chen, Wei-Chang; Hung, Shih-Han
2015-10-01
Acoustic analysis is often used in speech evaluation but seldom for the evaluation of oral prostheses designed for reconstruction of surgical defects. This study aimed to introduce the application of acoustic analysis for patients with velopharyngeal insufficiency (VPI) due to oral surgery and rehabilitated with oral speech-aid prostheses. The pre- and postprosthetic rehabilitation acoustic features of sustained vowel sounds from two patients with VPI were analyzed and compared using the acoustic analysis software Praat. There were significant differences in the octave spectrum of sustained vowel speech sound between pre- and postprosthetic rehabilitation. Acoustic measurements of sustained vowels for patients before and after prosthetic treatment showed no significant differences for any of the parameters of fundamental frequency, jitter, shimmer, noise-to-harmonics ratio, formant frequency, F1 bandwidth, and band energy difference. The decrease in objective nasality perceptions correlated very well with the decrease in dips of the spectra for the male patient with a higher speech bulb height. Acoustic analysis may be a potential technique for evaluating the functions of oral speech-aid prostheses, which eliminate dysfunctions due to the surgical defect and contribute to a high percentage of intelligible speech. Octave spectrum analysis may also be a valuable tool for detecting changes in nasality characteristics of the voice during prosthetic treatment of VPI. Copyright © 2014. Published by Elsevier B.V.
Higher order statistical analysis of /x/ in male speech.
Orr, M C; Lithgow, B
2005-03-01
This paper presents a study of kurtosis analysis for the sound /x/ in male speech; /x/ is the sound of the 'o' at the end of words such as 'ago'. The sound analysed for this paper came from the Australian National Database of Spoken Language, specifically male speaker 17. The /x/ was isolated and extracted from the database by the author in a quiet booth using standard multimedia software. A 5 millisecond window was used for the analysis, as it was shown previously by the author to be the most appropriate size for speech phoneme analysis. The significance of the research presented here is shown in the results, where a majority of coefficients had a platykurtic (kurtosis between 0 and 3) value as opposed to the previously held leptokurtic (kurtosis > 3) belief.
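A windowed kurtosis analysis of this kind is short to express in code. The following Python sketch slices a recording into 5 ms windows and counts how many are platykurtic under the paper's convention (kurtosis between 0 and 3, with a normal distribution scoring 3); the file name is a hypothetical placeholder.

```python
import numpy as np
from scipy.io import wavfile
from scipy.stats import kurtosis

rate, x = wavfile.read("phoneme_x.wav")      # hypothetical isolated /x/ token
x = x.astype(np.float64)
win = int(rate * 0.005)                      # 5 ms analysis window
windows = [x[i:i + win] for i in range(0, len(x) - win + 1, win)]
# fisher=False gives "raw" kurtosis, for which a normal distribution scores 3
k = [kurtosis(w, fisher=False) for w in windows]
platykurtic = sum(kv < 3 for kv in k)
print(f"{platykurtic}/{len(k)} windows platykurtic (kurtosis < 3)")
```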
ERIC Educational Resources Information Center
Chen, Howard Hao-Jan
2011-01-01
Oral communication ability has become increasingly important to many EFL students. Several commercial software programs based on automatic speech recognition (ASR) technologies are available but their prices are not affordable for many students. This paper will demonstrate how the Microsoft Speech Application Software Development Kit (SASDK), a…
ERIC Educational Resources Information Center
Cordier, Deborah
2009-01-01
A renewed focus on foreign language (FL) learning and speech for communication has resulted in computer-assisted language learning (CALL) software developed with Automatic Speech Recognition (ASR). ASR features for FL pronunciation (Lafford, 2004) are functional components of CALL designs used for FL teaching and learning. The ASR features…
Analyzing clinical phonological data using Phon
McAllister Byun, Tara
2016-01-01
In this paper, we describe how Phon, a software program for the transcription and analysis of phonological data, can be applied to facilitate clinical phonological analyses. We begin with a summary of the types of analyses that are frequently used in the assessment and management of speech sound disorders. We then discuss challenges inherent to the transcription and analysis of clinical phonological data. For each challenge, we discuss solutions currently available within Phon, and offer an outlook on future methodological and technical developments in the area of clinical phonology. This paper includes a step-by-step introduction to Phon suitable for readers who lack previous experience with the software. We conclude with a discussion of data sharing and its vital role in advancing research and intervention practices in the area of speech development and disorders. PMID:27111269
NASA Astrophysics Data System (ADS)
Kattoju, Ravi Kiran; Barber, Daniel J.; Abich, Julian; Harris, Jonathan
2016-05-01
With increasing necessity for intuitive Soldier-robot communication in military operations and advancements in interactive technologies, autonomous robots have transitioned from assistance tools to functional and operational teammates able to service an array of military operations. Despite improvements in gesture and speech recognition technologies, their effectiveness in supporting Soldier-robot communication is still uncertain. The purpose of the present study was to evaluate the performance of gesture and speech interface technologies to facilitate Soldier-robot communication during a spatial-navigation task with an autonomous robot. Semantically based gesture and speech spatial-navigation commands leveraged existing lexicons for visual and verbal communication from the U.S. Army field manual for visual signaling and a previously established Squad Level Vocabulary (SLV). Speech commands were recorded by a lapel microphone and Microsoft Kinect, and classified by commercial off-the-shelf automatic speech recognition (ASR) software. Visual signals were captured and classified using a custom wireless gesture glove and software. Participants in the experiment commanded a robot to complete a simulated intelligence, surveillance, and reconnaissance (ISR) mission in a scaled-down urban scenario by delivering a sequence of gesture and speech commands, both individually and simultaneously, to the robot. Performance and reliability of gesture and speech hardware interfaces and recognition tools were analyzed and reported. Analysis of experimental results demonstrated that the employed gesture technology has significant potential for enabling bidirectional Soldier-robot team dialogue, based on the high classification accuracy and the minimal training required to perform gesture commands.
Simmons, Elizabeth Schoen; Paul, Rhea; Shic, Frederick
2016-01-01
This study examined the acceptability of a mobile application, SpeechPrompts, designed to treat prosodic disorders in children with ASD and other communication impairments. Ten speech-language pathologists (SLPs) in public schools and 40 of their students, aged 5-19 years, with prosody deficits participated. Students received treatment with the software over eight weeks. Pre- and post-treatment speech samples and student engagement data were collected. Feedback on the utility of the software was also obtained. SLPs implemented the software with their students in an authentic education setting. Student engagement ratings indicated that students' attention to the software was maintained during treatment. Although more testing is warranted, post-treatment prosody ratings suggest that SpeechPrompts has potential to be a useful tool in the treatment of prosodic disorders.
Objective measurement of motor speech characteristics in the healthy pediatric population.
Wong, A W; Allegro, J; Tirado, Y; Chadha, N; Campisi, P
2011-12-01
To obtain objective measurements of motor speech characteristics in normal children, using a computer-based motor speech software program. Cross-sectional, observational design in a university-based ambulatory pediatric otolaryngology clinic. Participants included 112 subjects (54 females and 58 males) aged 4-18 years. Participants with previously diagnosed hearing loss, voice and motor disorders, and children unable to repeat a passage in English were excluded. Voice samples were recorded and analysed using the Motor Speech Profile (MSP) software (KayPENTAX, Lincoln Park, NJ). The MSP produced measures of diadochokinetics, second formant transition, intonation, and syllabic rates. Demographic data, including sex, age, and cigarette smoke exposure, were obtained. Normative data for several motor speech characteristics were derived for children ranging from age 4 to 18 years. A number of age-dependent changes were identified, including an increase in average diadochokinetic rate (p<0.001) and standard syllabic duration (p<0.001) with age. There were no identified differences in motor speech characteristics between males and females across the measured age range. Variations in fundamental frequency (Fo) during speech did not change significantly with age for either males or females. To our knowledge, this is the first pediatric normative database for the MSP program. The MSP is suitable for testing children and can be used to study developmental changes in motor speech. The analysis demonstrated that males and females behave similarly and show the same relationship with age for the motor speech characteristics studied. This normative database will provide essential comparative data for future studies exploring alterations in motor speech that may occur with hearing, voice, and motor disorders, and for assessing the results of targeted therapies. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Kardava, Irakli; Tadyszak, Krzysztof; Gulua, Nana; Jurga, Stefan
2017-02-01
For more flexible environmental perception by artificial intelligence, supporting software modules are needed that can automate the creation of specific language syntax and perform further analysis for relevant decisions based on semantic functions. According to our proposed approach, pairs of formal rules can be created from given sentences (in the case of natural languages) or statements (in the case of special languages) with the help of computer vision, speech recognition, or an editable-text conversion system, for further automatic improvement. In other words, we have developed an approach that significantly improves the automation of the training process of artificial intelligence, which as a result yields a higher level of self-developing skills independent of us (the users). Based on this approach we have developed a software demo version, which includes the algorithm and software code implementing all of the above-mentioned components (computer vision, speech recognition, and an editable-text conversion system). The program is able to work in multi-stream mode and simultaneously create a syntax based on information received from several sources.
[Acoustic voice analysis using the Praat program: comparative study with the Dr. Speech program].
Núñez Batalla, Faustino; González Márquez, Rocío; Peláez González, M Belén; González Laborda, Irene; Fernández Fernández, María; Morato Galán, Marta
2014-01-01
The European Laryngological Society (ELS) basic protocol for functional assessment of voice pathology includes 5 different approaches: perception, videostroboscopy, acoustics, aerodynamics and subjective rating by the patient. In this study we focused on acoustic voice analysis. The purpose of the present study was to correlate the results obtained by the commercial software Dr. Speech and the free software Praat in 2 fields: 1. Narrow-band spectrogram (the presence of noise according to Yanagihara, and the presence of subharmonics) (semi-quantitative). 2. Voice acoustic parameters (jitter, shimmer, harmonics-to-noise ratio, fundamental frequency) (quantitative). We studied a total of 99 voice samples from individuals with Reinke's oedema diagnosed using videostroboscopy. One independent observer used Dr. Speech 3.0 and a second one used the Praat program (Phonetic Sciences, University of Amsterdam). The spectrographic analysis consisted of obtaining a narrow-band spectrogram from the previously digitised voice samples by the 2 independent observers. They then determined the presence of noise in the spectrogram, using the Yanagihara grades, as well as the presence of subharmonics. Finally, the acoustic parameters of jitter, shimmer, harmonics-to-noise ratio and fundamental frequency were obtained from the 2 acoustic analysis programs. The results indicated that the sound spectrogram and the numerical values obtained for shimmer and jitter were similar for both computer programs, even though types 1, 2 and 3 voice samples were analysed. The Praat and Dr. Speech programs provide similar results in the acoustic analysis of pathological voices. Copyright © 2013 Elsevier España, S.L. All rights reserved.
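For readers who want to reproduce this style of analysis, the same four quantitative parameters can be computed by driving Praat from Python. The sketch below uses the parselmouth library with commonly published Praat command names and default-style argument values; these settings are assumptions, not the study's exact configuration.

```python
import parselmouth
from parselmouth.praat import call

snd = parselmouth.Sound("voice.wav")                     # hypothetical sample
pitch = snd.to_pitch()
f0 = call(pitch, "Get mean", 0, 0, "Hertz")              # fundamental frequency
pp = call(snd, "To PointProcess (periodic, cc)", 75, 500)
jitter = call(pp, "Get jitter (local)", 0, 0, 0.0001, 0.02, 1.3)
shimmer = call([snd, pp], "Get shimmer (local)",
               0, 0, 0.0001, 0.02, 1.3, 1.6)
hnr = call(snd.to_harmonicity(), "Get mean", 0, 0)       # harmonics-to-noise
print(f0, jitter, shimmer, hnr)
```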
"Look What I Did!": Student Conferences with Text-to-Speech Software
ERIC Educational Resources Information Center
Young, Chase; Stover, Katie
2014-01-01
The authors describe a strategy that empowers students to edit and revise their own writing. Students input their writing into text-to-speech software that rereads the text aloud. While listening, students make necessary revisions and edits.
Conquering Language Babel in the Classroom
ERIC Educational Resources Information Center
Minichino, Mario; Berson, Michael J.
2012-01-01
This article is an exploration of the available applications for speech-to-speech real-time translation software for use in the classroom. Three different types of machine language translation (MLT) software and devices are reviewed for their features and practical application in secondary education classrooms.
Multimodal Speech Capture System for Speech Rehabilitation and Learning.
Sebkhi, Nordine; Desai, Dhyey; Islam, Mohammad; Lu, Jun; Wilson, Kimberly; Ghovanloo, Maysam
2017-11-01
Speech-language pathologists (SLPs) are trained to correct articulation of people diagnosed with motor speech disorders by analyzing articulators' motion and assessing speech outcome while patients speak. To assist SLPs in this task, we present the multimodal speech capture system (MSCS), which records and displays kinematics of key speech articulators, the tongue and lips, along with voice, using unobtrusive methods. Collected speech modalities, tongue motion, lips gestures, and voice are visualized not only in real time to provide patients with instant feedback but also offline to allow SLPs to perform post-analysis of articulators' motion, particularly the tongue, with its prominent but hardly visible role in articulation. We describe the MSCS hardware and software components, and demonstrate its basic visualization capabilities with a healthy individual repeating the words "Hello World." A proof-of-concept prototype has been successfully developed for this purpose, and will be used in future clinical studies to evaluate its potential impact on accelerating speech rehabilitation by enabling patients to speak naturally. Pattern-matching algorithms applied to the collected data can provide patients with quantitative and objective feedback on their speech performance, unlike current methods that are mostly subjective and may vary from one SLP to another.
NASA Astrophysics Data System (ADS)
Jelinek, H. J.
1986-01-01
This is the Final Report of Electronic Design Associates on its Phase I SBIR project. The purpose of this project is to develop a method for correcting helium speech, as experienced in diver-surface communication. The goal of the Phase I study was to design, prototype, and evaluate a real time helium speech corrector system based upon digital signal processing techniques. The general approach was to develop hardware (an IBM PC board) to digitize helium speech and software (a LAMBDA computer based simulation) to translate the speech. As planned in the study proposal, this initial prototype may now be used to assess expected performance from a self contained real time system which uses an identical algorithm. The Final Report details the work carried out to produce the prototype system. Four major project tasks were carried out: (1) a signal processing scheme for converting helium speech to normal sounding speech was generated; (2) the scheme was simulated on a general purpose (LAMBDA) computer, with actual helium speech supplied to the simulation and converted speech generated; (3) an IBM-PC based 14-bit data input/output board was designed and built; and (4) a bibliography of references on speech processing was generated.
Comparison of voice-automated transcription and human transcription in generating pathology reports.
Al-Aynati, Maamoun M; Chorneyko, Katherine A
2003-06-01
Software that can convert spoken words into written text has been available since the early 1980s. Early continuous speech systems were developed in 1994, with the latest commercially available editions having a claimed accuracy of up to 98% of speech recognition at natural speech rates. To evaluate the efficacy of one commercially available voice-recognition software system with pathology vocabulary in generating pathology reports and to compare this with human transcription. To draw cost analysis conclusions regarding human versus computer-based transcription. Two hundred six routine pathology reports from the surgical pathology material handled at St Joseph's Healthcare, Hamilton, Ontario, were generated simultaneously using computer-based transcription and human transcription. The following hardware and software were used: a desktop 450-MHz Intel Pentium III processor with 192 MB of RAM, a speech-quality sound card (Sound Blaster), a noise-canceling headset microphone, and IBM ViaVoice Pro version 8 with pathology vocabulary support (Voice Automated, Huntington Beach, Calif). The cost of the hardware and software used was approximately Can$2250. A total of 23,458 words were transcribed using both methods, with a mean of 114 words per report. The mean accuracy rate was 93.6% (range, 87.4%-96%) using the computer software, compared to a mean accuracy of 99.6% (range, 99.4%-99.8%) for human transcription (P <.001). Time needed to edit documents by the primary evaluator (M.A.) using the computer was on average twice that needed for editing the documents produced by human transcriptionists (range, 1.4-3.5 times). The extra time needed to edit documents was 67 minutes per week (13 minutes per day). Computer-based continuous speech-recognition systems in pathology can be successfully used in pathology practice, even during the handling of gross pathology specimens. The relatively low accuracy rate of this voice-recognition software, with a resultant increased editing burden on pathologists, may not encourage its application on a wide scale in pathology departments with sufficient human transcription services, despite significant potential financial savings. However, computer-based transcription represents an attractive and relatively inexpensive alternative to human transcription in departments where there is a shortage of transcription services, and will no doubt become more commonly used in pathology departments in the future.
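Accuracy figures like those reported here are typically computed by aligning each draft transcript against a corrected reference. A minimal Python sketch of such word-level scoring via Levenshtein alignment follows; it is one standard way to produce these percentages, not the study's actual scoring tool.

```python
def word_accuracy(reference: str, hypothesis: str) -> float:
    """Percent word accuracy: 100 * (1 - edit_distance / reference length)."""
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    errors = d[len(ref)][len(hyp)]
    return 100.0 * (1 - errors / max(len(ref), 1))

# Hypothetical example of a dictation error pattern:
print(word_accuracy("the specimen is received in formalin",
                    "the specimen received in formal in"))
```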
A software tool for analyzing multichannel cochlear implant signals.
Lai, Wai Kong; Bögli, Hans; Dillier, Norbert
2003-10-01
A useful and convenient means to analyze the radio frequency (RF) signals being sent by a speech processor to a cochlear implant would be to actually capture and display them with appropriate software. This is particularly useful for development or diagnostic purposes. sCILab (Swiss Cochlear Implant Laboratory) is such a PC-based software tool intended for the Nucleus family of Multichannel Cochlear Implants. Its graphical user interface provides a convenient and intuitive means for visualizing and analyzing the signals encoding speech information. Both numerical and graphic displays are available for detailed examination of the captured CI signals, as well as an acoustic simulation of these CI signals. sCILab has been used in the design and verification of new speech coding strategies, and has also been applied as an analytical tool in studies of how different parameter settings of existing speech coding strategies affect speech perception. As a diagnostic tool, it is also useful for troubleshooting problems with the external equipment of the cochlear implant systems.
Integrating hidden Markov model and PRAAT: a toolbox for robust automatic speech transcription
NASA Astrophysics Data System (ADS)
Kabir, A.; Barker, J.; Giurgiu, M.
2010-09-01
An automatic time-aligned phone transcription toolbox for English speech corpora has been developed. The toolbox is particularly useful for generating robust automatic transcriptions and can produce phone-level transcriptions using speaker-independent as well as speaker-dependent models without manual intervention. The system is based on the standard Hidden Markov Model (HMM) approach and was successfully tested on a large audiovisual speech corpus, the GRID corpus. One of the most powerful features of the toolbox is its increased flexibility in speech processing: the speech community can import the automatic transcription generated by the HMM Toolkit (HTK) into the popular transcription software PRAAT, and vice versa. The toolbox has been evaluated through statistical analysis on GRID data, which shows that the automatic transcription deviates by an average of 20 ms with respect to manual transcription.
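The HTK-to-PRAAT direction of such a toolbox largely amounts to a format conversion. Below is a hedged Python sketch that turns an HTK label file (start and end times in 100-nanosecond units, one phone per line) into a single-tier Praat TextGrid; the file names and tier name are assumptions.

```python
def htk_lab_to_textgrid(lab_path, textgrid_path, tier="phones"):
    """Convert an HTK .lab file (start end label per line) to a TextGrid."""
    segs = []
    with open(lab_path) as f:
        for line in f:
            start, end, label = line.split()[:3]
            segs.append((int(start) / 1e7, int(end) / 1e7, label))  # to seconds
    xmax = segs[-1][1]
    lines = ['File type = "ooTextFile"', 'Object class = "TextGrid"', '',
             'xmin = 0', f'xmax = {xmax}', 'tiers? <exists>', 'size = 1',
             'item []:', '    item [1]:', '        class = "IntervalTier"',
             f'        name = "{tier}"', '        xmin = 0',
             f'        xmax = {xmax}',
             f'        intervals: size = {len(segs)}']
    for i, (x0, x1, lab) in enumerate(segs, 1):
        lines += [f'        intervals [{i}]:', f'            xmin = {x0}',
                  f'            xmax = {x1}', f'            text = "{lab}"']
    with open(textgrid_path, "w") as f:
        f.write("\n".join(lines) + "\n")

# htk_lab_to_textgrid("utt1.lab", "utt1.TextGrid")   # hypothetical files
```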
Investigation of habitual pitch during free play activities for preschool-aged children.
Chen, Yang; Kimelman, Mikael D Z; Micco, Katie
2009-01-01
This study was designed to compare the habitual pitch measured in two different speech activities (free play and a traditionally used structured speech activity) for normally developing preschool-aged children, to explore to what extent preschoolers vary their vocal pitch among different speech environments. Habitual pitch measurements were conducted for 10 normally developing children (2 boys, 8 girls) between the ages of 31 months and 71 months during two different activities: (1) free play; and (2) structured speech. Speech samples were recorded using a throat microphone connected with a wireless transmitter in both activities. The habitual pitch (in Hz) was measured for all collected speech samples using voice analysis software (Real-Time Pitch). Significantly higher habitual pitch was found during free play in contrast to structured speech activities. In addition, no significant difference in habitual pitch was found across the variety of structured speech activities. Findings suggest that the vocal usage of preschoolers appears to be more effortful during free play than during structured activities. It is recommended that a comprehensive evaluation of young children's voices be based on speech/voice samples collected from both free play and structured activities.
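Habitual pitch is essentially a mean F0 over voiced frames. The following Python sketch estimates it with a naive autocorrelation pitch tracker restricted to a child-appropriate search range; dedicated tools such as Real-Time Pitch or Praat are far more robust, and the thresholds here are illustrative.

```python
import numpy as np
from scipy.io import wavfile

def mean_f0(path, fmin=150, fmax=600, frame_s=0.04):
    """Crude habitual-pitch estimate: mean F0 of voiced frames, in Hz."""
    rate, x = wavfile.read(path)
    x = x.astype(np.float64)
    n = int(rate * frame_s)
    peak = np.abs(x).max() + 1e-12
    f0s = []
    for i in range(0, len(x) - n, n):
        frame = x[i:i + n] - x[i:i + n].mean()
        if np.sqrt((frame ** 2).mean()) < 0.01 * peak:
            continue                              # skip silent frames
        ac = np.correlate(frame, frame, mode="full")[n - 1:]
        lo, hi = int(rate / fmax), int(rate / fmin)
        lag = lo + np.argmax(ac[lo:hi])
        if ac[lag] > 0.3 * ac[0]:                 # crude voicing check
            f0s.append(rate / lag)
    return float(np.mean(f0s)) if f0s else None

# print(mean_f0("free_play_sample.wav"))          # hypothetical recording
```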
Intelligent interfaces for expert systems
NASA Technical Reports Server (NTRS)
Villarreal, James A.; Wang, Lui
1988-01-01
Vital to the success of an expert system is an interface to the user which performs intelligently. A generic intelligent interface is being developed for expert systems. This intelligent interface was developed around the in-house developed Expert System for the Flight Analysis System (ESFAS). The Flight Analysis System (FAS) is comprised of 84 configuration-controlled FORTRAN subroutines that are used in the preflight analysis of the space shuttle. In order to use FAS proficiently, a person must be knowledgeable in flight mechanics and the procedures involved in deploying a certain payload, and must have an overall understanding of the FAS. ESFAS, still in its developmental stage, takes much of this knowledge into account. The generic intelligent interface involves the integration of a speech recognizer and synthesizer, a preparser, and a natural language parser with ESFAS. The speech recognizer being used is capable of recognizing 1000 words of connected speech. The natural language parser is a commercial software package which uses caseframe instantiation in processing the streams of words from the speech recognizer or the keyboard. The system's configuration is described along with its capabilities and drawbacks.
Automatic Speech Recognition: Reliability and Pedagogical Implications for Teaching Pronunciation
ERIC Educational Resources Information Center
Kim, In-Seok
2006-01-01
This study examines the reliability of automatic speech recognition (ASR) software used to teach English pronunciation, focusing on one particular piece of software, "FluSpeak," as a typical example. Thirty-six Korean English as a Foreign Language (EFL) college students participated in an experiment in which they listened to 15 sentences…
Development of a speech autocuer
NASA Astrophysics Data System (ADS)
Bedles, R. L.; Kizakvich, P. N.; Lawson, D. T.; McCartney, M. L.
1980-12-01
A wearable, visually based prosthesis for the deaf based upon the proven method for removing lipreading ambiguity known as cued speech was fabricated and tested. Both software and hardware developments are described, including a microcomputer, display, and speech preprocessor.
Niijima, H; Ito, N; Ogino, S; Takatori, T; Iwase, H; Kobayashi, M
2000-11-01
To put speech recognition technology to practical use for recording forensic autopsies, a language model specialized for forensic autopsy was developed for the speech recording system. A 3-gram language model for forensic autopsy was created, and an acoustic model for Japanese speech recognition based on Hidden Markov Models was used in addition, to customize the speech recognition engine for forensic autopsy. A forensic vocabulary set of over 10,000 words was compiled and some 300,000 sentence patterns were constructed to create the forensic language model, which was then mixed in appropriate proportion with a general language model to attain high accuracy. When tried by dictating autopsy findings, this speech recognition system achieved a recognition rate of about 95%, which seems to have reached practical usability as speech recognition software, though there remains room for improving its hardware and application-layer software.
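A count-based 3-gram language model of the kind described, mixed with a general model, can be sketched in a few lines of Python. The smoothing floor, mixing weight, and class structure below are illustrative assumptions; the production system's estimation details are not given in the abstract.

```python
from collections import defaultdict

class TrigramLM:
    """Minimal count-based 3-gram language model."""
    def __init__(self):
        self.tri = defaultdict(int)
        self.bi = defaultdict(int)

    def train(self, sentences):
        for words in sentences:
            w = ["<s>", "<s>"] + words + ["</s>"]
            for i in range(2, len(w)):
                self.bi[(w[i - 2], w[i - 1])] += 1
                self.tri[(w[i - 2], w[i - 1], w[i])] += 1

    def prob(self, w1, w2, w3, floor=1e-6):
        c = self.bi.get((w1, w2), 0)
        return self.tri.get((w1, w2, w3), 0) / c if c else floor

# Mixing a domain model with a general one, as the abstract describes
# (lam is an assumed interpolation weight):
# p = lam * forensic_lm.prob(w1, w2, w3) + (1 - lam) * general_lm.prob(w1, w2, w3)
```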
The Suitability of Cloud-Based Speech Recognition Engines for Language Learning
ERIC Educational Resources Information Center
Daniels, Paul; Iwago, Koji
2017-01-01
As online automatic speech recognition (ASR) engines become more accurate and more widely implemented with CALL software, it becomes important to evaluate the effectiveness and the accuracy of these recognition engines using authentic speech samples. This study investigates two of the most prominent cloud-based speech recognition engines--Apple's…
A Talking Computers System for Persons with Vision and Speech Handicaps. Final Report.
ERIC Educational Resources Information Center
Visek & Maggs, Urbana, IL.
This final report contains a detailed description of six software systems designed to assist individuals with blindness and/or speech disorders in using inexpensive, off-the-shelf computers rather than expensive custom-made devices. The developed software is not written in the native machine language of any particular brand of computer, but in the…
Integrating Text-to-Speech Software into Pedagogically Sound Teaching and Learning Scenarios
ERIC Educational Resources Information Center
Rughooputh, S. D. D. V.; Santally, M. I.
2009-01-01
This paper presents a new technique for the delivery of classes--an instructional technique which will no doubt revolutionize teaching and learning, whether for on-campus, blended or online modules. It is based on the simple task of instructionally incorporating text-to-speech software embedded in the lecture slides that will simulate exactly the…
ERIC Educational Resources Information Center
Franco, Horacio; Bratt, Harry; Rossier, Romain; Rao Gadde, Venkata; Shriberg, Elizabeth; Abrash, Victor; Precoda, Kristin
2010-01-01
SRI International's EduSpeak[R] system is a software development toolkit that enables developers of interactive language education software to use state-of-the-art speech recognition and pronunciation scoring technology. Automatic pronunciation scoring allows the computer to provide feedback on the overall quality of pronunciation and to point to…
Acoustic analysis of speech under stress.
Sondhi, Savita; Khan, Munna; Vijay, Ritu; Salhan, Ashok K; Chouhan, Satish
2015-01-01
When a person is emotionally charged, stress can be discerned in his voice. This paper presents a simplified, non-invasive approach to detecting psycho-physiological stress by monitoring acoustic modifications during a stressful conversation. The voice database consists of audio clips from eight different popular FM broadcasts wherein the host of the show vexes the subjects, who are otherwise unaware of the charade. The audio clips are obtained from real-life stressful conversations (no simulated emotions). Analysis is done using PRAAT software to evaluate mean fundamental frequency (F0) and formant frequencies (F1, F2, F3, F4) in both neutral and stressed states. Results suggest that F0 increases with stress, whereas formant frequencies decrease with stress. Comparison of Fourier and chirp spectra of a short vowel segment shows that for relaxed speech the two spectra are similar; for stressed speech, however, they differ in the high-frequency range due to increased pitch modulation.
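Formant frequencies of the sort analyzed here are commonly estimated with linear predictive coding (LPC). The Python sketch below shows that standard recipe (pre-emphasis, LPC fit, formants from the polynomial root angles), assuming librosa is available and 'vowel.wav' is a short vowel clip; it is not the study's exact procedure.

```python
import numpy as np
import librosa

y, sr = librosa.load("vowel.wav", sr=None)      # hypothetical vowel segment
y = np.append(y[0], y[1:] - 0.97 * y[:-1])      # pre-emphasis
a = librosa.lpc(y, order=2 + sr // 1000)        # rule-of-thumb LPC order
roots = [r for r in np.roots(a) if np.imag(r) > 0]
freqs = sorted(np.angle(r) * sr / (2 * np.pi) for r in roots)
# keep plausible formants only; 90 Hz floor is a common heuristic
print("Estimated formants (Hz):", [round(f) for f in freqs if f > 90][:4])
```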
Speech rehabilitation of maxillectomy patients with hollow bulb obturator.
Kumar, Pravesh; Jain, Veena; Thakar, Alok
2012-09-01
To evaluate the effect of hollow bulb obturator prosthesis on articulation and nasalance in maxillectomy patients. A total of 10 patients, who were to undergo maxillectomy, falling under Aramany classes I and II, with normal speech and hearing pattern were selected for the study. They were provided with definitive maxillary obturators after complete healing of the defect. The patients were asked to wear the obturator for six weeks, and speech analysis was done to measure changes in articulation and nasalance at four different stages of treatment, namely, preoperative, postoperative (after complete healing, that is, 3-4 months after surgery), after 24 hours, and after six weeks of providing the obturators. Articulation was measured objectively for distortion, addition, substitution, and omission by a speech pathologist, and nasalance was measured by Dr. Speech software. The statistical comparison of preoperative and six-weeks-post-rehabilitation levels showed no significant differences in articulation and nasalance. Comparison of post-surgery complete healing with six weeks after rehabilitation showed significant differences in both nasalance and articulation. Providing an obturator brings speech closer to presurgical levels of articulation, and nasality also improves.
The NTID speech recognition test: NSRT(®).
Bochner, Joseph H; Garrison, Wayne M; Doherty, Karen A
2015-07-01
The purpose of this study was to collect and analyse data necessary for expansion of the NSRT item pool and to evaluate the NSRT adaptive testing software. Participants were administered pure-tone and speech recognition tests including W-22 and QuickSIN, as well as a set of 323 new NSRT items and NSRT adaptive tests in quiet and background noise. Performance on the adaptive tests was compared to pure-tone thresholds and performance on other speech recognition measures. The 323 new items were subjected to Rasch scaling analysis. Seventy adults with mild to moderately severe hearing loss participated in this study. Their mean age was 62.4 years (sd = 20.8). The 323 new NSRT items fit very well with the original item bank, enabling the item pool to be more than doubled in size. Data indicate high reliability coefficients for the NSRT and moderate correlations with pure-tone thresholds (PTA and HFPTA) and other speech recognition measures (W-22, QuickSIN, and SRT). The adaptive NSRT is an efficient and effective measure of speech recognition, providing valid and reliable information concerning respondents' speech perception abilities.
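Adaptive tests of this kind typically rest on the Rasch (one-parameter logistic) model: the probability of a correct response depends on the gap between person ability and item difficulty, and the most informative next item is the one whose difficulty best matches the current ability estimate. A hedged Python sketch, with made-up item difficulties, follows.

```python
import math

def rasch_p(ability, difficulty):
    """P(correct) under the Rasch (1-parameter logistic) model, in logits."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def next_item(ability, difficulties, used):
    """Pick the unused item with maximum information p * (1 - p)."""
    def info(i):
        p = rasch_p(ability, difficulties[i])
        return p * (1 - p)
    return max((i for i in range(len(difficulties)) if i not in used), key=info)

bank = [-1.2, -0.4, 0.0, 0.6, 1.5]   # hypothetical item difficulties (logits)
print(next_item(ability=0.5, difficulties=bank, used={2}))  # -> index 3
```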
Orthographic Learning and the Role of Text-to-Speech Software in Dutch Disabled Readers
ERIC Educational Resources Information Center
Staels, Eva; Van den Broeck, Wim
2015-01-01
In this study, we examined whether orthographic learning can be demonstrated in disabled readers learning to read in a transparent orthography (Dutch). In addition, we tested the effect of the use of text-to-speech software, a new form of direct instruction, on orthographic learning. Both research goals were investigated by replicating Share's…
ERIC Educational Resources Information Center
Coleman, Mari Beth; Killdare, Laura K.; Bell, Sherry Mee; Carter, Amanda M.
2014-01-01
The purpose of this study was to determine the impact of text-to-speech software on reading fluency and comprehension for four postsecondary students with below average reading fluency and comprehension including three students diagnosed with learning disabilities and concomitant conditions (e.g., attention deficit hyperactivity disorder, seizure…
A multilingual audiometer simulator software for training purposes.
Kompis, Martin; Steffen, Pascal; Caversaccio, Marco; Brugger, Urs; Oesch, Ivo
2012-04-01
A set of algorithms, which allows a computer to determine the answers of simulated patients during pure tone and speech audiometry, is presented. Based on these algorithms, a computer program for training in audiometry was written and found to be useful for teaching purposes. The aim was to develop a flexible audiometer simulator software as a teaching and training tool for pure tone and speech audiometry, both with and without masking. First, a set of algorithms, which allows a computer to determine the answers of a simulated, hearing-impaired patient, was developed. Then, the software was implemented. Extensive use was made of simple, editable text files to define all texts in the user interface and all patient definitions. The software 'audiometer simulator' is available for free download. It can be used to train pure tone audiometry (both with and without masking), speech audiometry, measurement of the uncomfortable level, and simple simulation tests. Due to the use of text files, the user can alter or add patient definitions and all texts and labels shown on the screen. So far, English, French, German, and Portuguese user interfaces are available, and the user can choose between German or French speech audiometry.
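The core of such a simulator is a rule for when the virtual patient "hears" a tone. The Python sketch below pairs a psychometric response function around a hidden threshold with a simplified down-10/up-5 bracketing procedure; the published algorithms are more elaborate, and all constants here are illustrative.

```python
import random

def responds(level_db, threshold_db, slope_db=2.0):
    """Probability of a 'heard' response rises smoothly through threshold."""
    p = 1.0 / (1.0 + 10 ** ((threshold_db - level_db) / slope_db))
    return random.random() < p

def find_threshold(threshold_db, start_db=40):
    level, heard_count = start_db, {}
    for _ in range(60):
        if responds(level, threshold_db):
            heard_count[level] = heard_count.get(level, 0) + 1
            if heard_count[level] >= 2:   # heard twice at this level: accept
                return level
            level -= 10                   # heard: decrease by 10 dB
        else:
            level += 5                    # not heard: increase by 5 dB
    return level

print(find_threshold(threshold_db=35))   # hidden threshold of the "patient"
```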
Influence of Smartphones and Software on Acoustic Voice Measures
GRILLO, ELIZABETH U.; BROSIOUS, JENNA N.; SORRELL, STACI L.; ANAND, SUPRAJA
2016-01-01
This study assessed the within-subject variability of voice measures captured using different recording devices (i.e., smartphones and head mounted microphone) and software programs (i.e., Analysis of Dysphonia in Speech and Voice (ADSV), Multi-dimensional Voice Program (MDVP), and Praat). Correlations between the software programs that calculated the voice measures were also analyzed. Results demonstrated no significant within-subject variability across devices and software and that some of the measures were highly correlated across software programs. The study suggests that certain smartphones may be appropriate to record daily voice measures representing the effects of vocal loading within individuals. In addition, even though different algorithms are used to compute voice measures across software programs, some of the programs and measures share a similar relationship. PMID:28775797
Perceptual, auditory and acoustic vocal analysis of speech and singing in choir conductors.
Rehder, Maria Inês Beltrati Cornacchioni; Behlau, Mara
2008-01-01
This study addressed the voice of choir conductors, aiming to evaluate their vocal quality based on the production of a sustained vowel during singing and speaking, in order to observe auditory and acoustic differences. Participants were 100 choir conductors, with an equal distribution between genders. Participants were asked to produce the sustained vowel "é" using a singing and a speaking voice. Speech samples were analyzed based on auditory-perceptive and acoustic parameters. The auditory-perceptive analysis was carried out by two speech-language pathologists, specialists in this field of knowledge. The acoustic analysis was carried out with the support of the computer software Doctor Speech (Tiger Electronics, SRD, USA, version 4.0), using the Real Analysis module. The auditory-perceptive analysis of vocal quality indicated that most conductors have adapted voices, presenting more alterations in their speaking voice. The acoustic analysis indicated different values between genders and between the different production modalities. The fundamental frequency was higher in the singing voice, as were the values for the first formant; the second formant presented lower values in the singing voice, with statistically significant results only for women. In conclusion, the voice of choir conductors is adapted, presenting fewer deviations in the singing voice when compared to the speaking voice. Productions differ based on the voice modality, singing or speaking.
What does voice-processing technology support today?
Nakatsu, R; Suzuki, Y
1995-01-01
This paper describes the state of the art in applications of voice-processing technologies. In the first part, technologies concerning the implementation of speech recognition and synthesis algorithms are described. Hardware technologies such as microprocessors and DSPs (digital signal processors) are discussed, as is the software development environment, a key technology in developing application software ranging from DSP software to support software. In the second part, the state of the art of algorithms from the standpoint of applications is discussed. Several issues concerning evaluation of speech recognition/synthesis algorithms are covered, as well as issues concerning the robustness of algorithms in adverse conditions. PMID:7479720
ERIC Educational Resources Information Center
Baker, Fiona S.
2015-01-01
This study explores the expectations and early and subsequent realities of text-to-speech software for 24 nonnative-English-speaking college students who were experiencing reading difficulties in their freshman year of college. The study took place over two semesters in one academic year (from September to June) at a community college on the…
Intra-oral pressure-based voicing control of electrolaryngeal speech with intra-oral vibrator.
Takahashi, Hirokazu; Nakao, Masayuki; Kikuchi, Yataro; Kaga, Kimitaka
2008-07-01
In normal speech, coordinated activities of intrinsic laryngeal muscles suspend the glottal sound during utterance of voiceless consonants, automatically realizing a voicing control. In electrolaryngeal speech, however, the lack of voicing control is one of the causes of unclear voice, with voiceless consonants tending to be misheard as the corresponding voiced consonants. In the present work, we developed an intra-oral vibrator with an intra-oral pressure sensor that detected utterance of voiceless phonemes during intra-oral electrolaryngeal speech, and demonstrated that an intra-oral pressure-based voicing control could improve the intelligibility of the speech. The test voices were obtained from one electrolaryngeal speaker and one normal speaker. We first investigated, using speech analysis software, how voice onset time (VOT) and first formant (F1) transition of the test consonant-vowel syllables contributed to voiceless/voiced contrasts, and developed an adequate voicing control strategy. We then compared the intelligibility of consonant-vowel syllables in intra-oral electrolaryngeal speech with and without online voicing control. An increase of intra-oral pressure, typically with a peak ranging from 10 to 50 gf/cm2, could reliably identify utterance of voiceless consonants. The speech analysis and intelligibility test then demonstrated that a short VOT caused voiceless consonants to be misidentified as voiced due to a clear F1 transition. Finally, taking these results together, the online voicing control, which suspended the prosthetic tone while the intra-oral pressure exceeded 2.5 gf/cm2 and during the 35 milliseconds that followed, proved efficient in improving the voiceless/voiced contrast.
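The reported gating rule (suspend the tone while intra-oral pressure exceeds 2.5 gf/cm2 and for the 35 ms that follow) translates directly into a small control loop. The Python sketch below assumes a fixed 1 ms sampling period for the pressure signal; the sensor interface itself is hypothetical.

```python
def voicing_gate(pressure_samples, dt_ms=1.0, thresh=2.5, hangover_ms=35.0):
    """Yield True when the electrolarynx tone may sound, False when muted."""
    mute_left = 0.0
    for p in pressure_samples:
        if p > thresh:                   # voiceless consonant being uttered
            mute_left = hangover_ms      # mute now and for 35 ms afterwards
            yield False
        else:
            yield mute_left <= 0.0
            mute_left = max(0.0, mute_left - dt_ms)

# Example: a 5 ms pressure burst mutes the tone for the burst plus 35 ms.
trace = [0.0] * 10 + [12.0] * 5 + [0.0] * 50     # gf/cm2, 1 ms samples
gate = list(voicing_gate(trace))
print(sum(not g for g in gate), "of", len(gate), "ms muted")
```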
Acoustic Analysis of Speech of Cochlear Implantees and Its Implications
Patadia, Rajesh; Govale, Prajakta; Rangasayee, R.; Kirtane, Milind
2012-01-01
Objectives: Cochlear implantees have improved speech production skills compared with those using hearing aids, as reflected in their acoustic measures. When compared to normal hearing controls, implanted children had fronted vowel space and their /s/ and /ʃ/ noise frequencies overlapped. Acoustic analysis of speech provides an objective index of perceived differences in speech production, which can be precursory in planning therapy. The objective of this study was to compare acoustic characteristics of speech in cochlear implantees with those of normal hearing age-matched peers to understand the implications. Methods: Group 1 consisted of 15 children with prelingual bilateral severe-profound hearing loss (age, 5-11 years; implanted between 4-10 years). Prior to implantation, behind-the-ear hearing aids were used; before and after implantation, subjects received at least 1 year of aural intervention. Group 2 consisted of 15 normal hearing age-matched peers. Sustained productions of vowels and words with selected consonants were recorded. Using Praat software for acoustic analysis, digitized speech tokens were measured for F1, F2, and F3 of vowels; centre frequency (Hz) and energy concentration (dB) in burst; voice onset time (VOT in ms) for stops; centre frequency (Hz) of noise in /s/; and rise time (ms) for affricates. A t-test was used to find significant differences between groups. Results: Significant differences were found in VOT for /b/, F1 and F2 of /e/, and F3 of /u/. No significant differences were found for centre frequency of burst, energy concentration for stops, centre frequency of noise in /s/, or rise time for affricates. These findings suggest that auditory feedback provided by cochlear implants enables subjects to monitor production of speech sounds. Conclusion: Acoustic analysis of speech is an essential method for discerning characteristics which have or have not been improved by cochlear implantation and thus for planning intervention. PMID:22701768
Visual-Auditory Integration during Speech Imitation in Autism
ERIC Educational Resources Information Center
Williams, Justin H. G.; Massaro, Dominic W.; Peel, Natalie J.; Bosseler, Alexis; Suddendorf, Thomas
2004-01-01
Children with autistic spectrum disorder (ASD) may have poor audio-visual integration, possibly reflecting dysfunctional "mirror neuron" systems which have been hypothesised to be at the core of the condition. In the present study, a computer program, utilizing speech synthesizer software and a "virtual" head (Baldi), delivered speech stimuli for…
Helium Speech: An Application of Standing Waves
NASA Astrophysics Data System (ADS)
Wentworth, Christopher D.
2011-04-01
Taking a breath of helium gas and then speaking or singing to the class is a favorite demonstration for an introductory physics course, as it usually elicits appreciative laughter, which serves to energize the class session. Students will usually report that the helium speech "raises the frequency" of the voice. A more accurate description of the phenomenon requires that we distinguish between the frequencies of sound produced by the larynx and the filtering of those frequencies by the vocal tract. We will describe here an experiment done by introductory physics students that uses helium speech as a context for learning about the human vocal system and as an application of the standing sound-wave concept. Modern acoustic analysis software easily obtained by instructors for student use allows data to be obtained and analyzed quickly.
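The standing-wave account is easy to check numerically. Treating the vocal tract as a closed-open tube of length L, the resonances are f_n = (2n-1)v/(4L), so they scale with the speed of sound; the Python snippet below uses typical assumed values (L = 0.17 m, v ≈ 343 m/s in air, v ≈ 965 m/s in helium).

```python
# Closed-open tube resonances f_n = (2n-1) * v / (4L) in air vs. helium.
L = 0.17                                  # assumed vocal tract length (m)
for name, v in [("air", 343.0), ("helium", 965.0)]:
    f = [(2 * n - 1) * v / (4 * L) for n in (1, 2, 3)]
    print(name, [round(x) for x in f])    # ~504/1513/2522 Hz vs ~1419/4257/7096 Hz
# Helium raises each resonance by v_He/v_air ~ 2.8: the formants shift up
# while the larynx's fundamental frequency stays the same.
```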
Valadez, Victor; Ysunza, Antonio; Ocharan-Hernandez, Esther; Garrido-Bustamante, Norma; Sanchez-Valerio, Araceli; Pamplona, Ma C
2012-09-01
Vocal Nodules (VN) are a functional voice disorder associated with voice misuse and abuse in children. There are few reports addressing vocal parameters in children with VN, especially after a period of vocal rehabilitation. The purpose of this study is to describe measurements of vocal parameters including Fundamental Frequency (FF), Shimmer (S), and Jitter (J), videonasolaryngoscopy examination and clinical perceptual assessment, before and after voice therapy in children with VN. Voice therapy was provided using visual support through Speech-Viewer software. Twenty patients with VN were studied. An acoustical analysis of voice was performed and compared with data from subjects from a control group matched by age and gender. Also, clinical perceptual assessment of voice and videonasolaryngoscopy were performed to all patients with VN. After a period of voice therapy, provided with visual support using Speech Viewer-III (SV-III-IBM) software, new acoustical analyses, perceptual assessments and videonasolaryngoscopies were performed. Before the onset of voice therapy, there was a significant difference (p<0.05) in mean FF, S and J, between the patients with VN and subjects from the control group. After the voice therapy period, a significant improvement (p<0.05) was found in all acoustic voice parameters. Moreover, perceptual voice analysis demonstrated improvement in all cases. Finally, videonasolaryngoscopy demonstrated that vocal nodules were no longer discernible on the vocal folds in any of the cases. SV-III software seems to be a safe and reliable method for providing voice therapy in children with VN. Acoustic voice parameters, perceptual data and videonasolaryngoscopy were significantly improved after the speech therapy period was completed. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
Implications of diadochokinesia in children with speech sound disorder.
Wertzner, Haydée Fiszbein; Pagan-Neves, Luciana de Oliveira; Alves, Renata Ramos; Barrozo, Tatiane Faria
2013-01-01
To verify the performance of children with and without speech sound disorder in oral motor skills measured by oral diadochokinesia, according to age and gender, and to compare the results of two different methods of analysis. Participants were 72 subjects aged from 5 years to 7 years and 11 months, divided into four subgroups according to the presence of speech sound disorder (Study Group and Control Group) and age (<6 years and 5 months and >6 years and 5 months). Diadochokinesia skills were assessed by the repetition of the sequences 'pa', 'ta', 'ka' and 'pataka', measured both manually and by the software Motor Speech Profile®. Gender was statistically different for both groups, but it did not influence the number of sequences per second produced. Correlation between the number of sequences per second and age was observed for all sequences (except for 'ka') only for the control group children. Comparison between groups did not indicate differences between the number of sequences per second and age. Results showed strong agreement between the values of oral diadochokinesia measured manually and by MSP. This research demonstrated the importance of using different methods of analysis in the functional evaluation of oro-motor processing aspects of children with speech sound disorder, and evidenced the oro-motor difficulties of children younger than eight years old.
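Automated DDK measurement usually reduces to counting syllable bursts per second. The Python sketch below does this by peak-picking a smoothed energy envelope of a 'pataka' recording; the file name, smoothing window, and peak thresholds are illustrative assumptions rather than the Motor Speech Profile's method.

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import find_peaks

rate, x = wavfile.read("pataka.wav")          # hypothetical DDK recording
x = x.astype(np.float64)
x /= (np.abs(x).max() + 1e-12)
win = int(0.02 * rate)                        # 20 ms smoothing window
env = np.convolve(np.abs(x), np.ones(win) / win, mode="same")
peaks, _ = find_peaks(env, height=0.1 * env.max(),
                      distance=int(0.08 * rate))  # >= 80 ms between syllables
syll_per_s = len(peaks) / (len(x) / rate)
print(f"{len(peaks)} syllable bursts, {syll_per_s:.2f} per second")
```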
Research on Spoken Dialogue Systems
NASA Technical Reports Server (NTRS)
Aist, Gregory; Hieronymus, James; Dowding, John; Hockey, Beth Ann; Rayner, Manny; Chatzichrisafis, Nikos; Farrell, Kim; Renders, Jean-Michel
2010-01-01
Research in the field of spoken dialogue systems has been performed with the goal of making such systems more robust and easier to use in demanding situations. The term "spoken dialogue systems" signifies unified software systems containing speech-recognition, speech-synthesis, dialogue management, and ancillary components that enable human users to communicate, using natural spoken language or nearly natural prescribed spoken language, with other software systems that provide information and/or services.
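As a caricature of that unified architecture, the recognition, dialogue-management, and synthesis components reduce to one composed turn. Everything below is a toy stand-in for illustration, not the NASA software:

```python
# Minimal sketch of the spoken-dialogue-system loop described above.
# All three components are toy placeholders.

def recognize(audio: str) -> str:
    # Stand-in for a speech recognizer; here the "audio" is already text.
    return audio.lower().strip()

def manage_dialogue(utterance: str) -> str:
    # Stand-in dialogue manager mapping a prescribed phrase to a service reply.
    commands = {"what is the cabin pressure": "Cabin pressure is 14.7 psi."}
    return commands.get(utterance, "I did not understand that.")

def synthesize(text: str) -> str:
    # Stand-in for a speech synthesizer; a real system would emit audio.
    return f"[spoken] {text}"

if __name__ == "__main__":
    print(synthesize(manage_dialogue(recognize("What is the cabin pressure"))))
```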
Speech Rehabilitation of Maxillectomy Patients with Hollow Bulb Obturator
Kumar, Pravesh; Jain, Veena; Thakar, Alok
2012-01-01
Aim: To evaluate the effect of hollow bulb obturator prosthesis on articulation and nasalance in maxillectomy patients. Materials and Methods: A total of 10 patients who were to undergo maxillectomy, falling under Aramany classes I and II, with normal speech and hearing patterns, were selected for the study. They were provided with definitive maxillary obturators after complete healing of the defect. The patients were asked to wear the obturator for six weeks, and speech analysis was done to measure changes in articulation and nasalance at four different stages of treatment: preoperative, postoperative (after complete healing, that is, 3-4 months after surgery), 24 hours after, and six weeks after providing the obturators. Articulation was measured objectively for distortion, addition, substitution, and omission by a speech pathologist, and nasalance was measured by Dr. Speech software. Results: The statistical comparison of preoperative and six-week post-rehabilitation levels showed no significant difference in articulation or nasalance. Comparison of post-surgery complete healing with six weeks after rehabilitation showed significant differences in both nasalance and articulation. Conclusion: Providing an obturator brings speech closer to presurgical levels of articulation, and nasality also improves. PMID:23440022
NASA Technical Reports Server (NTRS)
Kumar, P.; Lin, F. Y.; Vaishampayan, V.; Farvardin, N.
1986-01-01
A complete documentation of the software developed in the Communication and Signal Processing Laboratory (CSPL) during the period of July 1985 to March 1986 is provided. Utility programs and subroutines that were developed for a user-friendly image and speech processing environment are described. Additional programs for data compression of image- and speech-type signals are included. Also, programs for zero-memory and block transform quantization in the presence of channel noise are described. Finally, several routines for simulating the performance of image compression algorithms are included.
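For context, a zero-memory (memoryless) quantizer maps each sample independently to one of 2^R fixed reconstruction levels. A minimal uniform version, a sketch with illustrative parameter names only:

```python
# Sketch of a zero-memory uniform quantizer of the kind used in
# speech/image compression experiments. numpy only; names are illustrative.
import numpy as np

def uniform_quantize(x: np.ndarray, n_bits: int, x_max: float):
    """Quantize samples in [-x_max, x_max] to 2**n_bits uniform levels.

    Returns the integer indices (what a channel would carry) and the
    reconstructed samples (what the decoder recovers).
    """
    n_levels = 2 ** n_bits
    step = 2 * x_max / n_levels
    idx = np.clip(np.floor((x + x_max) / step), 0, n_levels - 1).astype(int)
    recon = -x_max + (idx + 0.5) * step        # mid-rise reconstruction
    return idx, recon

# Example: quantize a sine test signal at 4 bits and measure the SNR.
t = np.linspace(0, 1, 8000)
x = np.sin(2 * np.pi * 200 * t)
_, xq = uniform_quantize(x, n_bits=4, x_max=1.0)
snr_db = 10 * np.log10(np.sum(x**2) / np.sum((x - xq)**2))
print(f"4-bit uniform quantization SNR is about {snr_db:.1f} dB")
```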
Executives' speech expressiveness: analysis of perceptive and acoustic aspects of vocal dynamics.
Marquezin, Daniela Maria Santos Serrano; Viola, Izabel; Ghirardi, Ana Carolina de Assis Moura; Madureira, Sandra; Ferreira, Léslie Piccolotto
2015-01-01
To analyze speech expressiveness in a group of executives based on perceptive and acoustic aspects of vocal dynamics. Four male subjects participated in the research study (S1, S2, S3, and S4). The assessments included the Kingdomality test to obtain the keywords of communicative attitudes; perceptive-auditory assessment to characterize vocal quality and dynamics, performed by three judges who are speech-language pathologists; perceptive-auditory assessment to judge the chosen keywords; speech acoustics to assess prosodic elements (Praat software); and a statistical analysis. According to the perceptive-auditory analysis of vocal dynamics, S1, S2, S3, and S4 did not show vocal alterations, and all of them were considered to have lowered habitual pitch. S1: pointed out as insecure, nonobjective, nonempathetic, and unconvincing, with inappropriate use of pauses that are mainly formed by hesitations; inadequate separation of prosodic groups with breaking of syntagmatic constituents. S2: regular use of pauses for respiratory reload, organization of sentences, and emphasis; considered secure, little objective, empathetic, and convincing. S3: pointed out as secure, objective, empathetic, and convincing, with regular use of pauses for respiratory reload, organization of sentences, and hesitations. S4: the most secure, objective, empathetic, and convincing, with proper use of pauses for respiratory reload, planning, and emphasis; prosodic groups agreed with the statement, without separating the syntagmatic constituents. The speech characteristics and communicative attitudes were highlighted in two subjects in a different manner, in such a way that the slow rate of speech and breaks of the prosodic groups transmitted insecurity, little objectivity, and lack of persuasion.
Using Web Speech Technology with Language Learning Applications
ERIC Educational Resources Information Center
Daniels, Paul
2015-01-01
In this article, the author presents the history of human-to-computer interaction based upon the design of sophisticated computerized speech recognition algorithms. Advancements such as the arrival of cloud-based computing and software like Google's Web Speech API allow anyone with an Internet connection and Chrome browser to take advantage of…
Designing a Humane Multimedia Interface for the Visually Impaired.
ERIC Educational Resources Information Center
Ghaoui, Claude; Mann, M.; Ng, Eng Huat
2001-01-01
Promotes the provision of interfaces that allow users to access most of the functionality of existing graphical user interfaces (GUI) using speech. Uses the design of a speech control tool that incorporates speech recognition and synthesis into existing packaged software such as Teletext, the Internet, or a word processor. (Contains 22…
Sauder, Cara; Bretl, Michelle; Eadie, Tanya
2017-09-01
The purposes of this study were to (1) determine and compare the diagnostic accuracy of a single acoustic measure, smoothed cepstral peak prominence (CPPS), to predict voice disorder status from connected speech samples using two software systems: Analysis of Dysphonia in Speech and Voice (ADSV) and Praat; and (2) to determine the relationship between measures of CPPS generated from these programs. This is a retrospective cross-sectional study. Measures of CPPS were obtained from connected speech recordings of 100 subjects with voice disorders and 70 nondysphonic subjects without vocal complaints using commercially available ADSV and freely downloadable Praat software programs. Logistic regression and receiver operating characteristic (ROC) analyses were used to evaluate and compare the diagnostic accuracy of CPPS measures. Relationships between CPPS measures from the programs were determined. Results showed acceptable overall accuracy rates (75% accuracy, ADSV; 82% accuracy, Praat) and area under the ROC curves (area under the curve [AUC] = 0.81, ADSV; AUC = 0.91, Praat) for predicting voice disorder status, with slight differences in sensitivity and specificity. CPPS measures derived from Praat were uniquely predictive of disorder status above and beyond CPPS measures from ADSV (χ²(1) = 40.71, P < 0.001). CPPS measures from both programs were significantly and highly correlated (r = 0.88, P < 0.001). A single acoustic measure of CPPS was highly predictive of voice disorder status using either program. Clinicians may consider using CPPS to complement clinical voice evaluation and screening protocols. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
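Neither program's exact implementation is given in the abstract, but the generic recipe behind cepstral peak prominence (the height of the cepstral peak in the pitch quefrency range above a regression baseline, smoothed over frames) can be sketched as follows; the window length, pitch range, and the reduction of "smoothing" to a frame average are assumptions:

```python
# Rough sketch of (smoothed) cepstral peak prominence. Not ADSV's or
# Praat's exact implementation; constants are illustrative.
import numpy as np

def cpp_frame(frame, fs, fmin=60.0, fmax=330.0):
    log_spec = np.log(np.abs(np.fft.fft(frame * np.hanning(len(frame)))) + 1e-12)
    cep_db = 20 * np.log10(np.abs(np.fft.fft(log_spec)) + 1e-12)
    lo, hi = int(fs / fmax), int(fs / fmin)     # quefrency bins of the F0 range
    peak = lo + np.argmax(cep_db[lo:hi])        # cepstral peak location
    slope, intercept = np.polyfit(np.arange(1, hi), cep_db[1:hi], 1)
    return cep_db[peak] - (slope * peak + intercept)  # height above baseline

def cpps(x, fs, frame_ms=40, hop_ms=10):
    n, h = int(fs * frame_ms / 1000), int(fs * hop_ms / 1000)
    vals = [cpp_frame(x[i:i + n], fs) for i in range(0, len(x) - n, h)]
    return float(np.mean(vals))                 # "smoothing" reduced to a mean
```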
Whole-exome sequencing supports genetic heterogeneity in childhood apraxia of speech.
Worthey, Elizabeth A; Raca, Gordana; Laffin, Jennifer J; Wilk, Brandon M; Harris, Jeremy M; Jakielski, Kathy J; Dimmock, David P; Strand, Edythe A; Shriberg, Lawrence D
2013-10-02
Childhood apraxia of speech (CAS) is a rare, severe, persistent pediatric motor speech disorder with associated deficits in sensorimotor, cognitive, language, learning and affective processes. Among other neurogenetic origins, CAS is the disorder segregating with a mutation in FOXP2 in a widely studied, multigenerational London family. We report the first whole-exome sequencing (WES) findings from a cohort of 10 unrelated participants, ages 3 to 19 years, with well-characterized CAS. As part of a larger study of children and youth with motor speech sound disorders, 32 participants were classified as positive for CAS on the basis of a behavioral classification marker using auditory-perceptual and acoustic methods that quantify the competence, precision and stability of a speaker's speech, prosody and voice. WES of 10 randomly selected participants was completed using the Illumina Genome Analyzer IIx Sequencing System. Image analysis, base calling, demultiplexing, read mapping, and variant calling were performed using Illumina software. Software developed in-house was used for variant annotation, prioritization and interpretation to identify those variants likely to be deleterious to neurodevelopmental substrates of speech-language development. Among potentially deleterious variants, clinically reportable findings of interest occurred on a total of five chromosomes (Chr3, Chr6, Chr7, Chr9 and Chr17), which included six genes either strongly associated with CAS (FOXP1 and CNTNAP2) or associated with disorders with phenotypes overlapping CAS (ATP13A4, CNTNAP1, KIAA0319 and SETX). A total of 8 (80%) of the 10 participants had clinically reportable variants in one or two of the six genes, with variants in ATP13A4, KIAA0319 and CNTNAP2 being the most prevalent. Similar to the results reported in emerging WES studies of other complex neurodevelopmental disorders, our findings from this first WES study of CAS are interpreted as support for heterogeneous genetic origins of this pediatric motor speech disorder with multiple genes, pathways and complex interactions. We also submit that our findings illustrate the potential use of WES for both gene identification and case-by-case clinical diagnostics in pediatric motor speech disorders.
Effects of text-to-speech software use on the reading proficiency of high school struggling readers.
Park, Hye Jin; Takahashi, Kiriko; Roberts, Kelly D; Delise, Danielle
2017-01-01
The literature highlights the benefits of text-to-speech (TTS) software when used as an assistive technology facilitating struggling readers' access to print. However, the effects of TTS software use upon students' unassisted reading proficiency have remained relatively unexplored. The researchers utilized an experimental design to investigate whether 9th grade struggling readers who use TTS software to read course materials demonstrate significant improvements in unassisted reading performance. A total of 164 students of 30 teachers in Hawaii participated in the study. Analyses of covariance indicated that the TTS intervention had a significant, positive effect on student reading vocabulary and reading comprehension after 10 weeks of TTS software use (average 582 minutes). The study has several limitations; however, it opens discussion of, and indicates the need for, further studies investigating TTS software as a viable reading intervention for adolescent struggling readers.
Speech and Voice Response to a Levodopa Challenge in Late-Stage Parkinson's Disease.
Fabbri, Margherita; Guimarães, Isabel; Cardoso, Rita; Coelho, Miguel; Guedes, Leonor Correia; Rosa, Mario M; Godinho, Catarina; Abreu, Daisy; Gonçalves, Nilza; Antonini, Angelo; Ferreira, Joaquim J
2017-01-01
Parkinson's disease (PD) patients are affected by hypokinetic dysarthria, characterized by hypophonia and dysprosody, which worsens with disease progression. Levodopa's (l-dopa) effect on quality of speech is inconclusive; no data are currently available for late-stage PD (LSPD). To assess the modifications of speech and voice in LSPD following an acute l-dopa challenge. LSPD patients [Schwab and England score <50/Hoehn and Yahr stage >3 (MED ON)] performed several vocal tasks before and after an acute l-dopa challenge. The following was assessed: respiratory support for speech, voice quality, stability and variability, speech rate, and motor performance (MDS-UPDRS-III). All voice samples were recorded and analyzed by a speech and language therapist, blinded to patients' therapeutic condition, using Praat 5.1 software. 24/27 (14 men) LSPD patients succeeded in performing the voice tasks. Median age and disease duration of patients were 79 [IQR: 71.5-81.7] and 14.5 [IQR: 11-15.7] years, respectively. In MED OFF, respiratory breath support and pitch break time of LSPD patients were worse than the normative values for non-parkinsonian speakers. A correlation was found between disease duration and voice quality (R = 0.51; p = 0.013) and speech rate (R = -0.55; p = 0.008). l-Dopa significantly improved the MDS-UPDRS-III score (20%), with no effect on speech as assessed by clinical rating scales and automated analysis. Speech is severely affected in LSPD. Although l-dopa had some effect on motor performance, including axial signs, speech and voice did not improve. The applicability and efficacy of non-pharmacological treatment for speech impairment should be considered for speech disorder management in PD.
Orthographic learning and the role of text-to-speech software in Dutch disabled readers.
Staels, Eva; Van den Broeck, Wim
2015-01-01
In this study, we examined whether orthographic learning can be demonstrated in disabled readers learning to read in a transparent orthography (Dutch). In addition, we tested the effect of the use of text-to-speech software, a new form of direct instruction, on orthographic learning. Both research goals were investigated by replicating Share's self-teaching paradigm. A total of 65 disabled Dutch readers were asked to read eight stories containing embedded homophonic pseudoword targets (e.g., Blot/Blod), with or without the support of text-to-speech software. The amount of orthographic learning was assessed 3 or 7 days later by three measures of orthographic learning. First, the results supported the presence of orthographic learning during independent silent reading by demonstrating that target spellings were correctly identified more often, named more quickly, and spelled more accurately than their homophone foils. Our results support the hypothesis that all readers, even poor readers of transparent orthographies, are capable of developing word-specific knowledge. Second, a negative effect of text-to-speech software on orthographic learning was demonstrated in this study. This negative effect was interpreted as the consequence of passively listening to the auditory presentation of the text. We clarify how these results can be interpreted within current theoretical accounts of orthographic learning and briefly discuss implications for remedial interventions. © Hammill Institute on Disabilities 2013.
Integrated Spacesuit Audio System Enhances Speech Quality and Reduces Noise
NASA Technical Reports Server (NTRS)
Huang, Yiteng Arden; Chen, Jingdong; Chen, Shaoyan Sharyl
2009-01-01
A new approach has been proposed for increasing astronaut comfort and speech capture. Currently, the special design of a spacesuit creates an extreme acoustic environment, making it difficult to capture clear speech without compromising comfort. The proposed Integrated Spacesuit Audio (ISA) system incorporates the microphones into the helmet and uses software to extract voice signals from background noise.
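The abstract does not detail the ISA noise-reduction algorithm; a classic baseline for extracting voice from steady background noise is spectral subtraction, sketched here with scipy. The 0.5 s noise-only lead-in and the spectral floor are assumptions for illustration:

```python
# Sketch of single-channel spectral subtraction, a baseline technique for
# recovering speech from steady background noise. Not the ISA algorithm.
import numpy as np
from scipy.signal import stft, istft

def spectral_subtract(x, fs, noise_secs=0.5, floor=0.05):
    f, t, X = stft(x, fs, nperseg=512)
    mag, phase = np.abs(X), np.angle(X)
    # Estimate the noise magnitude spectrum from a noise-only lead-in
    # (assumed here; a real system would track noise adaptively).
    n_frames = int(noise_secs * fs / (512 // 2))
    noise_mag = mag[:, :n_frames].mean(axis=1, keepdims=True)
    clean_mag = np.maximum(mag - noise_mag, floor * mag)   # spectral floor
    _, y = istft(clean_mag * np.exp(1j * phase), fs, nperseg=512)
    return y
```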
Vogel, Adam P; Block, Susan; Kefalianos, Elaina; Onslow, Mark; Eadie, Patricia; Barth, Ben; Conway, Laura; Mundt, James C; Reilly, Sheena
2015-04-01
To investigate the feasibility of adopting automated interactive voice response (IVR) technology for remotely capturing standardized speech samples from stuttering children. Participants were ten 6-year-old stuttering children. Their parents called a toll-free number from their homes and were prompted to elicit speech from their children using a standard protocol involving conversation, picture description, and games. The automated IVR system was implemented using an off-the-shelf telephony software program and delivered by a standard desktop computer. The software infrastructure utilizes voice over Internet protocol. Speech samples were automatically recorded during the calls. Video recordings were simultaneously acquired in the home at the time of the call to evaluate the fidelity of the telephone-collected samples. Key outcome measures included syllables spoken, percentage of syllables stuttered, and an overall rating of stuttering severity using a 10-point scale. Data revealed a high level of relative reliability, in terms of intra-class correlation between the video and telephone acquired samples, on all outcome measures during the conversation task. Findings were less consistent for speech samples during picture description and games. Results suggest that IVR technology can be used successfully to automate remote capture of child speech samples.
Advances in EPG for treatment and research: an illustrative case study.
Scobbie, James M; Wood, Sara E; Wrench, Alan A
2004-01-01
Electropalatography (EPG), a technique which reveals tongue-palate contact patterns over time, is a highly effective tool for speech research. We report here on recent developments by Articulate Instruments Ltd. These include hardware for Windows-based computers, backwardly compatible (with Reading EPG3) software systems for clinical intervention and laboratory-based analysis of EPG and acoustic data, and an enhanced clinical interface with client and file management tools. We focus here on a single case study of a child aged around 10 years who had been diagnosed with an intractable speech disorder, possibly resulting ultimately from a complete cleft of the hard and soft palate. We illustrate how assessment, diagnosis, and treatment of the intractable speech disorder are undertaken using this new generation of instrumental phonetic support. We also look forward to future developments in articulatory phonetics that will link EPG with ultrasound for the research and clinical communities.
Syntactic and semantic errors in radiology reports associated with speech recognition software.
Ringler, Michael D; Goss, Brian C; Bartholmai, Brian J
2017-03-01
Speech recognition software can increase the frequency of errors in radiology reports, which may affect patient care. We retrieved 213,977 speech recognition software-generated reports from 147 different radiologists and proofread them for errors. Errors were classified as "material" if they were believed to alter interpretation of the report. "Immaterial" errors were subclassified as intrusion/omission or spelling errors. The proportion of errors and error type were compared among individual radiologists, imaging subspecialty, and time periods. In all, 20,759 reports (9.7%) contained errors, of which 3992 (1.9%) were material errors. Among immaterial errors, spelling errors were more common than intrusion/omission errors (p < .001). Proportion of errors and fraction of material errors varied significantly among radiologists and between imaging subspecialties (p < .001). Errors were more common in cross-sectional reports, reports reinterpreting results of outside examinations, and procedural studies (all p < .001). Error rate decreased over time (p < .001), which suggests that a quality control program with regular feedback may reduce errors.
Zakkula, Srujana; B, Sreedevi; Anne, Gopinadh; Manne, Prakash; Bindu O, Swetha Hima; Atla, Jyothi; Deepthi, Sneha; Chaitanya A, Krishna
2014-04-01
Prosthodontic treatment involves clinical procedures which influence speech performance directly or indirectly. Prosthetic rehabilitation of missing teeth with partial or complete maxillary removable dentures influences individual voice characteristics such as phonation and resonance. To evaluate the effect of acrylic palatal plate thickness (1-3 mm) of maxillary prosthesis on phonation. Twelve subjects aged 20-25 years, with a full complement of teeth and no speech problems, were randomly selected. Speech evaluation was done under four experimental conditions: without any experimental acrylic palatal plate (control), and with experimental acrylic palatal plates of 1 mm, 2 mm, and 3 mm thickness, respectively. The speech material for the phonation test consisted of the vowel sounds /a/, /i/, and /o/. Speech analysis to assess phonation was done using digital acoustic analysis (PRAAT software). The obtained results were statistically analyzed by one-way ANOVA and Tukey's multiple post-hoc comparisons of the four experimental conditions with respect to the different vowel sounds. Mean harmonics-to-noise ratio (HNR) values obtained for all the experimental conditions did not show a significant difference (p>0.05). In conclusion, increasing the thickness of the acrylic palatal plate of a maxillary prosthesis from 1 mm to 3 mm in complete or partial maxillary removable dentures had no significant effect on phonation of the vowel sounds /a/, /i/ and /o/.
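The reported pipeline (one-way ANOVA across the four plate conditions, then Tukey's post hoc test) can be reproduced generically with scipy and statsmodels. The HNR arrays below are fabricated placeholders that only show the calling pattern, not study data:

```python
# Sketch of the reported analysis: one-way ANOVA over four conditions,
# followed by Tukey's HSD. Values are placeholders, not study data.
import numpy as np
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(0)
hnr = {c: rng.normal(20, 2, size=12) for c in ["control", "1mm", "2mm", "3mm"]}

F, p = f_oneway(*hnr.values())
print(f"one-way ANOVA: F = {F:.2f}, p = {p:.3f}")

values = np.concatenate(list(hnr.values()))
labels = np.repeat(list(hnr.keys()), 12)
print(pairwise_tukeyhsd(values, labels))     # pairwise condition contrasts
```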
Assistive Software Tools for Secondary-Level Students with Literacy Difficulties
ERIC Educational Resources Information Center
Lange, Alissa A.; McPhillips, Martin; Mulhern, Gerry; Wylie, Judith
2006-01-01
The present study assessed the compensatory effectiveness of four assistive software tools (speech synthesis, spellchecker, homophone tool, and dictionary) on literacy. Secondary-level students (N = 93) with reading difficulties completed computer-based tests of literacy skills. Training on their respective software followed for those assigned to…
Legal Issues and Computer Use by School-Based Audiologists and Speech-Language Pathologists.
ERIC Educational Resources Information Center
Wynne, Michael K.; Hurst, David S.
1995-01-01
This article reviews ethical and legal issues regarding school-based integration and application of technologies, particularly when used by speech-language pathologists and audiologists. Four issues are addressed: (1) software copyright and licensed use; (2) information access and the right to privacy; (3) computer-assisted or…
Research on Speech Perception. Progress Report No. 4, January 1977-September 1978.
ERIC Educational Resources Information Center
Pisoni, David B.; And Others
Summarizing research activities from January 1977 to September 1978, this is the fourth annual report of research on speech processing conducted in the Department of Psychology at Indiana University. The report includes extended manuscripts, short reports, progress reports, and information on instrumentation developments and software support. The…
Communicating headings and preview sentences in text and speech.
Lorch, Robert F; Chen, Hung-Tao; Lemarié, Julie
2012-09-01
Two experiments tested the effects of preview sentences and headings on the quality of college students' outlines of informational texts. Experiment 1 found that performance was much better in the preview sentences condition than in a no-signals condition for both printed text and text-to-speech (TTS) audio rendering of the printed text. In contrast, performance in the headings condition was good for the printed text but poor for the auditory presentation because the TTS software failed to communicate nonverbal information carried by the visual headings. Experiment 2 compared outlining performance for five headings conditions during TTS presentation. Using a theoretical framework, "signaling available, relevant, accessible" (SARA) information, to provide an analysis of the information content of headings in the printed text, the manipulation of the headings systematically restored information that was omitted by the TTS application in Experiment 1. The result was that outlining performance improved to levels similar to the visual headings condition of Experiment 1. It is argued that SARA is a useful framework for guiding future development of TTS software for a wide variety of text signaling devices, not just headings.
Developing Computer Software for Use in the Speech/Communications Classroom.
ERIC Educational Resources Information Center
Krauss, Beatrice J.
Appropriate software can turn the microcomputer from a "dumb box" into a teaching tool. One resource for finding appropriate software is the organization Edunet. It allows the user to access the mainframes of 18 major universities and has developed a communications network with 130 colleges. It also handles billing and does periodic software…
Why Free Software Matters for Literacy Educators.
ERIC Educational Resources Information Center
Brunelle, Michael D.; Bruce, Bertram C.
2002-01-01
Notes that understanding what "free software" means and its implications for access and use of new technologies is an important component of the new literacies. Concludes that if free speech and free press are essential to the development of a general literacy, then free software can promote the development of computer literacy. (SG)
An automatic speech recognition system with speaker-independent identification support
NASA Astrophysics Data System (ADS)
Caranica, Alexandru; Burileanu, Corneliu
2015-02-01
The novelty of this work lies in the application of an open-source research software toolkit (CMU Sphinx) to train, build, and evaluate a speech recognition system, with speaker-independent support, for voice-controlled hardware applications. Moreover, we propose to use the trained acoustic model to decode offline voice commands on embedded hardware, such as an ARMv6 low-cost SoC, the Raspberry Pi. This type of single-board computer, mainly used for educational and research activities, can serve as a proof-of-concept software and hardware stack for low-cost voice automation systems.
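For flavor, the older pocketsphinx 0.1.x Python bindings for CMU Sphinx expose a keyphrase-spotting mode well suited to low-cost command decoding of this kind. The keyphrase and threshold below are illustrative; a deployed system would swap in its own trained acoustic and language models:

```python
# Sketch of offline keyword spotting with CMU Sphinx via the pocketsphinx
# 0.1.x Python bindings (pip install pocketsphinx). Values are illustrative.
from pocketsphinx import LiveSpeech

speech = LiveSpeech(
    lm=False,                 # disable the full language model ...
    keyphrase="lights on",    # ... and spot a single keyphrase instead
    kws_threshold=1e-20,      # detection threshold; tune per application
)
for phrase in speech:         # blocks, reading from the default microphone
    print("command detected:", phrase.segments(detailed=True))
```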
The Future of Software Engineering for High Performance Computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pope, G
DOE ASCR requested that from May through mid-July 2015 a study group identify issues and recommend solutions from a software engineering perspective for transitioning into the next generation of High Performance Computing. The approach used was to ask some of the DOE complex experts who will be responsible for doing this work to contribute to the study group. The technique used was to solicit elevator speeches: a short and concise write-up done as if the author was a speaker with only a few minutes to convince a decision maker of their top issues. Pages 2-18 contain the original texts of the contributed elevator speeches and end notes identifying the 20 contributors. The study group also ranked the importance of each topic, and those scores are displayed with each topic heading. A perfect score (and highest priority) is three, two is medium priority, and one is lowest priority. The highest scoring topic areas were software engineering and testing resources; the lowest scoring area was compliance to DOE standards. The following two paragraphs are an elevator speech summarizing the contributed elevator speeches. Each sentence or phrase in the summary is hyperlinked to its source via a numeral embedded in the text. A risk one-liner has also been added to each topic to allow future risk tracking and mitigation.
NASA Astrophysics Data System (ADS)
Endah, S. N.; Nugraheni, D. M. K.; Adhy, S.; Sutikno
2017-04-01
According to Law No. 32 of 2002 and Indonesian Broadcasting Commission Regulations No. 02/P/KPI/12/2009 and No. 03/P/KPI/12/2009, broadcast programs must not scold with harsh words, or harass, insult, or demean minorities and marginalized groups. However, there are no suitable tools to censor such words automatically, so research toward intelligent software that performs this censoring is needed. To conduct censoring, the system must be able to recognize the words in question. This research proposes classifying speech into two classes using a Support Vector Machine (SVM): the first class is a set of rude words and the second class is a set of acceptable words. Speech pitch values are used as the SVM input in developing the system for Indonesian rude swear words. The results of the experiment show that SVM performs well for this system.
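A two-class SVM over pitch-derived features looks generically like the scikit-learn sketch below. The feature values are fabricated placeholders that only demonstrate the calling pattern; the study extracted pitch values from actual speech recordings:

```python
# Sketch of the described two-class SVM (rude vs. acceptable words) over
# pitch features. Feature extraction is reduced to placeholder values.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
# One row of pitch-derived features per word token (placeholders).
X = np.vstack([rng.normal(180, 20, (40, 3)),    # class 0: acceptable words
               rng.normal(230, 25, (40, 3))])   # class 1: rude words
y = np.array([0] * 40 + [1] * 40)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = SVC(kernel="rbf").fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```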
Parmanto, Bambang; Saptono, Andi; Murthi, Raymond; Safos, Charlotte; Lathan, Corinna E
2008-11-01
A secure telemonitoring system was developed to transform the CosmoBot system, stand-alone speech-language therapy software, into a telerehabilitation system. The CosmoBot system is a motivating, computer-based play character designed to enhance children's communication skills and stimulate verbal interaction during the remediation of speech and language disorders. It consists of the Mission Control human interface device and Cosmo's Play and Learn software, featuring a robot character named Cosmo that targets educational goals for children aged 3-5 years. The secure telemonitoring infrastructure links a distant speech-language therapist with the child and parents in home or school settings. The result is a telerehabilitation system that allows a speech-language therapist to monitor children's activities at home while providing feedback and therapy materials remotely. We have developed the means for telerehabilitation of communication skills that can be implemented in children's home settings. The architecture allows the therapist to remotely monitor the children after completion of the therapy session and to provide feedback for the following session.
Somanath, Keerthan; Mau, Ted
2016-11-01
(1) To develop an automated algorithm to analyze the electroglottographic (EGG) signal in continuous dysphonic speech, and (2) to identify EGG waveform parameters that correlate with the auditory-perceptual quality of strain in the speech of patients with adductor spasmodic dysphonia (ADSD). Software development with application in a prospective controlled study. EGG was recorded from 12 normal speakers and 12 subjects with ADSD reading excerpts from the Rainbow Passage. Data were processed by a new algorithm developed with the specific goal of analyzing continuous dysphonic speech. The contact quotient, pulse width, a new parameter (peak skew), and various contact closing slope quotient and contact opening slope quotient measures were extracted. EGG parameters were compared between normal and ADSD speech. Within the ADSD group, intra-subject comparison was also made between perceptually strained and unstrained syllables. The opening slope quotient SO7525 distinguished strained syllables from unstrained syllables in continuous speech within individual subjects with ADSD. The standard deviations, but not the means, of the contact quotient, EGGW50, peak skew, and SO7525 differed between normal and ADSD speakers. The strain-stress pattern in continuous speech can be visualized as color gradients based on the variation of EGG parameter values. EGG parameters may provide a within-subject measure of vocal strain and serve as a marker for treatment response. The addition of EGG to multidimensional assessment may lead to improved characterization of the voice disturbance in ADSD. Copyright © 2016 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
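A common way to derive the contact quotient from an EGG cycle is a level-crossing criterion: the fraction of the period during which the signal exceeds a threshold set between the cycle's minimum and maximum. A simplified per-cycle sketch (the 25% threshold is a conventional choice but an assumption here; the paper's algorithm additionally handles continuous dysphonic speech):

```python
# Sketch of a per-cycle EGG contact quotient via a level-crossing criterion.
import numpy as np

def contact_quotient(egg_cycle: np.ndarray, level: float = 0.25) -> float:
    """CQ of one EGG cycle: fraction of the period above a threshold.

    `level` sets the threshold as a fraction of the cycle's min-max range
    (0.25 is conventional, but an assumption here).
    """
    lo, hi = egg_cycle.min(), egg_cycle.max()
    threshold = lo + level * (hi - lo)
    return float(np.mean(egg_cycle > threshold))  # contacted fraction of cycle
```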
DARPA TIMIT acoustic-phonetic continuous speech corpus CD-ROM. NIST speech disc 1-1.1
NASA Astrophysics Data System (ADS)
Garofolo, J. S.; Lamel, L. F.; Fisher, W. M.; Fiscus, J. G.; Pallett, D. S.
1993-02-01
The Texas Instruments/Massachusetts Institute of Technology (TIMIT) corpus of read speech has been designed to provide speech data for the acquisition of acoustic-phonetic knowledge and for the development and evaluation of automatic speech recognition systems. TIMIT contains speech from 630 speakers representing 8 major dialect divisions of American English, each speaking 10 phonetically-rich sentences. The TIMIT corpus includes time-aligned orthographic, phonetic, and word transcriptions, as well as speech waveform data for each spoken sentence. The release of TIMIT contains several improvements over the Prototype CD-ROM released in December, 1988: (1) full 630-speaker corpus, (2) checked and corrected transcriptions, (3) word-alignment transcriptions, (4) NIST SPHERE-headered waveform files and header manipulation software, (5) phonemic dictionary, (6) new test and training subsets balanced for dialectal and phonetic coverage, and (7) more extensive documentation.
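To illustrate the corpus layout, each TIMIT utterance pairs a SPHERE-headered waveform with .phn/.wrd transcriptions whose start and end times are given in samples. A minimal reader, with hypothetical paths, assuming libsndfile (via the soundfile package) handles the NIST SPHERE header:

```python
# Sketch: reading one TIMIT utterance and its time-aligned phone labels.
# Paths are hypothetical; soundfile/libsndfile is assumed to parse the
# NIST SPHERE header of the original .WAV files.
import soundfile as sf

audio, fs = sf.read("TRAIN/DR1/FCJF0/SA1.WAV")      # fs is 16000 for TIMIT

phones = []
with open("TRAIN/DR1/FCJF0/SA1.PHN") as f:
    for line in f:                                  # "start end label" lines
        start, end, label = line.split()
        phones.append((int(start) / fs, int(end) / fs, label))

print(f"{len(audio)/fs:.2f} s utterance, first phone: {phones[0]}")
```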
Are You Talking to Me? Dialogue Systems Supporting Mixed Teams of Humans and Robots
NASA Technical Reports Server (NTRS)
Dowding, John; Clancey, William J.; Graham, Jeffrey
2006-01-01
This position paper describes an approach to building spoken dialogue systems for environments containing multiple human speakers and hearers, and multiple robotic speakers and hearers. We address the issue, for robotic hearers, of whether the speech they hear is intended for them or more likely intended for some other hearer. We describe data collected during a series of experiments involving teams of multiple humans and robots (and other software participants), and some preliminary results for distinguishing robot-directed speech from human-directed speech. The domain of these experiments is Mars-analogue planetary exploration. These Mars-analogue field studies involve two subjects in simulated planetary space suits doing geological exploration with the help of 1-2 robots, supporting software agents, a habitat communicator, and links to a remote science team. The two subjects perform a task (geological exploration) which requires them to speak with each other while also speaking with their assistants. The technique used here is to use a probabilistic context-free grammar language model in the speech recognizer that is trained on prior robot-directed speech. Intuitively, the recognizer will give higher confidence to an utterance if it is similar to utterances that have been directed to the robot in the past.
Automated Speech Rate Measurement in Dysarthria.
Martens, Heidi; Dekens, Tomas; Van Nuffelen, Gwen; Latacz, Lukas; Verhelst, Werner; De Bodt, Marc
2015-06-01
In this study, a new algorithm for automated determination of speech rate (SR) in dysarthric speech is evaluated. We investigated how reliably the algorithm calculates the SR of dysarthric speech samples when compared with calculation performed by speech-language pathologists. The new algorithm was trained and tested using Dutch speech samples of 36 speakers with no history of speech impairment and 40 speakers with mild to moderate dysarthria. We tested the algorithm under various conditions: according to speech task type (sentence reading, passage reading, and storytelling) and algorithm optimization method (speaker group optimization and individual speaker optimization). Correlations between automated and human SR determination were calculated for each condition. High correlations between automated and human SR determination were found in the various testing conditions. The new algorithm measures SR in a sufficiently reliable manner. It is currently being integrated in a clinical software tool for assessing and managing prosody in dysarthric speech. Further research is needed to fine-tune the algorithm to severely dysarthric speech, to make the algorithm less sensitive to background noise, and to evaluate how the algorithm deals with syllabic consonants.
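The evaluated algorithm is not reproduced in the abstract, but the core idea of automated SR measurement (counting syllable nuclei per second) can be approximated by peak-picking a smoothed intensity envelope. A sketch with illustrative thresholds, not the paper's method:

```python
# Sketch of envelope-based syllable counting for speech rate estimation.
# All thresholds are illustrative; this is not the evaluated algorithm.
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks

def speech_rate(x: np.ndarray, fs: int) -> float:
    envelope = np.abs(x)
    b, a = butter(2, 8.0 / (fs / 2))                # ~8 Hz envelope smoothing
    smooth = filtfilt(b, a, envelope)
    height = 0.3 * smooth.max()                     # ignore low-energy bumps
    peaks, _ = find_peaks(smooth, height=height,
                          distance=int(0.12 * fs))  # >=120 ms between nuclei
    return len(peaks) / (len(x) / fs)               # syllables per second
```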
ERIC Educational Resources Information Center
Sidgi, Lina Fathi Sidig; Shaari, Ahmad Jelani
2017-01-01
Technology such as computer-assisted language learning (CALL) is used in teaching and learning in the foreign language classrooms where it is most needed. One promising emerging technology that supports language learning is automatic speech recognition (ASR). Integrating such technology, especially in the instruction of pronunciation…
The Study and Implementation of Text-to-Speech System for Agricultural Information
NASA Astrophysics Data System (ADS)
Zheng, Huoguo; Hu, Haiyan; Liu, Shihong; Meng, Hong
Broadcast and television coverage has increased to more than 98% in China. Radio information services are characterized by wide coverage, low cost, and easy acceptance by grass-roots farmers. To strengthen the role of broadcast information services and address the scarcity of information resources in rural areas, we researched and developed a text-to-speech system. The system includes two parts, a software component and a hardware device, both of which can translate text into audio files. The software subsystem was implemented based on third-party middleware, and the hardware subsystem was realized with microelectronics technology. Results indicate that the hardware implementation outperforms the software one. The system has been applied in Huailai, Hebei Province, where it has converted more than 8000 audio files as programming materials for the local radio station.
ERIC Educational Resources Information Center
Lieberth, Ann K.; Martin, Doug R.
1995-01-01
Because of the diversity of clients served by speech-language pathologists and audiologists, available commercial software may not meet all needs. Authoring programs allow the clinician to design software that can be customized for individual clients. This article describes an authoring program called HyperCard and its use in preparing hypermedia…
Smart command recognizer (SCR) - For development, test, and implementation of speech commands
NASA Technical Reports Server (NTRS)
Simpson, Carol A.; Bunnell, John W.; Krones, Robert R.
1988-01-01
The SCR, a rapid prototyping system for the development, testing, and implementation of speech commands in a flight simulator or test aircraft, is described. A single unit performs all functions needed during these three phases of system development, while the use of common software and speech command data structure files greatly reduces the preparation time for successive development phases. As a smart peripheral to a simulation or flight host computer, the SCR interprets the pilot's spoken input and passes command codes to the simulation or flight computer.
SDI Software Technology Program Plan Version 1.5
1987-06-01
computer generation of auditory communication of meaningful speech. Most speech synthesizers are based on mathematical models of the human vocal tract, but...oral/auditory and multimodal communications. Although such state-of-the-art interaction technology has not fully matured, user experience has...superior pattern matching capabilities and the subliminal intuitive deduction capability. The error performance of humans can be helped by careful
Ultrasound analysis of tongue contour for the sound [j] in adults and children.
Barberena, Luciana da Silva; Simoni, Simone Nicolini de; Souza, Rosalina Correa Sobrinho de; Moraes, Denis Altieri de Oliveira; Berti, Larissa Cristina; Keske-Soares, Márcia
2017-12-11
Analyze and compare the mean tongue contours and articulatory gestures in the production of the sound [j] in adults and children with typical and atypical speech development. The children with atypical development presented speech sound disorders; the diagnosis was determined by speech assessments. The study sample was composed of 90 individuals divided into three groups: 30 adults with typical speech development aged 19-44 years (AT), 30 children with typical speech development (CT), and 30 children with speech sound disorders, named atypical in this study, aged four years to eight years and eleven months (CA). Ultrasonographic assessment of tongue movements was performed for all groups. Mean tongue contours were compared between the three groups in different vocalic contexts following the sound [j]. The maximum elevation of the tongue tip was considered for delimitation of gestures, using the Articulate Assistant Advanced (AAA) software and images in the sagittal plane/B mode. The points that intercepted the tongue curves were analyzed with the statistical tool R. The graphs of tongue contours were obtained adopting a 95% confidence interval. After that, the regions with statistically significant differences (p<0.05) between the CT and CA groups were obtained. The mean tongue contours demonstrated the gesture for the sound [j] in the comparison between typical and atypical children. For the semivowel [j], there is an articulatory gesture of the tongue and dorsum towards the center of the hard palate, with significant differences observed between the children. The results showed differences between the groups of children regarding the ability to refine articulatory gestures.
NASA Astrophysics Data System (ADS)
Ghoraani, Behnaz; Krishnan, Sridhar
2009-12-01
The number of people affected by speech problems is increasing as the modern world places increasing demands on the human voice via mobile telephones, voice recognition software, and interpersonal verbal communications. In this paper, we propose a novel methodology for automatic pattern classification of pathological voices. The main contribution of this paper is extraction of meaningful and unique features using Adaptive time-frequency distribution (TFD) and nonnegative matrix factorization (NMF). We construct Adaptive TFD as an effective signal analysis domain to dynamically track the nonstationarity in the speech and utilize NMF as a matrix decomposition (MD) technique to quantify the constructed TFD. The proposed method extracts meaningful and unique features from the joint TFD of the speech, and automatically identifies and measures the abnormality of the signal. Depending on the abnormality measure of each signal, we classify the signal into normal or pathological. The proposed method is applied on the Massachusetts Eye and Ear Infirmary (MEEI) voice disorders database which consists of 161 pathological and 51 normal speakers, and an overall classification accuracy of 98.6% was achieved.
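The adaptive TFD construction is beyond a short sketch, but the second stage, NMF decomposition of a nonnegative time-frequency matrix into a few basis spectra and activations whose statistics feed a classifier, looks generically like this; a plain spectrogram stands in for the paper's adaptive TFD:

```python
# Sketch of the NMF stage: factor a nonnegative time-frequency matrix into
# spectral bases W and activations H, then summarize H as features.
# A plain spectrogram stands in for the paper's adaptive TFD.
import numpy as np
from scipy.signal import spectrogram
from sklearn.decomposition import NMF

def nmf_features(x: np.ndarray, fs: int, n_components: int = 5) -> np.ndarray:
    f, t, S = spectrogram(x, fs, nperseg=512)     # nonnegative TFD surrogate
    model = NMF(n_components=n_components, init="nndsvda", max_iter=500)
    W = model.fit_transform(S)                    # basis spectra (freq x k)
    H = model.components_                         # activations  (k x time)
    # Simple per-component statistics as classification features.
    return np.concatenate([H.mean(axis=1), H.std(axis=1)])
```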
Computerized Measurement of Negative Symptoms in Schizophrenia
Cohen, Alex S.; Alpert, Murray; Nienow, Tasha M.; Dinzeo, Thomas J.; Docherty, Nancy M.
2008-01-01
Accurate measurement of negative symptoms is crucial for understanding and treating schizophrenia. However, current measurement strategies are reliant on subjective symptom rating scales which often have psychometric and practical limitations. Computerized analysis of patients’ speech offers a sophisticated and objective means of evaluating negative symptoms. The present study examined the feasibility and validity of using widely-available acoustic and lexical-analytic software to measure flat affect, alogia and anhedonia (via positive emotion). These measures were examined in their relationships to clinically-rated negative symptoms and social functioning. Natural speech samples were collected and analyzed for 14 patients with clinically-rated flat affect, 46 patients without flat affect and 19 healthy controls. The computer-based inflection and speech rate measures significantly discriminated patients with flat affect from controls, and the computer-based measure of alogia and negative emotion significantly discriminated the flat and non-flat patients. Both the computer and clinical measures of positive emotion/anhedonia corresponded to functioning impairments. The computerized method of assessing negative symptoms offered a number of advantages over the symptom scale-based approach. PMID:17920078
SUS users' perception: a speech-language pathology approach based on health promotion.
Cunha, Jenane Topanotti da; Massi, Giselle; Guarinello, Ana Cristina; Pereira, Francine Martins
2016-01-01
This study aimed to analyze the perceptions of users of the Brazilian Unified Health System (SUS) about the treatment center where they were assisted, as well as about the speech-language pathology services rendered by this center. This is a cross-sectional study based on an interview with 26 open questions and 14 closed questions, applied to 111 individuals who were assisted at the SUS center in August 2013. The quantitative content analysis was conducted using GraphPad Prism 5.1 and Statistical Package for the Social Sciences (SPSS) 15.0 software, applying the D'Agostino & Pearson test, F-test, and chi-squared test. Most participants reported a positive perception of the facilities and staff of the treatment center. They were also positive about the waiting time and the speech-language pathologists' explanations and conduct, especially in the audiology department. Most responses from participants were short and did not present an argumentative context. The treatment center received a high approval rating from most users. The audiology department had better grades than the clinical services related to language and oral motor pathologies.
[Telemonitoring of swallowing function: technologies in speech therapy practice].
Tedesco, Angela; Lavermicocca, Valentina; Notarnicola, Marilina; De Francesco, Luca; Dellomonaco, Anna Rita
2018-02-01
The medical-healthcare technological revolution represents an advantage for the patient and for the care provider in terms of reduced costs and travel distances. The telehomecare approach could be useful for monitoring swallowing disorders in neurodegenerative diseases, preventing complications. In this study, the applicability of telemedicine techniques for monitoring swallowing function in patients affected by Huntington's disease (HD) was evaluated through the acquisition and analysis of the sound of swallowing. Two patients with HD were screened as outpatients for dysphagia through the Bedside Swallowing Assessment Scale (BSAS), sensitized with pulse oximetry and cervical auscultation. Subsequently, swallowing functionality was telemonitored for three months with Skype. The swallowing sounds were acquired with a detection microphone attached to the lateral edge of the trachea during fluid intake. The sounds were instantly processed and graphically represented with the Praat software. The analysis of the acoustic signal acquired remotely made it possible to identify situations that required immediate speech therapy intervention, suggesting to the patients further modifications of food consistencies, and sparing frequent trips to the hospital in the absence of critical situations. Remote assistance applied to speech therapy could represent a benefit for patients and their carers and a more efficient use of medical and health resources.
Brancalioni, Ana Rita; Magnago, Karine Faverzani; Keske-Soares, Marcia
2012-09-01
The objective of this study is to create a new proposal for classifying the severity of speech disorders using a fuzzy model in accordance with a linguistic model that represents the speech acquisition of Brazilian Portuguese. The fuzzy linguistic model was run in the MATLAB software fuzzy toolbox from a set of fuzzy rules, and it encompassed three input variables: path routing, level of complexity and phoneme acquisition. The output was the Speech Disorder Severity Index, and it used the following fuzzy subsets: severe, moderate severe, mild moderate and mild. The proposal was used for 204 children with speech disorders who were monolingual speakers of Brazilian Portuguese. The fuzzy linguistic model provided the Speech Disorder Severity Index for all of the evaluated phonological systems in a fast and practical manner. It was then possible to classify the systems according to the severity of the speech disorder as severe, moderate severe, mild moderate and mild; the speech disorders could also be differentiated according to the severity index.
Speech recognition systems on the Cell Broadband Engine
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liu, Y; Jones, H; Vaidya, S
In this paper we describe our design, implementation, and first results of a prototype connected-phoneme-based speech recognition system on the Cell Broadband Engine™ (Cell/B.E.). Automatic speech recognition decodes speech samples into plain text (other representations are possible) and must process samples at real-time rates. Fortunately, the computational tasks involved in this pipeline are highly data-parallel and can receive significant hardware acceleration from vector-streaming architectures such as the Cell/B.E. Identifying and exploiting these parallelism opportunities is challenging, but also critical to improving system performance. We observed, from our initial performance timings, that a single Cell/B.E. processor can recognize speech from thousands of simultaneous voice channels in real time--a channel density that is orders-of-magnitude greater than the capacity of existing software speech recognizers based on CPUs (central processing units). This result emphasizes the potential for Cell/B.E.-based speech recognition and will likely lead to the future development of production speech systems using Cell/B.E. clusters.
NASA Astrophysics Data System (ADS)
The subjects discussed are related to LSI/VLSI based subscriber transmission and customer access for the Integrated Services Digital Network (ISDN), special applications of fiber optics, ISDN and competitive telecommunication services, technical preparations for the Geostationary-Satellite Orbit Conference, high-capacity statistical switching fabrics, networking and distributed systems software, adaptive arrays and cancelers, synchronization and tracking, speech processing, advances in communication terminals, full-color videotex, and a performance analysis of protocols. Advances in data communications are considered along with transmission network plans and progress, direct broadcast satellite systems, packet radio system aspects, radio-new and developing technologies and applications, the management of software quality, and Open Systems Interconnection (OSI) aspects of telematic services. Attention is given to personal computers and OSI, the role of software reliability measurement in information systems, and an active array antenna for the next-generation direct broadcast satellite.
ERIC Educational Resources Information Center
Arcon, Nina; Klein, Perry D.; Dombroski, Jill D.
2017-01-01
Previous research has shown that both dictation and speech-to-text (STT) software can increase the quality of writing for native English speakers. The purpose of this study was to investigate the effect of these modalities on the written composition and cognitive load of elementary school English language learners (ELLs). In a within-subjects…
The Effect of Speech-to-Text Technology on Learning a Writing Strategy
ERIC Educational Resources Information Center
Haug, Katrina N.; Klein, Perry D.
2018-01-01
Previous research has shown that speech-to-text (STT) software can support students in producing a given piece of writing. This is the 1st study to investigate the use of STT to teach a writing strategy. We pretested 45 Grade 5 students on argument writing and trained them to use STT. Students participated in 4 lessons on an argument writing…
Robotics control using isolated word recognition of voice input
NASA Technical Reports Server (NTRS)
Weiner, J. M.
1977-01-01
A speech input/output system is presented that can be used to communicate with a task-oriented system. Human speech commands and synthesized voice output extend conventional information exchange capabilities between man and machine by utilizing audio input and output channels. The speech input facility comprises a hardware feature extractor and a microprocessor-implemented isolated word or phrase recognition system. The recognizer offers a medium-sized (100-command), syntactically constrained vocabulary, and exhibits close to real-time performance. The major portion of the recognition processing required is accomplished in software, minimizing the complexity of the hardware feature extractor.
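Isolated word recognizers of this era typically matched an input feature sequence against stored templates with dynamic time warping (DTW). A compact numpy sketch of the matching step; feature extraction is omitted, and the template dictionary is a hypothetical stand-in:

```python
# Sketch of template matching by dynamic time warping (DTW), the classic
# core of isolated-word recognizers like the one described. Rows of each
# array are frames; columns are feature dimensions.
import numpy as np

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])   # frame distance
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def recognize(features: np.ndarray, templates: dict) -> str:
    """Return the vocabulary word whose stored template is DTW-closest."""
    return min(templates, key=lambda w: dtw_distance(features, templates[w]))
```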
APEX/SPIN: a free test platform to measure speech intelligibility.
Francart, Tom; Hofmann, Michael; Vanthornhout, Jonas; Van Deun, Lieselot; van Wieringen, Astrid; Wouters, Jan
2017-02-01
Measuring speech intelligibility in quiet and noise is important in clinical practice and research. An easy-to-use free software platform for conducting speech tests, called APEX/SPIN, is presented. The APEX/SPIN platform allows the use of any speech material in combination with any noise. A graphical user interface provides control over a large range of parameters, such as the number of loudspeakers, signal-to-noise ratio, and parameters of the procedure. An easy-to-use graphical interface is provided for calibration and storage of calibration values. To validate the platform, perception of words in quiet and sentences in noise was measured both with APEX/SPIN and with an audiometer and CD player, a conventional setup in current clinical practice. Five normal-hearing listeners participated in the experimental evaluation. Speech perception results were similar for the APEX/SPIN platform and the conventional procedures. APEX/SPIN is a freely available, open-source platform that allows the administration of all kinds of custom speech perception tests and procedures.
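Speech-in-noise procedures of the kind such a platform administers often adapt the signal-to-noise ratio with a one-up/one-down staircase converging on 50% intelligibility. A generic sketch; the 2 dB step and 20-trial run are assumptions, not APEX/SPIN defaults:

```python
# Generic sketch of a 1-up/1-down adaptive SNR staircase for speech-in-noise
# testing. The step size and trial count are assumptions, not APEX/SPIN's.
def adaptive_srt(present_trial, start_snr=0.0, step_db=2.0, n_trials=20):
    """present_trial(snr) -> True if the listener repeated the item correctly.

    Returns the mean SNR at the reversal points, an estimate of the speech
    reception threshold (~50% correct); 0.0 if no reversals occurred.
    """
    snr, prev_correct, reversals = start_snr, None, []
    for _ in range(n_trials):
        correct = present_trial(snr)
        if prev_correct is not None and correct != prev_correct:
            reversals.append(snr)                   # track direction changes
        snr += -step_db if correct else step_db     # harder after a hit
        prev_correct = correct
    return sum(reversals) / max(len(reversals), 1)
```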
NASA Astrophysics Data System (ADS)
Izdebski, Krzysztof; Jarosz, Paweł; Usydus, Ireneusz
2017-02-01
Ventilation, speech and singing all recruit facial musculature to complete these motor tasks, and these tasks are fueled by the air we inhale. This motor process requires an increase in blood flow as the muscles contract and relax; skin surface temperature changes are therefore expected. Hence, we used thermography to image these effects. The system used was the thermography camera model FLIR X6580sc with a chilled detector (FLIR Systems Advanced Thermal Solutions, 27700 SW Parkway Ave, Wilsonville, OR 97070, USA). To improve imaging, the room was air-conditioned to +18 °C. All images were recorded at a speed of 30 f/s. Acquired data were analyzed with FLIR Research IR Max Version 4 software and software filters. In this preliminary study, a male subject was imaged from frontal and lateral views simultaneously while he performed normal resting ventilation, speech and song. The lateral image was captured in a stainless steel mirror. Results showed different levels of heat flow in the facial musculature as a function of these three tasks. We were also able to capture the directionality of the exhaled air jet. The breathing jet was discharged horizontally, the speaking-voice jet was discharged downwards, and the singing jet went upward. We interpreted these jet directions as representing different gas content of the air expired during these different tasks, with speech having less oxygen than singing. Further studies examining gas exchange during various forms of speech and song and different emotional states are warranted.
Virtual Observer Controller (VOC) for Small Unit Infantry Laser Simulation Training
2007-04-01
per-seat license when deployed. As a result, ViaVoice was abandoned early in development. Next, the SPHINX engine from Carnegie Mellon University was...examined. Sphinx is Java-based software, providing cross-platform functionality, and it is also free, open-source software. Software developers at...IST had experience using SPHINX, so it was initially selected to be the VOC speech engine. After implementing a small portion of the VOC grammar
ERIC Educational Resources Information Center
Dance, Frank E. X.; And Others
This paper reports on the Futuristic Priorities Division members' recommendations and priorities concerning the impact of the future on communication and on the speech communication discipline. The recommendations and priorities are listed for two subgroups: The Communication Needs and Rights of Mankind; and Future Communication Technologies:…
Limited connected speech experiment
NASA Astrophysics Data System (ADS)
Landell, P. B.
1983-03-01
The purpose of this contract was to demonstrate that connected speech recognition (CSR) can be performed in real time on a vocabulary of one hundred words and to test the performance of the CSR system for twenty-five male and twenty-five female speakers. This report describes the contractor's real-time laboratory CSR system, the database and training software developed in accordance with the contract, and the results of the performance tests.
[Computer assisted application of mandarin speech test materials].
Zhang, Hua; Wang, Shuo; Chen, Jing; Deng, Jun-Min; Yang, Xiao-Lin; Guo, Lian-Sheng; Zhao, Xiao-Yan; Shao, Guang-Yu; Han, De-Min
2008-06-01
To design a reliable and convenient intelligent speech test system using computer software and to evaluate this system. First, the intelligent system was designed in the Delphi programming language. Second, the seven monosyllabic word lists recorded on CD were segmented with Cool Edit Pro v2.1 software and loaded into the system as test materials. Finally, the intelligent system was used to evaluate the equivalence of difficulty between the seven lists. Fifty-five college students with normal hearing participated in the study. The seven monosyllabic word lists were equivalent in difficulty (F = 1.582, P > 0.05), and the system proved reliable and convenient. The intelligent system is feasible for clinical practice.
[Pleasure-suffering indicators of nursing work in a hemodialysis nursing service].
Prestes, Francine Cassol; Beck, Carmem Lúcia Colomé; Magnago, Tânia Solange Bosi de Souza; Silva, Rosângela Marion da
2015-06-01
To measure pleasure and suffering indicators at work and relate them to the sociodemographic and employment characteristics of the nursing staff in a hemodialysis center in southern Brazil. Quantitative research with 46 workers. We used a self-completed form with demographic and labor data and the Pleasure and Suffering Indicators at Work Scale (PSIWS). We conducted descriptive, bivariate and correlation analyses with a 5% significance level, using the Epi-Info® and Predictive Analytics Software programs. Freedom of Speech was considered critical; the other factors were evaluated as satisfactory. The results revealed a possible association between sociodemographic and work characteristics and the pleasure and suffering indicators. There was a correlation between the factors evaluated. Despite the satisfactory evaluation, suffering is present in the studied context, expressed mainly by a lack of Freedom of Speech, indicating the need for interventions to prevent harm to the health of workers.
The Relationship Between Speech Production and Speech Perception Deficits in Parkinson's Disease.
De Keyser, Kim; Santens, Patrick; Bockstael, Annelies; Botteldooren, Dick; Talsma, Durk; De Vos, Stefanie; Van Cauwenberghe, Mieke; Verheugen, Femke; Corthals, Paul; De Letter, Miet
2016-10-01
This study investigated the possible relationship between hypokinetic speech production and speech intensity perception in patients with Parkinson's disease (PD). Participants included 14 patients with idiopathic PD and 14 matched healthy controls (HCs) with normal hearing and cognition. First, speech production was objectified through a standardized speech intelligibility assessment, acoustic analysis, and speech intensity measurements. Second, an overall estimation task and an intensity estimation task were addressed to evaluate overall speech perception and speech intensity perception, respectively. Finally, correlation analysis was performed between the speech characteristics of the overall estimation task and the corresponding acoustic analysis. The interaction between speech production and speech intensity perception was investigated by an intensity imitation task. Acoustic analysis and speech intensity measurements demonstrated significant differences in speech production between patients with PD and the HCs. A different pattern in the auditory perception of speech and speech intensity was found in the PD group. Auditory perceptual deficits may influence speech production in patients with PD. The present results suggest a disturbed auditory perception related to an automatic monitoring deficit in PD.
Fifty years of progress in speech and speaker recognition
NASA Astrophysics Data System (ADS)
Furui, Sadaoki
2004-10-01
Speech and speaker recognition technology has made very significant progress in the past 50 years. The progress can be summarized by the following changes: (1) from template matching to corpus-based statistical modeling, e.g., HMMs and n-grams, (2) from filter bank/spectral resonance to cepstral features (cepstrum + Δcepstrum + ΔΔcepstrum), (3) from heuristic time-normalization to DTW/DP matching, (4) from "distance"-based to likelihood-based methods, (5) from maximum likelihood to discriminative approaches, e.g., MCE/GPD and MMI, (6) from isolated word to continuous speech recognition, (7) from small vocabulary to large vocabulary recognition, (8) from context-independent units to context-dependent units for recognition, (9) from clean speech to noisy/telephone speech recognition, (10) from single speaker to speaker-independent/adaptive recognition, (11) from monologue to dialogue/conversation recognition, (12) from read speech to spontaneous speech recognition, (13) from recognition to understanding, (14) from single-modality (audio signal only) to multimodal (audio/visual) speech recognition, (15) from hardware recognizer to software recognizer, and (16) from no commercial application to many practical commercial applications. Most of these advances have taken place in both speech recognition and speaker recognition. The majority of technological changes have been directed toward increasing the robustness of recognition, including many other important techniques not noted above.
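Change (3) in this list refers to dynamic time warping (DTW), which aligns two utterances of unequal duration by dynamic programming over a local frame-distance matrix. A textbook Python sketch (illustrative only, not code from the survey), assuming each utterance is a sequence of feature vectors:

import numpy as np

def dtw_distance(x, y):
    """Classic DTW distance between feature sequences x (n frames)
    and y (m frames), using Euclidean local cost."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(x[i - 1] - y[j - 1])
            # Best of insertion, deletion, and match moves.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

In template matching (change (1)), an unknown utterance is assigned to the vocabulary word whose stored template gives the smallest DTW distance.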
Research and development of a versatile portable speech prosthesis
NASA Technical Reports Server (NTRS)
1981-01-01
The Versatile Portable Speech Prosthesis (VPSP), a synthetic speech output communication aid for non-speaking people, is described. It was initially intended for severely physically limited people with cerebral palsy who use electric wheelchairs. Hence, it was designed to be mounted on a wheelchair and powered from the wheelchair battery, yet it can easily be separated from the wheelchair. The VPSP is versatile because it is designed to accept any means of single-switch, multiple-switch, or keyboard control that physically limited people have the ability to use. It is portable because it is mounted on, and travels with, the electric wheelchair. It is a speech prosthesis, obviously, because it speaks with a synthetic voice for people unable to speak with their own voices. Both hardware and software are described.
Pollonini, Luca; Olds, Cristen; Abaya, Homer; Bortfeld, Heather; Beauchamp, Michael S; Oghalai, John S
2014-03-01
The primary goal of most cochlear implant procedures is to improve a patient's ability to discriminate speech. To accomplish this, cochlear implants are programmed so as to maximize speech understanding. However, programming a cochlear implant can be an iterative, labor-intensive process that takes place over months. In this study, we sought to determine whether functional near-infrared spectroscopy (fNIRS), a non-invasive neuroimaging method that is safe to use repeatedly and for extended periods of time, can provide an objective measure of whether a subject is hearing normal or distorted speech. We used a 140-channel fNIRS system to measure activation within the auditory cortex in 19 normal-hearing subjects while they listened to speech with different levels of intelligibility. Custom software was developed to analyze the data and compute topographic maps from the measured changes in oxyhemoglobin and deoxyhemoglobin concentration. Normal speech reliably evoked the strongest responses within the auditory cortex. Distorted speech produced less region-specific cortical activation. Environmental sounds were used as a control and produced the least cortical activation. These fNIRS data are consistent with the fMRI literature and thus demonstrate the feasibility of using this technique to objectively detect differences in cortical responses to speech of different intelligibility. Copyright © 2013 Elsevier B.V. All rights reserved.
Auditory training changes temporal lobe connectivity in 'Wernicke's aphasia': a randomised trial.
Woodhead, Zoe Vj; Crinion, Jennifer; Teki, Sundeep; Penny, Will; Price, Cathy J; Leff, Alexander P
2017-07-01
Aphasia is one of the most disabling sequelae after stroke, occurring in 25%-40% of stroke survivors. However, there remains a lack of good evidence for the efficacy or mechanisms of speech comprehension rehabilitation. This within-subjects trial tested two concurrent interventions in 20 patients with chronic aphasia with speech comprehension impairment following left hemisphere stroke: (1) phonological training using 'Earobics' software and (2) a pharmacological intervention using donepezil, an acetylcholinesterase inhibitor. Donepezil was tested in a double-blind, placebo-controlled, cross-over design using block randomisation with bias minimisation. The primary outcome measure was the speech comprehension score on the Comprehensive Aphasia Test. Magnetoencephalography (MEG), with an established index of auditory perception, the mismatch negativity response, tested whether the therapies altered effective connectivity at the lower (primary) or higher (secondary) level of the auditory network. Phonological training improved speech comprehension abilities and was particularly effective for patients with severe deficits. No major adverse effects of donepezil were observed, but it had an unpredicted negative effect on speech comprehension. The MEG analysis demonstrated that phonological training increased synaptic gain in the left superior temporal gyrus (STG). Patients with more severe speech comprehension impairments also showed strengthening of bidirectional connections between the left and right STG. Phonological training resulted in a small but significant improvement in speech comprehension, whereas donepezil had a negative effect. The connectivity results indicated that training reshaped higher order phonological representations in the left STG and (in more severe patients) induced stronger interhemispheric transfer of information between higher levels of auditory cortex. Clinical trial registration: this trial was registered with EudraCT (2005-004215-30, https://eudract.ema.europa.eu/) and ISRCTN (68939136, http://www.isrctn.com/). © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
The use of conjunctions by children with typical language development.
Glória, Yasmin Alves Leão; Hanauer, Letícia Pessota; Wiethan, Fernanda Marafiga; Nóro, Letícia Arruda; Mota, Helena Bolli
2016-07-04
To investigate the use of conjunctions in the spontaneous speech of three-year-old children with typical language development living in Santa Maria, RS, Brazil. Forty-five children aged 3:0;0 to 3:11;29 (years:months;days) from the database of the Center for the Study of Language and Speech (CELF) participated in this study. The spontaneous speech of each child was transcribed, and the samples were analyzed to identify the types of conjunctions for each age group. The samples were statistically analyzed using the R software, which allowed evaluation of the number and types of conjunctions used in each age group by comparing the groups with each other. The data indicated that the older the children, the greater the number of conjunction types they used. The comparison between age groups showed significant differences in the average number of conjunctions per age group, as well as for additive conjunctions and subordinating conjunctions. At age three, children begin to develop the grammatical use of conjunctions, with additive, adversative and explanatory coordinating conjunctions appearing early, and by 3:6 they are able to use the most complex conjunctions, such as subordinating conjunctions.
Facilities to assist people to research into stammered speech
Howell, Peter; Huckvale, Mark
2008-01-01
The purpose of this article is to indicate how access can be obtained, through Stammering Research, to audio recordings and transcriptions of spontaneous speech data from speakers who stammer. Selections of the first author’s data are available in several formats. We describe where to obtain free software for manipulation and analysis of the data in their respective formats. Papers reporting analyses of these data are invited as submissions to this section of Stammering Research. It is intended that subsequent analyses that employ these data will be published in Stammering Research on an on-going basis. Plans are outlined to provide similar data from young speakers (ones developing fluently and ones who stammer), follow-up data from speakers who stammer, data from speakers who stammer who do not speak English and from speakers who have other speech disorders, for comparison, all through the pages of Stammering Research. The invitation is extended to those promulgating evidence-based practice approaches (see the Journal of Fluency Disorders, volume 28, number 4 which is a special issue devoted to this topic) and anyone with other interesting data related to stammering to prepare them in a form that can be made accessible to others via Stammering Research. PMID:18418475
Moerman, Mieke; Martens, Jean-Pierre; Dejonckere, Philippe
2015-04-01
This article is a compilation of the authors' own research performed during the European COoperation in Science and Technology (COST) action 2103, 'Advanced Voice Function Assessment', an initiative of voice and speech processing teams consisting of physicists, engineers, and clinicians. This manuscript concerns the analysis of largely irregular voicing types, namely substitution voicing (SV) and adductor spasmodic dysphonia (AdSD). A specific perceptual rating scale (IINFVo) was developed, and the Auditory Model Based Pitch Extractor (AMPEX), a piece of software that automatically analyses running speech and generates pitch values in background noise, was applied. The IINFVo perceptual rating scale has been shown to be useful in evaluating SV. The analysis of strongly irregular voices stimulated a modification of the European Laryngological Society's assessment protocol, which was originally designed for the common types of (less severe) dysphonia. Acoustic analysis with AMPEX demonstrates that the most informative features are, for SV, the voicing-related acoustic features and, for AdSD, the perturbation measures. Poor correlations between self-assessment and the acoustic and perceptual dimensions in the assessment of highly irregular voices argue for a multidimensional approach.
Using Animated Language Software with Children Diagnosed with Autism Spectrum Disorders
ERIC Educational Resources Information Center
Mulholland, Rita; Pete, Ann Marie; Popeson, Joanne
2008-01-01
We examined the impact of using an animated software program (Team Up With Timo) on the expressive and receptive language abilities of five children ages 5-9 in a self-contained Learning and Language Disabilities class. We chose to use Team Up With Timo (Animated Speech Corporation) because it allows the teacher to personalize the animation for…
Adapting a Computerized Medical Dictation System to Prepare Academic Papers in Radiology.
Sánchez, Yadiel; Prabhakar, Anand M; Uppot, Raul N
2017-09-14
Every day, radiologists use dictation software to compose clinical reports of imaging findings. The dictation software is tailored for medical use and to the speech pattern of each radiologist. Over the past 10 years, we have used dictation software to compose academic manuscripts, correspondence letters, and the texts of educational exhibits. The main advantage of voice dictation is faster composition of manuscripts. However, use of such software requires preparation. The purpose of this article is to review the steps of adapting clinical dictation software for dictating academic manuscripts and to detail the advantages and limitations of this technique. Copyright © 2017 Elsevier Inc. All rights reserved.
Visual-auditory integration during speech imitation in autism.
Williams, Justin H G; Massaro, Dominic W; Peel, Natalie J; Bosseler, Alexis; Suddendorf, Thomas
2004-01-01
Children with autistic spectrum disorder (ASD) may have poor audio-visual integration, possibly reflecting dysfunctional 'mirror neuron' systems which have been hypothesised to be at the core of the condition. In the present study, a computer program, utilizing speech synthesizer software and a 'virtual' head (Baldi), delivered speech stimuli for identification in auditory, visual or bimodal conditions. Children with ASD were poorer than controls at recognizing stimuli in the unimodal conditions, but once performance on this measure was controlled for, no group difference was found in the bimodal condition. A group of participants with ASD were also trained to develop their speech-reading ability. Training improved visual accuracy and this also improved the children's ability to utilize visual information in their processing of speech. Overall results were compared to predictions from mathematical models based on integration and non-integration, and were most consistent with the integration model. We conclude that, whilst they are less accurate in recognizing stimuli in the unimodal condition, children with ASD show normal integration of visual and auditory speech stimuli. Given that training in recognition of visual speech was effective, children with ASD may benefit from multi-modal approaches in imitative therapy and language training.
Speech Spectrum's Correlation with Speakers' Eysenck Personality Traits
Hu, Chao; Wang, Qiandong; Short, Lindsey A.; Fu, Genyue
2012-01-01
The current study explored the correlation between speakers' Eysenck personality traits and speech spectrum parameters. Forty-six subjects completed the Eysenck Personality Questionnaire. They were instructed to verbally answer the questions shown on a computer screen, and their responses were recorded by the computer. Spectrum parameters of /sh/ and /i/ were analyzed with the Praat voice software. Formant frequencies of the consonant /sh/ in lying responses were significantly lower than those in truthful responses, whereas no difference existed in the vowel /i/ speech spectrum. The second formant bandwidth of the consonant /sh/ speech spectrum was significantly correlated with the personality traits of Psychoticism, Extraversion, and Neuroticism, and the correlation differed between truthful and lying responses, whereas the first formant frequency of the vowel /i/ speech spectrum was negatively correlated with Neuroticism in both response types. The results suggest that personality characteristics may be conveyed through the human voice, although the extent to which these effects are due to physiological differences in the organs associated with speech or to a general Pygmalion effect is yet unknown. PMID:22439014
CACTI: free, open-source software for the sequential coding of behavioral interactions.
Glynn, Lisa H; Hallgren, Kevin A; Houck, Jon M; Moyers, Theresa B
2012-01-01
The sequential analysis of client and clinician speech in psychotherapy sessions can help to identify and characterize potential mechanisms of treatment and behavior change. Previous studies required coding systems that were time-consuming, expensive, and error-prone. Existing software can be expensive and inflexible, and furthermore, no single package allows for pre-parsing, sequential coding, and assignment of global ratings. We developed a free, open-source, and adaptable program to meet these needs: The CASAA Application for Coding Treatment Interactions (CACTI). Without transcripts, CACTI facilitates the real-time sequential coding of behavioral interactions using WAV-format audio files. Most elements of the interface are user-modifiable through a simple XML file, and can be further adapted using Java through the terms of the GNU Public License. Coding with this software yields interrater reliabilities comparable to previous methods, but at greatly reduced time and expense. CACTI is a flexible research tool that can simplify psychotherapy process research, and has the potential to contribute to the improvement of treatment content and delivery.
Exploring expressivity and emotion with artificial voice and speech technologies.
Pauletto, Sandra; Balentine, Bruce; Pidcock, Chris; Jones, Kevin; Bottaci, Leonardo; Aretoulaki, Maria; Wells, Jez; Mundy, Darren P; Balentine, James
2013-10-01
Emotion in audio-voice signals, as synthesized by text-to-speech (TTS) technologies, was investigated to formulate a theory of expression for user interface design. Emotional parameters were specified with markup tags, and the resulting audio was further modulated with post-processing techniques. Software was then developed to link a selected TTS synthesizer with an automatic speech recognition (ASR) engine, producing a chatbot that could speak and listen. Using these two artificial voice subsystems, investigators explored both artistic and psychological implications of artificial speech emotion. Goals of the investigation were interdisciplinary, with interest in musical composition, augmentative and alternative communication (AAC), commercial voice announcement applications, human-computer interaction (HCI), and artificial intelligence (AI). The work-in-progress points towards an emerging interdisciplinary ontology for artificial voices. As one study output, HCI tools are proposed for future collaboration.
Varley, Rosemary; Cowell, Patricia E; Dyson, Lucy; Inglis, Lesley; Roper, Abigail; Whiteside, Sandra P
2016-03-01
There is currently little evidence on effective interventions for poststroke apraxia of speech. We report outcomes of a trial of self-administered computer therapy for apraxia of speech. Effects of speech intervention on naming and repetition of treated and untreated words were compared with those of a visuospatial sham program. The study used a parallel-group, 2-period, crossover design, with participants receiving 2 interventions. Fifty participants with chronic and stable apraxia of speech were randomly allocated to 1 of 2 order conditions: speech-first condition versus sham-first condition. Period 1 design was equivalent to a randomized controlled trial. We report results for this period and profile the effect of the period 2 crossover. Period 1 results revealed significant improvement in naming and repetition only in the speech-first group. The sham-first group displayed improvement in speech production after speech intervention in period 2. Significant improvement of treated words was found in both naming and repetition, with little generalization to structurally similar and dissimilar untreated words. Speech gains were largely maintained after withdrawal of intervention. There was a significant relationship between treatment dose and response. However, average self-administered dose was modest for both groups. Future software design would benefit from incorporation of social and gaming components to boost motivation. Single-word production can be improved in chronic apraxia of speech with behavioral intervention. Self-administered computerized therapy is a promising method for delivering high-intensity speech/language rehabilitation. URL: http://orcid.org/0000-0002-1278-0601. Unique identifier: ISRCTN88245643. © 2016 American Heart Association, Inc.
Trinkaus, Hans L; Gaisser, Andrea E
2010-09-01
Nearly 30,000 individual inquiries are answered annually by the telephone cancer information service (CIS, KID) of the German Cancer Research Center (DKFZ). The aim was to develop a tool for evaluating these calls, and to support the complete counseling process interactively. A novel software tool is introduced, based on a structure similar to a music score. Treating the interaction as a "duet", guided by the CIS counselor, the essential contents of the dialogue are extracted automatically. For this, "trained speech recognition" is applied to the (known) counselor's part, and "keyword spotting" is used on the (unknown) client's part to pick out specific items from the "word streams". The outcomes fill an abstract score representing the dialogue. Pilot tests performed on a prototype of SACA (Software Assisted Call Analysis) resulted in a basic proof of concept: Demographic data as well as information regarding the situation of the caller could be identified. The study encourages following up on the vision of an integrated SACA tool for supporting calls online and performing statistics on its knowledge database offline. Further research perspectives are to check SACA's potential in comparison with established interaction analysis systems like RIAS. Copyright (c) 2010 Elsevier Ireland Ltd. All rights reserved.
Fairweather, Glenn Craig; Lincoln, Michelle Ann; Ramsden, Robyn
2016-12-01
The objectives of this study were to investigate the efficacy of a speech-language pathology teletherapy program for children attending schools and early childcare settings in rural New South Wales, Australia, and their parents' views on the program's feasibility and acceptability. Nineteen children received speech-language pathology sessions delivered via Adobe Connect®, FaceTime© or Skype© web-conferencing software. During semi-structured interviews, parents (n = 5) described factors that promoted or threatened the program's feasibility and acceptability. Participation in a speech-language pathology teletherapy program using low-bandwidth videoconferencing improved the speech and language skills of children in both early childhood settings and primary school. Emergent themes related to (a) practicality and convenience, (b) learning, (c) difficulties and (d) communication. Treatment outcome data and parental reports verified that the teletherapy service delivery was feasible and acceptable. However, it was also evident that regular discussion and communication between the various stakeholders involved in teletherapy programs may promote increased parental engagement and acceptability.
Gifford, René H; Davis, Timothy J; Sunderhaus, Linsey W; Menapace, Christine; Buck, Barbara; Crosson, Jillian; O'Neill, Lori; Beiter, Anne; Segel, Phil
The primary objective of this study was to assess the effect of electric and acoustic overlap for speech understanding in typical listening conditions using semidiffuse noise. This study used a within-subjects, repeated measures design including 11 experienced adult implant recipients (13 ears) with functional residual hearing in the implanted and nonimplanted ear. The aided acoustic bandwidth was fixed and the low-frequency cutoff for the cochlear implant (CI) was varied systematically. Assessments were completed in the R-SPACE sound-simulation system which includes a semidiffuse restaurant noise originating from eight loudspeakers placed circumferentially about the subject's head. AzBio sentences were presented at 67 dBA with signal to noise ratio varying between +10 and 0 dB determined individually to yield approximately 50 to 60% correct for the CI-alone condition with full CI bandwidth. Listening conditions for all subjects included CI alone, bimodal (CI + contralateral hearing aid), and bilateral-aided electric and acoustic stimulation (EAS; CI + bilateral hearing aid). Low-frequency cutoffs both below and above the original "clinical software recommendation" frequency were tested for all patients, in all conditions. Subjects estimated listening difficulty for all conditions using listener ratings based on a visual analog scale. Three primary findings were that (1) there was statistically significant benefit of preserved acoustic hearing in the implanted ear for most overlap conditions, (2) the default clinical software recommendation rarely yielded the highest level of speech recognition (1 of 13 ears), and (3) greater EAS overlap than that provided by the clinical recommendation yielded significant improvements in speech understanding. For standard-electrode CI recipients with preserved hearing, spectral overlap of acoustic and electric stimuli yielded significantly better speech understanding and less listening effort in a laboratory-based, restaurant-noise simulation. In conclusion, EAS patients may derive more benefit from greater acoustic and electric overlap than given in current software fitting recommendations, which are based solely on audiometric threshold. These data have larger scientific implications, as previous studies may not have assessed outcomes with optimized EAS parameters, thereby underestimating the benefit afforded by hearing preservation.
Dwivedi, Raghav C; St Rose, Suzanne; Chisholm, Edward J; Bisase, Brian; Amen, Furrat; Nutting, Christopher M; Clarke, Peter M; Kerawala, Cyrus J; Rhys-Evans, Peter H; Harrington, Kevin J; Kazi, Rehan
2012-06-01
The aim of this study was to explore post-treatment speech impairments using the English version of the Speech Handicap Index (SHI), the first speech-specific questionnaire, in a cohort of oral cavity (OC) and oropharyngeal (OP) cancer patients. Sixty-three consecutive OC and OP cancer patients in follow-up participated in this study. Descriptive analyses are presented as percentages, while the Mann-Whitney U-test and Kruskal-Wallis test were used for the quantitative variables. Statistical Package for Social Science-15 statistical software (SPSS Inc., Chicago, IL) was used for the statistical analyses. Over a third (36.1%) of patients reported their speech as either average or bad. Speech intelligibility and articulation were the main speech concerns for 58.8% and 52.9% of OC and 31.6% and 34.2% of OP cancer patients, respectively, while feeling incompetent and being less outgoing were the speech-related psychosocial concerns for 64.7% and 23.5% of OC and 15.8% and 18.4% of OP cancer patients, respectively. Worse speech outcomes were noted for oral tongue and base of tongue cancers vs. tonsillar cancers, with mean (SD) values of 56.7 (31.3) and 52.0 (38.4) vs. 10.9 (14.8) (P<0.001), and for late vs. early T stage cancers, 65.0 (29.9) vs. 29.3 (32.7) (P<0.005). The English version of the SHI is a reliable, valid and useful tool for the evaluation of speech in HNC patients. Over one-third of OC and OP cancer patients reported speech problems in their day-to-day life. Advanced T-stage tumors affecting the oral tongue or base of tongue are particularly associated with poor speech outcomes. Copyright © 2012 Elsevier Ltd. All rights reserved.
Voice input/output capabilities at Perception Technology Corporation
NASA Technical Reports Server (NTRS)
Ferber, Leon A.
1977-01-01
Condensed resumes of key company personnel at Perception Technology Corporation are presented. The staff possesses expertise in speech recognition, speech synthesis, speaker authentication, and language identification. The capabilities of hardware and software engineers are included.
NASA Astrophysics Data System (ADS)
Soderstrom, Ken; Alalawi, Ali
KLFromRecordingDays allows measurement of Kullback-Leibler (KL) distances between 2D probability distributions of vocal acoustic features. Greater KL distance measures reflect increased phonological divergence across the vocalizations compared. The software has been used to compare *.wav file recordings made by Sound Analysis Recorder 2011 of songbird vocalizations pre- and post-drug and surgical manipulations. Recordings from individual animals in *.wav format are first organized into subdirectories by recording day and then segmented into individual syllables uttered and acoustic features of these syllables using Sound Analysis Pro 2011 (SAP). KLFromRecordingDays uses syllable acoustic feature data output by SAP to a MySQL table to generate and compare "template" (typically pre-treatment) and "target" (typically post-treatment) probability distributions. These distributions are a series of virtual 2D plots of the duration of each syllable (as x-axis) to each of 13 other acoustic features measured by SAP for that syllable (as y-axes). Differences between "template" and "target" probability distributions for each acoustic feature are determined by calculating KL distance, a measure of divergence of the target 2D distribution pattern from that of the template. KL distances and the mean KL distance across all acoustic features are calculated for each recording day and output to an Excel spreadsheet. Resulting data for individual subjects may then be pooled across treatment groups and graphically summarized and used for statistical comparisons. Because SAP-generated MySQL files are accessed directly, data limits associated with spreadsheet output are avoided, and the totality of vocal output over weeks may be objectively analyzed all at once. The software has been useful for measuring drug effects on songbird vocalizations and assessing recovery from damage to regions of vocal motor cortex. It may be useful in studies employing other species, and as part of speech therapies tracking progress in producing distinct speech sounds in isolation.
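The central computation, the KL distance of a "target" 2D feature distribution from a "template" one, can be sketched as follows in Python. The histogram binning and smoothing choices here are assumptions made for illustration, not the internals of KLFromRecordingDays.

import numpy as np

def kl_2d(template_xy, target_xy, bins=50, eps=1e-12):
    """KL distance of the target 2D distribution from the template,
    with both estimated as normalized histograms on a shared grid.
    Each input is an (n, 2) array of (duration, feature) pairs."""
    both = np.vstack([template_xy, target_xy])
    ex = np.linspace(both[:, 0].min(), both[:, 0].max(), bins + 1)
    ey = np.linspace(both[:, 1].min(), both[:, 1].max(), bins + 1)
    p, _, _ = np.histogram2d(template_xy[:, 0], template_xy[:, 1], bins=(ex, ey))
    q, _, _ = np.histogram2d(target_xy[:, 0], target_xy[:, 1], bins=(ex, ey))
    p = (p + eps) / (p + eps).sum()   # floor empty cells, then normalize
    q = (q + eps) / (q + eps).sum()
    return np.sum(q * np.log(q / p))  # divergence of target from template

Repeating this for each of the 13 duration-versus-feature planes and averaging the results would yield a mean KL distance per recording day, in the spirit of the summary statistic described above.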
Software use in the (re)habilitation of hearing impaired children.
Silva, Mariane Perin da; Comerlatto Junior, Ademir Antonio; Balen, Sheila Andreoli; Bevilacqua, Maria Cecília
2012-01-01
To verify the applicability of software in the (re)habilitation of hearing impaired children. The sample comprised 17 children with hearing impairment, ten with cochlear implants (CI) and seven with hearing aids (HA). The Software Auxiliar na Reabilitação de Distúrbios Auditivos - SARDA (Auxiliary Software for the Rehabilitation of Hearing Disorders) was used. The training protocol was applied for 30 minutes, twice a week, for the time necessary to complete the strategies proposed in the software. To measure the software's applicability for training speech perception ability in quiet and in noise, subjects were assessed with the Hearing in Noise Test (HINT) before and after the auditory training. Data were statistically analyzed. The group of CI users needed, on average, 12.2 days to finish the strategies, and the group of HA users, on average, 10.14 days. Both groups presented differences between pre- and post-assessments, both in quiet and in noise. Younger children showed more difficulty executing the strategies; however, there was no correlation between age and performance. The type of electronic device did not influence the training. Children presented greater difficulty in the strategy involving non-verbal stimuli and in the strategy with verbal stimuli that trains sustained attention. Children's attention and motivation during stimulation were fundamental for successful auditory training. Auditory training using the SARDA was effective, providing improvement of speech perception ability, both in quiet and in noise, for the hearing impaired children.
Cleft Audit Protocol for Speech (CAPS-A): A Comprehensive Training Package for Speech Analysis
ERIC Educational Resources Information Center
Sell, D.; John, A.; Harding-Bell, A.; Sweeney, T.; Hegarty, F.; Freeman, J.
2009-01-01
Background: The previous literature has largely focused on speech analysis systems and ignored process issues, such as the nature of adequate speech samples, data acquisition, recording and playback. Although there has been recognition of the need for training on tools used in speech analysis associated with cleft palate, little attention has been…
Woynaroski, Tiffany; Oller, D Kimbrough; Keceli-Kaysili, Bahar; Xu, Dongxin; Richards, Jeffrey A; Gilkerson, Jill; Gray, Sharmistha; Yoder, Paul
2017-03-01
Theory and research suggest that vocal development predicts "useful speech" in preschoolers with autism spectrum disorder (ASD), but conventional methods for measurement of vocal development are costly and time consuming. This longitudinal correlational study examines the reliability and validity of several automated indices of vocalization development relative to an index derived from human coded, conventional communication samples in a sample of preverbal preschoolers with ASD. Automated indices of vocal development were derived using software that is presently "in development" and/or only available for research purposes and using commercially available Language ENvironment Analysis (LENA) software. Indices of vocal development that could be derived using the software available for research purposes: (a) were highly stable with a single day-long audio recording, (b) predicted future spoken vocabulary to a degree that was nonsignificantly different from the index derived from conventional communication samples, and (c) continued to predict future spoken vocabulary even after controlling for concurrent vocabulary in our sample. The score derived from standard LENA software was similarly stable, but was not significantly correlated with future spoken vocabulary. Findings suggest that automated vocal analysis is a valid and reliable alternative to time intensive and expensive conventional communication samples for measurement of vocal development of preverbal preschoolers with ASD in research and clinical practice. Autism Res 2017, 10: 508-519. © 2016 International Society for Autism Research, Wiley Periodicals, Inc.
NASA Technical Reports Server (NTRS)
Begault, D. R.; Wenzel, E. M.; Anderson, M. R.
2001-01-01
A study of sound localization performance was conducted using headphone-delivered virtual speech stimuli, rendered via HRTF-based acoustic auralization software and hardware, and blocked-meatus HRTF measurements. The independent variables were chosen to evaluate commonly held assumptions in the literature regarding improved localization: inclusion of head tracking, individualized HRTFs, and early and diffuse reflections. Significant effects were found for azimuth and elevation error, reversal rates, and externalization.
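The rendering step in such HRTF-based auralization reduces to convolving the source signal with the left- and right-ear head-related impulse responses (HRIRs) measured for the desired direction; head tracking and reflections are layered on top. A minimal Python sketch, assuming measured HRIR arrays of equal length are available:

import numpy as np

def render_binaural(mono, hrir_left, hrir_right):
    """Spatialize a mono signal by convolution with a pair of
    equal-length HRIRs; returns an (n_samples, 2) headphone feed."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right], axis=-1)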
Legal decision-making by people with aphasia: critical incidents for speech pathologists.
Ferguson, Alison; Duffield, Gemma; Worrall, Linda
2010-01-01
The assessment and management of a person with aphasia for whom decision-making capacity is queried represents a highly complex clinical issue. In addition, there are few published guidelines and even fewer published accounts of empirical research to assist. The research presented in this paper aimed to identify the main issues for speech pathologists when decision-making capacity for legal and related matters arose for their clients with aphasia, and to describe qualitatively the nature of these issues and the practices of the speech pathologists in these situations. The methodology was informed by the qualitative research paradigm and made use of the semi-structured interview methods developed for the Critical Incident Technique. Nine speech pathologists, with a range of clinical experience between three and 27 years, were interviewed by telephone, with verbatim notes being taken on-line by the interviewer. The speech pathologists described a total of 21 clients (15 male, six female) with acquired neurological communication disorders (including cerebral vascular accident, traumatic brain injury, and tumour) whose care had raised critical incidents for the speech pathologist in relation to legal and related matters. These verbatim notes were qualitatively analysed using NVivo qualitative analysis software. The main incidents related to legal decisions (for example, power of attorney, will-making), as well as decisions involving consent for medical treatment, discharge, accommodation, and business/financial decisions. In all but one of the incidents recounted, the issues centred on a situation of conflict between the person with aphasia and their family, friends or with the multidisciplinary team. The roles taken by the speech pathologists ranged from those expected within a speech pathology scope of practice, such as that of assessor and consultant, to those which arguably present dilemmas and conflict of interest, for example, interpreter, advocate. The assessment practices involved some standardized testing, but this was stressed by all participants to be of lesser importance than informal observations of function. Speech pathologists emphasized the importance of multiple observations, and multimodal means of communication. The findings indicate that speech pathologists are currently playing an active role when questions arise regarding capacity for legal and related decision-making by people with aphasia. At the same time, the findings support the need for further research to develop guidelines for practice and to build educational experiences for students and novice clinicians to assist them when they engage with the complex case management issues in this area. 2010 Royal College of Speech & Language Therapists.
Fu, Q Y; Liang, Y; Zou, A; Wang, T; Zhao, X D; Wan, J
2016-04-07
To investigate the relationships between the electrophysiological characteristics of the speech-evoked auditory brainstem response (s-ABR) and the Mandarin phonetically balanced maximum (PBmax) at different degrees of hearing impairment, so as to provide more clues to the mechanism of speech cognition. Forty-one ears in 41 normal-hearing adults (NH), thirty ears in 30 patients with conductive hearing loss (CHL), and twenty-seven ears in 27 patients with sensorineural hearing loss (SNHL) were included in the present study. Speech discrimination scores were obtained with Mandarin phonemically balanced monosyllable lists via speech audiometry software. s-ABRs were recorded with the speech syllable /da/ at the intensity of the phonetically balanced maximum (PBmax). The electrophysiological characteristics of the s-ABR, as well as the relationships between PBmax and s-ABR parameters, including latency in the time domain and fundamental frequency (F0) and first formant (F1) in the frequency domain, were analyzed statistically. All subjects completed the speech perception tests; the PBmax of CHL and SNHL showed no significant difference (P>0.05), but both were significantly lower than that of NH (P<0.05). While divided the subjects into three groups by 90%
Delphi, Maryam; Lotfi, M-Yones; Moossavi, Abdollah; Bakhshi, Enayatollah; Banimostafa, Maryam
2017-09-01
Previous studies have shown that interaural-time-difference (ITD) training can improve localization ability. Surprisingly little is known, however, about localization training vis-à-vis speech perception in noise based on the interaural time difference in the envelope (ITD ENV). We sought to investigate the reliability of an ITD ENV-based training program for speech-in-noise perception among elderly individuals with normal hearing and a speech-in-noise perception disorder. The present interventional study was performed during 2016. Sixteen elderly men between 55 and 65 years of age with a clinical diagnosis of normal hearing up to 2000 Hz and speech-in-noise perception disorder participated in this study. The localization training program was based on changes in ITD ENV. In order to evaluate the reliability of the training program, we performed speech-in-noise tests before the training program, immediately afterward, and at 2 months' follow-up. The reliability of the training program was analyzed using the Friedman test and the SPSS software. Significant differences were shown in the mean scores of speech-in-noise perception between the 3 time points (P=0.001). The results also indicated no difference in the mean scores of speech-in-noise perception between the two time points of immediately after the training program and 2 months' follow-up (P=0.212). The present study showed the reliability of ITD ENV-based localization training in elderly individuals with speech-in-noise perception disorder.
Emerging technologies with potential for objectively evaluating speech recognition skills.
Rawool, Vishakha Waman
2016-01-01
Work-related exposure to noise and other ototoxins can cause damage to the cochlea, synapses between the inner hair cells, the auditory nerve fibers, and higher auditory pathways, leading to difficulties in recognizing speech. Procedures designed to determine speech recognition scores (SRS) in an objective manner can be helpful in disability compensation cases where the worker claims to have poor speech perception due to exposure to noise or ototoxins. Such measures can also be helpful in determining SRS in individuals who cannot provide reliable responses to speech stimuli, including patients with Alzheimer's disease, traumatic brain injuries, and infants with and without hearing loss. Cost-effective neural monitoring hardware and software is being rapidly refined due to the high demand for neurogaming (games involving the use of brain-computer interfaces), health, and other applications. More specifically, two related advances in neuro-technology include relative ease in recording neural activity and availability of sophisticated analysing techniques. These techniques are reviewed in the current article and their applications for developing objective SRS procedures are proposed. Issues related to neuroaudioethics (ethics related to collection of neural data evoked by auditory stimuli including speech) and neurosecurity (preservation of a person's neural mechanisms and free will) are also discussed.
Real-time classification of auditory sentences using evoked cortical activity in humans
NASA Astrophysics Data System (ADS)
Moses, David A.; Leonard, Matthew K.; Chang, Edward F.
2018-06-01
Objective. Recent research has characterized the anatomical and functional basis of speech perception in the human auditory cortex. These advances have made it possible to decode speech information from activity in brain regions like the superior temporal gyrus, but no published work has demonstrated this ability in real-time, which is necessary for neuroprosthetic brain-computer interfaces. Approach. Here, we introduce a real-time neural speech recognition (rtNSR) software package, which was used to classify spoken input from high-resolution electrocorticography signals in real-time. We tested the system with two human subjects implanted with electrode arrays over the lateral brain surface. Subjects listened to multiple repetitions of ten sentences, and rtNSR classified what was heard in real-time from neural activity patterns using direct sentence-level and HMM-based phoneme-level classification schemes. Main results. We observed single-trial sentence classification accuracies of 90% or higher for each subject with less than 7 minutes of training data, demonstrating the ability of rtNSR to use cortical recordings to perform accurate real-time speech decoding in a limited vocabulary setting. Significance. Further development and testing of the package with different speech paradigms could influence the design of future speech neuroprosthetic applications.
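Although the rtNSR implementation itself is not reproduced here, direct sentence-level classification of the kind described can be sketched as template matching: average the spatiotemporal neural pattern for each sentence across training trials, then assign a new trial to the best-correlated template. A hypothetical Python illustration, not the package's actual method:

import numpy as np

def fit_templates(trials, labels, n_classes):
    """Mean spatiotemporal pattern per sentence.
    trials: (n_trials, n_channels, n_samples); labels: (n_trials,) array."""
    return np.stack([trials[labels == c].mean(axis=0) for c in range(n_classes)])

def classify(trial, templates):
    """Return the index of the sentence template that correlates
    best with this trial's (n_channels, n_samples) activity."""
    scores = [np.corrcoef(trial.ravel(), t.ravel())[0, 1] for t in templates]
    return int(np.argmax(scores))

The HMM-based phoneme-level scheme mentioned above would replace the fixed templates with per-phoneme likelihood models evaluated frame by frame.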
ERIC Educational Resources Information Center
Hussmann, Katja; Grande, Marion; Meffert, Elisabeth; Christoph, Swetlana; Piefke, Martina; Willmes, Klaus; Huber, Walter
2012-01-01
Although generally accepted as an important part of aphasia assessment, detailed analysis of spontaneous speech is rarely carried out in clinical practice mostly due to time limitations. The Aachener Sprachanalyse (ASPA; Aachen Speech Analysis) is a computer-assisted method for the quantitative analysis of German spontaneous speech that allows for…
The Interpersonal Metafunction Analysis of Barack Obama's Victory Speech
ERIC Educational Resources Information Center
Ye, Ruijuan
2010-01-01
This paper carries on a tentative interpersonal metafunction analysis of Barack Obama's victory speech from the interpersonal metafunction, which aims to help readers understand and evaluate the speech regarding its suitability, thus to provide some guidance for readers to make better speeches. This study has promising implications for speeches as…
An integrated tool for the diagnosis of voice disorders.
Godino-Llorente, Juan I; Sáenz-Lechón, Nicolás; Osma-Ruiz, Víctor; Aguilera-Navarro, Santiago; Gómez-Vilda, Pedro
2006-04-01
A PC-based integrated aid tool has been developed for the analysis and screening of pathological voices. With it, the user can simultaneously record speech, electroglottographic (EGG), and videoendoscopic signals, and synchronously edit them to select the most significant segments. These multimedia data are stored in a relational database, together with the patient's personal information, anamnesis, diagnosis, visits, explorations, and any other comment the specialist may wish to include. The speech and EGG waveforms are analysed by means of temporal representations, and quantitative parameters such as spectrograms, frequency and amplitude perturbation measures, harmonic energy, and noise are calculated using digital signal processing techniques, giving an idea of the degree of hoarseness and the quality of the voice register. Within this framework, the system uses a standard protocol to evaluate and build complete databases of voice disorders. The target users of this system are speech and language therapists and ear, nose and throat (ENT) clinicians. The application can be easily configured to cover the needs of both groups of professionals. The software has a user-friendly Windows-style interface. The PC should be equipped with standard sound and video capture cards. Signals are captured using common transducers: a microphone, an electroglottograph, and a fiberscope or telelaryngoscope. The clinical usefulness of the system is addressed in a comprehensive evaluation section.
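Among the perturbation measures such a tool computes, local jitter is representative: the mean absolute difference between consecutive glottal periods, relative to the mean period. A brief Python sketch, assuming the period durations have already been extracted from the speech or EGG signal:

import numpy as np

def local_jitter(periods):
    """Local jitter (%) from a 1D array of consecutive glottal
    period durations in seconds."""
    periods = np.asarray(periods, dtype=float)
    return 100.0 * np.abs(np.diff(periods)).mean() / periods.mean()

# Example: a slightly irregular 100 Hz voice.
print(local_jitter([0.0100, 0.0102, 0.0099, 0.0101, 0.0100]))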
Speech and language disorders in children from public schools in Belo Horizonte
Rabelo, Alessandra Terra Vasconcelos; Campos, Fernanda Rodrigues; Friche, Clarice Passos; da Silva, Bárbara Suelen Vasconcelos; Friche, Amélia Augusta de Lima; Alves, Claudia Regina Lindgren; Goulart, Lúcia Maria Horta de Figueiredo
2015-01-01
Objective: To investigate the prevalence of oral language, orofacial motor skill and auditory processing disorders in children aged 4-10 years and verify their association with age and gender. Methods: Cross-sectional study with stratified, random sample consisting of 539 students. The evaluation consisted of three protocols: orofacial motor skill protocol, adapted from the Myofunctional Evaluation Guidelines; the Child Language Test ABFW - Phonology; and a simplified auditory processing evaluation. Descriptive and associative statistical analyses were performed using Epi Info software, release 6.04. Chi-square test was applied to compare proportion of events and analysis of variance was used to compare mean values. Significance was set at p≤0.05. Results: Of the studied subjects, 50.1% had at least one of the assessed disorders; of those, 33.6% had oral language disorder, 17.1% had orofacial motor skill impairment, and 27.3% had auditory processing disorder. There were significant associations between auditory processing skills’ impairment, oral language impairment and age, suggesting a decrease in the number of disorders with increasing age. Similarly, the variable "one or more speech, language and hearing disorders" was also associated with age. Conclusions: The prevalence of speech, language and hearing disorders in children was high, indicating the need for research and public health efforts to cope with this problem. PMID:26300524
Narayanan, Shrikanth; Toutios, Asterios; Ramanarayanan, Vikram; Lammert, Adam; Kim, Jangwon; Lee, Sungbok; Nayak, Krishna; Kim, Yoon-Chul; Zhu, Yinghua; Goldstein, Louis; Byrd, Dani; Bresch, Erik; Ghosh, Prasanta; Katsamanis, Athanasios; Proctor, Michael
2014-01-01
USC-TIMIT is an extensive database of multimodal speech production data, developed to complement existing resources available to the speech research community and with the intention of being continuously refined and augmented. The database currently includes real-time magnetic resonance imaging data from five male and five female speakers of American English. Electromagnetic articulography data have also been presently collected from four of these speakers. The two modalities were recorded in two independent sessions while the subjects produced the same 460 sentence corpus used previously in the MOCHA-TIMIT database. In both cases the audio signal was recorded and synchronized with the articulatory data. The database and companion software are freely available to the research community. PMID:25190403
[Intranet applications in radiology].
Knopp, M V; von Hippel, G M; Koch, T; Knopp, M A
2000-01-01
The aim of this paper is to present the conceptual basis and capabilities of intranet applications in radiology. The intranet, the local counterpart of the internet, can be readily realized using existing computer components and a network. All current computer operating systems support intranet applications, which allow hardware- and software-independent communication of text, images, video and sound through browser software, without dedicated programs on individual personal computers. Radiological applications include text communication (e.g., department-specific bulletin boards and access to examination protocols); image communication for viewing, limited processing and documentation of radiological images on decentralized PCs; and speech communication for dictation, distribution of dictation and speech recognition. The intranet helps to optimize organizational efficiency and cost-effectiveness in the daily work of radiological departments in outpatient and hospital settings. General interest in internet and intranet technology will guarantee its continuous development.
Non-fluent speech following stroke is caused by impaired efference copy.
Feenaughty, Lynda; Basilakos, Alexandra; Bonilha, Leonardo; den Ouden, Dirk-Bart; Rorden, Chris; Stark, Brielle; Fridriksson, Julius
2017-09-01
Efference copy is a cognitive mechanism argued to be critical for initiating and monitoring speech; however, the extent to which breakdown of efference copy mechanisms impacts speech production is unclear. This study examined the best mechanistic predictors of non-fluent speech among 88 stroke survivors. Objective speech fluency measures were subjected to a principal component analysis (PCA). The primary PCA factor was then entered into a multiple stepwise linear regression analysis as the dependent variable, with a set of independent mechanistic variables. Participants' ability to mimic audio-visual speech ("speech entrainment response") was the best independent predictor of non-fluent speech. We suggest that this "speech entrainment" factor reflects the integrity of internal monitoring (i.e., efference copy) of speech production, which affects speech initiation and maintenance. Results support models of normal speech production and suggest that therapy focused on speech initiation and maintenance may improve speech fluency for individuals with chronic non-fluent aphasia post stroke.
Processing Electromyographic Signals to Recognize Words
NASA Technical Reports Server (NTRS)
Jorgensen, C. C.; Lee, D. D.
2009-01-01
A recently invented speech-recognition method applies to words that are articulated by means of the tongue and throat muscles but are otherwise not voiced or, at most, are spoken sotto voce. This method could satisfy a need for speech recognition under circumstances in which normal audible speech is difficult, poses a hazard, is disturbing to listeners, or compromises privacy. The method could also be used to augment traditional speech recognition by providing an additional source of information about articulator activity. The method can be characterized as intermediate between (1) conventional speech recognition through processing of voice sounds and (2) a method, not yet developed, of processing electroencephalographic signals to extract unspoken words directly from thoughts. This method involves computational processing of digitized electromyographic (EMG) signals from muscle innervation acquired by surface electrodes under a subject's chin near the tongue and on the side of the subject's throat near the larynx. After preprocessing, digitization, and feature extraction, EMG signals are processed by a neural-network pattern classifier, implemented in software, that performs the bulk of the recognition task.
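As a rough illustration of the recognition chain described in this abstract (framing, feature extraction, then a neural-network pattern classifier), here is a minimal Python sketch. The window length, the RMS and zero-crossing features, the network size, and the synthetic stand-in "EMG" data are all illustrative assumptions, not the parameters of the NASA system.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def emg_features(signal, fs=2000, win_ms=50):
    """Frame one EMG channel and compute per-window RMS and zero-crossing rate."""
    win = int(fs * win_ms / 1000)
    n = len(signal) // win
    frames = signal[:n * win].reshape(n, win)
    rms = np.sqrt((frames ** 2).mean(axis=1))
    zcr = np.diff(np.signbit(frames), axis=1).mean(axis=1)  # sign flips per window
    return np.concatenate([rms, zcr])

# Toy data: 40 recordings of two "words"; amplitude differs by class so the
# classifier has something to learn. Real EMG would replace this entirely.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=40)
X = np.array([emg_features(rng.standard_normal(4000) * (1.0 + y)) for y in labels])

clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
clf.fit(X, labels)
print("training accuracy:", clf.score(X, labels))
```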
Infants’ brain responses to speech suggest Analysis by Synthesis
Kuhl, Patricia K.; Ramírez, Rey R.; Bosseler, Alexis; Lin, Jo-Fu Lotus; Imada, Toshiaki
2014-01-01
Historic theories of speech perception (Motor Theory and Analysis by Synthesis) invoked listeners’ knowledge of speech production to explain speech perception. Neuroimaging data show that adult listeners activate motor brain areas during speech perception. In two experiments using magnetoencephalography (MEG), we investigated motor brain activation, as well as auditory brain activation, during discrimination of native and nonnative syllables in infants at two ages that straddle the developmental transition from language-universal to language-specific speech perception. Adults are also tested in Exp. 1. MEG data revealed that 7-mo-old infants activate auditory (superior temporal) as well as motor brain areas (Broca’s area, cerebellum) in response to speech, and equivalently for native and nonnative syllables. However, in 11- and 12-mo-old infants, native speech activates auditory brain areas to a greater degree than nonnative, whereas nonnative speech activates motor brain areas to a greater degree than native speech. This double dissociation in 11- to 12-mo-old infants matches the pattern of results obtained in adult listeners. Our infant data are consistent with Analysis by Synthesis: auditory analysis of speech is coupled with synthesis of the motor plans necessary to produce the speech signal. The findings have implications for: (i) perception-action theories of speech perception, (ii) the impact of “motherese” on early language learning, and (iii) the “social-gating” hypothesis and humans’ development of social understanding. PMID:25024207
Gottschalk, Louis A; DeFrancisco, Don; Bechtel, Robert J
2002-08-01
The aim of this study was to test the validity of a computer software program previously demonstrated to be capable of making DSM-IV neuropsychiatric diagnoses from the content analysis of speech or verbal texts. In this report, the computer program was applied to three personal writings of Napoleon Bonaparte when he was 12 to 16 years of age. The accuracy of the neuropsychiatric evaluations derived from the computerized content analysis of these writings of Napoleon was independently corroborated by two biographers who have described pertinent details concerning his life situations, moods, and other emotional reactions during this adolescent period of his life. The relevance of this type of computer technology to psychohistorical research and clinical psychiatry is suggested.
ERIC Educational Resources Information Center
Tierney, Joseph; Mack, Molly
1987-01-01
Stimuli used in research on the perception of the speech signal have often been obtained from simple filtering and distortion of the speech waveform, sometimes accompanied by noise. However, for more complex stimulus generation, the parameters of speech can be manipulated, after analysis and before synthesis, using various types of algorithms to…
CACTI: Free, Open-Source Software for the Sequential Coding of Behavioral Interactions
Glynn, Lisa H.; Hallgren, Kevin A.; Houck, Jon M.; Moyers, Theresa B.
2012-01-01
The sequential analysis of client and clinician speech in psychotherapy sessions can help to identify and characterize potential mechanisms of treatment and behavior change. Previous studies required coding systems that were time-consuming, expensive, and error-prone. Existing software can be expensive and inflexible, and furthermore, no single package allows for pre-parsing, sequential coding, and assignment of global ratings. We developed a free, open-source, and adaptable program to meet these needs: The CASAA Application for Coding Treatment Interactions (CACTI). Without transcripts, CACTI facilitates the real-time sequential coding of behavioral interactions using WAV-format audio files. Most elements of the interface are user-modifiable through a simple XML file, and can be further adapted using Java through the terms of the GNU Public License. Coding with this software yields interrater reliabilities comparable to previous methods, but at greatly reduced time and expense. CACTI is a flexible research tool that can simplify psychotherapy process research, and has the potential to contribute to the improvement of treatment content and delivery. PMID:22815713
Automatic concept extraction from spoken medical reports.
Happe, André; Pouliquen, Bruno; Burgun, Anita; Cuggia, Marc; Le Beux, Pierre
2003-07-01
The objective of this project is to investigate methods whereby a combination of speech recognition and automated indexing methods substitutes for current transcription and indexing practices. We based our study on existing speech recognition software programs and on NOMINDEX, a tool that extracts MeSH concepts from medical text in natural language and that is mainly based on a French medical lexicon and on the UMLS. For each document, the process consists of three steps: (1) dictation and digital audio recording, (2) speech recognition, (3) automatic indexing. The evaluation consisted of a comparison between the set of concepts extracted by NOMINDEX after the speech recognition phase and the set of keywords manually extracted from the initial document. The method was evaluated on a set of 28 patient discharge summaries extracted from the MENELAS corpus in French, corresponding to in-patients admitted for coronarography. The overall precision was 73% and the overall recall was 90%. Indexing errors were mainly due to word sense ambiguity and abbreviations. A specific issue was the fact that the standard French translation of MeSH terms lacks diacritics. A preliminary evaluation of speech recognition tools showed that the rate of accurate recognition was higher than 98%. Only 3% of the indexing errors were generated by inadequate speech recognition. We discuss several areas to focus on to improve this prototype. However, the very low rate of indexing errors due to speech recognition errors highlights the potential benefits of combining speech recognition techniques and automatic indexing.
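The evaluation step described above reduces to set comparison between automatically extracted concepts and manually assigned keywords. A minimal sketch with invented concept sets (NOMINDEX itself is not reproduced here):

```python
def precision_recall(extracted, reference):
    """Precision and recall of automatically extracted concepts against
    manually assigned keywords for one document."""
    tp = len(extracted & reference)
    precision = tp / len(extracted) if extracted else 0.0
    recall = tp / len(reference) if reference else 0.0
    return precision, recall

# Invented example sets for illustration only.
auto = {"coronary angiography", "myocardial infarction", "aspirin"}
manual = {"coronary angiography", "myocardial infarction", "angina pectoris"}
p, r = precision_recall(auto, manual)
print(f"precision = {p:.0%}, recall = {r:.0%}")
```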
NASA Technical Reports Server (NTRS)
Shoham, Yoav
1994-01-01
The goal of our research is a methodology for creating robust software in distributed and dynamic environments. The approach taken is to endow software objects with explicit information about one another, to have them interact through a commitment mechanism, and to equip them with a speech-act-based communication language. System-level applications include software interoperation and compositionality. A government application of specific interest is an infrastructure for coordination among multiple planners. Daily activity applications include personal software assistants, such as programmable email, scheduling, and new group agents. Research topics include definition of the mental state of agents, design of agent languages as well as interpreters for those languages, and mechanisms for coordination within agent societies such as artificial social laws and conventions.
Restoring the missing features of the corrupted speech using linear interpolation methods
NASA Astrophysics Data System (ADS)
Rassem, Taha H.; Makbol, Nasrin M.; Hasan, Ali Muttaleb; Zaki, Siti Syazni Mohd; Girija, P. N.
2017-10-01
One of the main challenges in Automatic Speech Recognition (ASR) is noise: the performance of an ASR system degrades significantly if the speech is corrupted by noise. In the spectrogram representation of a speech signal, deleting low Signal-to-Noise Ratio (SNR) elements leaves an incomplete spectrogram. The recognizer must therefore restore the missing elements created by deleting low-SNR cells before performing recognition, which can be done using different spectrogram reconstruction methods. In this paper, the geometrical spectrogram reconstruction methods suggested by previous researchers are implemented as a toolbox. These geometrical methods use linear interpolation along either time or frequency to predict missing elements from adjacent observed elements in the spectrogram. Moreover, a new linear interpolation method using time and frequency together is presented. The CMU Sphinx III software is used in the experiments to test the performance of the linear interpolation reconstruction methods under different conditions, such as different window lengths and different utterance lengths. The speech corpus consists of 20 male and 20 female speakers, each contributing two different utterances. As a result, 80% recognition accuracy is achieved at 25% SNR.
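The core reconstruction idea, linear interpolation across missing spectrogram cells, can be sketched compactly. In the sketch below the low-SNR mask is given rather than estimated, and only interpolation along time is shown; the frequency direction is symmetric, and the paper's combined time-frequency variant could, for example, average the two (an assumption, not the paper's exact formulation):

```python
import numpy as np

def interpolate_along_time(spec, reliable):
    """Fill unreliable (low-SNR) cells of a magnitude spectrogram by linear
    interpolation between the nearest reliable neighbours in each frequency
    channel; edge gaps take the nearest reliable value (np.interp's behaviour)."""
    out = spec.copy()
    t = np.arange(spec.shape[1])
    for f in range(spec.shape[0]):
        good = reliable[f]
        if good.any() and not good.all():
            out[f, ~good] = np.interp(t[~good], t[good], spec[f, good])
    return out

spec = np.abs(np.random.default_rng(1).standard_normal((4, 8)))
reliable = np.ones_like(spec, dtype=bool)
reliable[2, 3:6] = False        # pretend these cells fell below the SNR threshold
print(interpolate_along_time(spec, reliable))
```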
Comparison of speech performance in labial and lingual orthodontic patients: A prospective study
Rai, Ambesh Kumar; Rozario, Joe E.; Ganeshkar, Sanjay V.
2014-01-01
Background: The intensity and duration of speech difficulty inherently associated with lingual therapy is a significant concern in orthodontics. This study was designed to evaluate and compare the duration of changes in speech between labial and lingual orthodontics. Materials and Methods: A prospective longitudinal clinical study was designed to assess the speech of 24 patients undergoing labial or lingual orthodontic treatment. An objective spectrographic evaluation of the /s/ sound was done using the software PRAAT, version 5.0.47; a semiobjective auditive evaluation of articulation was done by four speech pathologists; and a subjective assessment of speech was done by four laypersons. The tests were performed before (T1), within 24 h (T2), after 1 week (T3), and after 1 month (T4) of the start of therapy. The Mann-Whitney U-test for independent samples was used to assess the significance of differences between the labial and lingual appliances. A speech alteration with P < 0.05 was considered significant. Results: The objective method showed a significant difference between the two groups for the /s/ sound in the middle position (P < 0.001) at T3. The semiobjective assessment showed the worst speech performance in the lingual group at T3 for vowels and blends (P < 0.01) and at T3 and T4 for alveolar and palatal consonants (P < 0.01). The subjective assessment also showed a significant difference between the two groups at T3 (P < 0.01) and T4 (P < 0.05). Conclusion: Both appliance systems caused comparable speech difficulty immediately after bonding (T2). Although speech recovered within a week in the labial group (T3), the lingual group continued to experience discomfort even after a month (T4). PMID:25540661
2015-03-31
analysis. For scene analysis, we use Temporal Data Crystallization (TDC), and for logical analysis, we use Speech Act theory and Toulmin Argumentation… utterance in the discussion record: (i) an utterance ID and a speaker ID; (ii) speech acts; (iii) argument structure. Speech act denotes… mediator is expected to use more OQs than CQs. When the speech act of an utterance is an argument, furthermore, we recognize the conclusion part
Use of speech-to-text technology for documentation by healthcare providers.
Ajami, Sima
2016-01-01
Medical records are a critical component of a patient's treatment. However, documentation of patient-related information is considered a secondary activity in the provision of healthcare services, often leading to incomplete medical records and patient data of low quality. Advances in information technology (IT) in the health system and registration of information in electronic health records (EHR) using speech-to-text conversion software have facilitated service delivery. This narrative review is a literature search with the help of libraries, books, conference proceedings, databases of Science Direct, PubMed, Proquest, Springer, SID (Scientific Information Database), and search engines such as Yahoo and Google. I used the following keywords and their combinations: speech recognition, automatic report documentation, voice to text software, healthcare, information, and voice recognition. Due to lack of knowledge of other languages, I searched all texts in English or Persian with no time limits. Of a total of 70 articles, only 42 were selected. Speech-to-text conversion technology offers opportunities to improve the documentation process of medical records, reduce the cost and time of recording information, enhance the quality of documentation, improve the quality of services provided to patients, and support healthcare providers in legal matters. Healthcare providers should recognize the impact of this technology on service delivery.
Speech to Text Translation for Malay Language
NASA Astrophysics Data System (ADS)
Al-khulaidi, Rami Ali; Akmeliawati, Rini
2017-11-01
A speech recognition system is a front-end and back-end process that receives an audio signal uttered by a speaker and converts it into a text transcription. Such systems are used in several fields, including therapeutic technology, education, social robotics, and computer entertainment. In control tasks, which are the purpose of the proposed system, speed of performance and response matter because the system must integrate with other control platforms, such as voice-controlled robots. This creates a need for flexible platforms that can easily be adapted to the functionality of their surroundings, unlike software such as MATLAB and Phoenix that requires recorded audio and multiple training passes for every entry. In this paper, a speech recognition system for the Malay language is implemented using Microsoft Visual Studio C#. Ninety Malay phrases were tested by ten speakers of both genders in different contexts. The results show that the overall accuracy (calculated from a confusion matrix) is a satisfactory 92.69%.
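The reported 92.69% figure is an overall accuracy computed from a confusion matrix, i.e., the trace divided by the total number of utterances. A sketch with an invented 3x3 matrix:

```python
import numpy as np

# Rows: spoken phrase class; columns: recognized class. Values are invented.
cm = np.array([
    [88,  2,  0],
    [ 3, 85,  2],
    [ 1,  4, 85],
])
accuracy = np.trace(cm) / cm.sum()
print(f"overall accuracy: {accuracy:.2%}")   # correct decisions / all utterances
```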
Huo, Xueliang; Park, Hangue; Kim, Jeonghee; Ghovanloo, Maysam
2015-01-01
We present a new wireless and wearable human-computer interface called the dual-mode Tongue Drive System (dTDS), which is designed to allow people with severe disabilities to use computers more effectively, with increased speed, flexibility, usability, and independence, through their tongue motion and speech. The dTDS detects users' tongue motion using a magnetic tracer and an array of magnetic sensors embedded in a compact and ergonomic wireless headset. It also captures the users' voice wirelessly using a small microphone embedded in the same headset. Preliminary evaluation results based on 14 able-bodied subjects and three individuals with high-level spinal cord injuries (C3-C5) indicated that the dTDS headset, combined with commercially available speech recognition (SR) software, can provide end users with significantly higher performance than either unimodal form based on tongue motion or speech alone, particularly in completing tasks that require both pointing and text entry. PMID:23475380
It's a sentence, not a word: insights from a keyword analysis in cancer communication.
Taylor, Kimberly; Thorne, Sally; Oliffe, John L
2015-01-01
Keyword analysis has been championed as a methodological option for expanding the insights that can be extracted from qualitative datasets using various properties available in qualitative software. Intrigued by the pioneering applications of Clive Seale and his colleagues in this regard, we conducted keyword analyses for word frequency and "keyness" on a qualitative database of interview transcripts from a study on cancer communication. We then subjected the results from these operations to an in-depth contextual inquiry by resituating word instances within their original speech contexts, finding that most of what had initially appeared as group variations broke down under close analysis. In this article, we illustrate the various threads of analysis, and explain how they unraveled under closer scrutiny. On the basis of this tentative exercise, we conclude that a healthy skepticism for the benefits of keyword analysis within a qualitative investigative process seems warranted. © The Author(s) 2014.
Lackey, Amanda E; Pandey, Tarun; Moshiri, Mariam; Lalwani, Neeraj; Lall, Chandana; Bhargava, Puneet
2014-06-01
It is an opportune time for radiologists to focus on personal productivity. The ever increasing reliance on computers and the Internet has significantly changed the way we work. Myriad software applications are available to help us improve our personal efficiency. In this article, the authors discuss some tools that help improve collaboration and personal productivity, maximize e-learning, and protect valuable digital data. Published by Elsevier Inc.
Multilevel Analysis in Analyzing Speech Data
ERIC Educational Resources Information Center
Guddattu, Vasudeva; Krishna, Y.
2011-01-01
The speech produced by human vocal tract is a complex acoustic signal, with diverse applications in phonetics, speech synthesis, automatic speech recognition, speaker identification, communication aids, speech pathology, speech perception, machine translation, hearing research, rehabilitation and assessment of communication disorders and many…
A multimodal spectral approach to characterize rhythm in natural speech.
Alexandrou, Anna Maria; Saarinen, Timo; Kujala, Jan; Salmelin, Riitta
2016-01-01
Human utterances demonstrate temporal patterning, also referred to as rhythm. While simple oromotor behaviors (e.g., chewing) feature a salient periodical structure, conversational speech displays a time-varying quasi-rhythmic pattern. Quantification of periodicity in speech is challenging. Unimodal spectral approaches have highlighted rhythmic aspects of speech. However, speech is a complex multimodal phenomenon that arises from the interplay of articulatory, respiratory, and vocal systems. The present study addressed the question of whether a multimodal spectral approach, in the form of coherence analysis between electromyographic (EMG) and acoustic signals, would allow one to characterize rhythm in natural speech more efficiently than a unimodal analysis. The main experimental task consisted of speech production at three speaking rates; a simple oromotor task served as control. The EMG-acoustic coherence emerged as a sensitive means of tracking speech rhythm, whereas spectral analysis of either EMG or acoustic amplitude envelope alone was less informative. Coherence metrics seem to distinguish and highlight rhythmic structure in natural speech.
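The multimodal measure described here is, at its core, magnitude-squared coherence between EMG and acoustic envelopes. A minimal sketch with synthetic envelopes sharing an assumed 4 Hz quasi-syllabic rhythm (the study's actual preprocessing is not reproduced):

```python
import numpy as np
from scipy.signal import coherence

fs = 250.0                                    # envelope sampling rate (Hz)
t = np.arange(0, 60, 1 / fs)                  # 60 s of "speech"
rng = np.random.default_rng(2)
drive = np.sin(2 * np.pi * 4.0 * t)           # shared quasi-syllabic component
emg_env = drive + 0.8 * rng.standard_normal(t.size)
acoustic_env = drive + 0.8 * rng.standard_normal(t.size)

f, Cxy = coherence(emg_env, acoustic_env, fs=fs, nperseg=1024)
print(f"peak coherence {Cxy.max():.2f} at {f[Cxy.argmax()]:.2f} Hz")
```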
A Smartphone Application for Customized Frequency Table Selection in Cochlear Implants.
Jethanamest, Daniel; Azadpour, Mahan; Zeman, Annette M; Sagi, Elad; Svirsky, Mario A
2017-09-01
A novel smartphone-based software application can facilitate self-selection of frequency allocation tables (FAT) in postlingually deaf cochlear implant (CI) users. CIs use FATs to represent the tonotopic organization of a normal cochlea. Current CI fitting methods typically use a standard FAT for all patients regardless of individual differences in cochlear size and electrode location. In postlingually deaf patients, different amounts of mismatch can result between the frequency-place function they experienced when they had normal hearing and the frequency-place function that results from the standard FAT. For some CI users, an alternative FAT may enhance sound quality or speech perception. Currently, no widely available tools exist to aid real-time selection of different FATs. This study aims to develop a new smartphone tool for this purpose and to evaluate speech perception and sound quality measures in a pilot study of CI subjects using this application. A smartphone application for a widely available mobile platform (iOS) was developed to serve as a preprocessor of auditory input to a clinical CI speech processor and enable interactive real-time selection of FATs. The application's output was validated by measuring electrodograms for various inputs. A pilot study was conducted in six CI subjects. Speech perception was evaluated using word recognition tests. All subjects successfully used the portable application with their clinical speech processors to experience different FATs while listening to running speech. The users were all able to select one table that they judged provided the best sound quality. All subjects chose a FAT different from the standard FAT in their everyday clinical processor. Using the smartphone application, the mean consonant-nucleus-consonant score with the default FAT selection was 28.5% (SD 16.8) and 29.5% (SD 16.4) when using a self-selected FAT. A portable smartphone application enables CI users to self-select frequency allocation tables in real time. Even though the self-selected FATs that were deemed to have better sound quality were only tested acutely (i.e., without long-term experience with them), speech perception scores were not inferior to those obtained with the clinical FATs. This software application may be a valuable tool for improving future methods of CI fitting.
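A frequency allocation table is essentially a mapping from analysis bands to electrodes. The sketch below uses invented log-spaced band edges and an assumed 22-electrode array; clinical FATs differ by manufacturer and fitting:

```python
import numpy as np

def make_fat(low_hz, high_hz, n_electrodes):
    """Log-spaced band edges from low_hz to high_hz, one band per electrode."""
    edges = np.geomspace(low_hz, high_hz, n_electrodes + 1)
    return list(zip(edges[:-1], edges[1:]))

def electrode_for(freq_hz, fat):
    """Index of the electrode whose band contains freq_hz, or None."""
    for i, (lo, hi) in enumerate(fat):
        if lo <= freq_hz < hi:
            return i
    return None

fat = make_fat(188.0, 7938.0, 22)     # assumed range and 22-electrode array
print(electrode_for(1000.0, fat))     # which electrode carries 1 kHz?
```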
ERIC Educational Resources Information Center
Jones, Tom; Di Salvo, Vince
A computerized content analysis of the "theory input" for a basic speech course was conducted. The questions to be answered were (1) What does the inexperienced basic speech student hold as a conceptual perspective of the "speech to inform" prior to his being subjected to a college speech class? and (2) How does that inexperienced student's…
ERIC Educational Resources Information Center
Adank, Patti
2012-01-01
The role of speech production mechanisms in difficult speech comprehension is the subject of on-going debate in speech science. Two Activation Likelihood Estimation (ALE) analyses were conducted on neuroimaging studies investigating difficult speech comprehension or speech production. Meta-analysis 1 included 10 studies contrasting comprehension…
Using Discursis to enhance the qualitative analysis of hospital pharmacist-patient interactions.
Chevalier, Bernadette A M; Watson, Bernadette M; Barras, Michael A; Cottrell, William N; Angus, Daniel J
2018-01-01
Pharmacist-patient communication during medication counselling has been successfully investigated using Communication Accommodation Theory (CAT). Communication researchers in other healthcare professions have utilised Discursis software as an adjunct to their manual qualitative analysis processes. Discursis provides a visual, chronological representation of communication exchanges and identifies patterns of interactant engagement. The aim of this study was to describe how Discursis software was used to enhance previously conducted qualitative analysis of pharmacist-patient interactions (by visualising pharmacist-patient speech patterns, episodes of engagement, and identifying CAT strategies employed by pharmacists within these episodes). Visual plots from 48 transcribed audio recordings of pharmacist-patient exchanges were generated by Discursis. Representative plots were selected to show moderate-high and low-level speaker engagement. Details of engagement were investigated for pharmacist application of CAT strategies (approximation, interpretability, discourse management, emotional expression, and interpersonal control). Discursis plots allowed for identification of distinct patterns occurring within pharmacist-patient exchanges. Moderate-high pharmacist-patient engagement was characterised by multiple off-diagonal squares, while alternating single-coloured squares depicted low engagement. Engagement episodes were associated with multiple CAT strategies such as discourse management (open-ended questions). Patterns reflecting pharmacist or patient speaker dominance were dependent on clinical setting. Discursis analysis of pharmacist-patient interactions, a novel application of the technology in health communication, was found to be an effective visualisation tool to pinpoint episodes for CAT analysis. Discursis has numerous practical and theoretical applications for future health communication research and training. Researchers can use the software to support qualitative analysis where large data sets can be quickly reviewed to identify key areas for concentrated analysis. Because Discursis plots are easily generated from audio-recorded transcripts, they are conducive as teaching tools for both students and practitioners to assess and develop their communication skills.
Perceptual Learning and Auditory Training in Cochlear Implant Recipients
Fu, Qian-Jie; Galvin, John J.
2007-01-01
Learning electrically stimulated speech patterns can be a new and difficult experience for cochlear implant (CI) recipients. Recent studies have shown that most implant recipients at least partially adapt to these new patterns via passive, daily-listening experiences. Gradually introducing a speech processor parameter (e.g., the degree of spectral mismatch) may provide for more complete and less stressful adaptation. Although the implant device restores hearing sensation and the continued use of the implant provides some degree of adaptation, active auditory rehabilitation may be necessary to maximize the benefit of implantation for CI recipients. Currently, there are scant resources for auditory rehabilitation for adult, postlingually deafened CI recipients. We recently developed a computer-assisted speech-training program to provide the means to conduct auditory rehabilitation at home. The training software targets important acoustic contrasts among speech stimuli, provides auditory and visual feedback, and incorporates progressive training techniques, thereby maintaining recipients’ interest during the auditory training exercises. Our recent studies demonstrate the effectiveness of targeted auditory training in improving CI recipients’ speech and music perception. Provided with an inexpensive and effective auditory training program, CI recipients may find the motivation and momentum to get the most from the implant device. PMID:17709574
Assessment of vocal cord nodules: a case study in speech processing by using Hilbert-Huang Transform
NASA Astrophysics Data System (ADS)
Civera, M.; Filosi, C. M.; Pugno, N. M.; Silvestrini, M.; Surace, C.; Worden, K.
2017-05-01
Vocal cord nodules represent a pathological condition in which unnatural masses grow on the patient's vocal folds. Among other effects, changes in the vocal cords' overall mass and stiffness alter their vibratory behaviour, thus changing the vocal emission they generate. This causes dysphonia, i.e. abnormalities in the patient's voice, which can be analysed and inspected via audio signals. However, the evaluation of voice condition through speech processing is not a trivial task, as standard methods based on the Fourier Transform fail to fit the non-stationary nature of vocal signals. In this study, four audio tracks, provided by a volunteer patient whose vocal fold nodules had been surgically removed, were analysed using a relatively new technique: the Hilbert-Huang Transform (HHT) via Empirical Mode Decomposition (EMD), specifically the CEEMDAN (Complete Ensemble EMD with Adaptive Noise) algorithm. This method has been applied here to speech signals, which were recorded before removal surgery and during convalescence, to investigate specific trends. Possibilities offered by the HHT are presented, but some limitations of decomposing the signals into so-called intrinsic mode functions (IMFs) are also highlighted. The results of these preliminary studies are intended as a basis for the development of viable alternatives to the software currently used for the analysis and evaluation of pathological voice.
Integrating voice evaluation: correlation between acoustic and audio-perceptual measures.
Vaz Freitas, Susana; Melo Pestana, Pedro; Almeida, Vítor; Ferreira, Aníbal
2015-05-01
This article aims to establish correlations between acoustic and audio-perceptual measures using the GRBAS scale with respect to four different voice analysis software programs. Exploratory, transversal. A total of 90 voice records were collected and analyzed with the Dr. Speech (Tiger Electronics, Seattle, WA), Multidimensional Voice Program (Kay Elemetrics, NJ, USA), PRAAT (University of Amsterdam, The Netherlands), and Voice Studio (Seegnal, Oporto, Portugal) software programs. The acoustic measures were correlated to the audio-perceptual parameters of the GRBAS and rated by 10 experts. The predictive value of the acoustic measurements related to the audio-perceptual parameters exhibited magnitudes ranging from weak (adjusted R² = 0.17) to moderate (adjusted R² = 0.71). The parameter exhibiting the highest correlation magnitude is B (Breathiness), whereas the weaker correlation magnitudes were found for A (Asthenia) and S (Strain). The acoustic measures with stronger predictive values were local Shimmer, harmonics-to-noise ratio, APQ5 shimmer, and PPQ5 jitter, with different magnitudes for each of the studied software programs. Some acoustic measures are pointed out as significant predictors of GRBAS parameters, but they differ among software programs. B (Breathiness) was the parameter exhibiting the highest correlation magnitude. Copyright © 2015 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
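The adjusted R² statistic quoted above comes from regressing a perceptual rating on acoustic measures. A minimal sketch with random placeholder data; the predictors named in the comments are stand-ins for the study's measures:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 90, 4                               # 90 voices, 4 acoustic predictors
X = rng.standard_normal((n, k))            # stand-ins for shimmer, HNR, APQ5, PPQ5
y = X @ np.array([0.8, -0.5, 0.3, 0.1]) + rng.standard_normal(n)  # "GRBAS" rating

Xd = np.column_stack([np.ones(n), X])      # design matrix with intercept
beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
resid = y - Xd @ beta
r2 = 1 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(f"R2 = {r2:.2f}, adjusted R2 = {r2_adj:.2f}")
```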
A Procedure for the Computerized Analysis of Cleft Palate Speech Transcription
ERIC Educational Resources Information Center
Fitzsimons, David A.; Jones, David L.; Barton, Belinda; North, Kathryn N.
2012-01-01
The phonetic symbols used by speech-language pathologists to transcribe speech contain underlying hexadecimal values used by computers to correctly display and process transcription data. This study aimed to develop a procedure to utilise these values as the basis for subsequent computerized analysis of cleft palate speech. A computer keyboard…
Issues in Perceptual Speech Analysis in Cleft Palate and Related Disorders: A Review
ERIC Educational Resources Information Center
Sell, Debbie
2005-01-01
Perceptual speech assessment is central to the evaluation of speech outcomes associated with cleft palate and velopharyngeal dysfunction. However, the complexity of this process is perhaps sometimes underestimated. To draw together the many different strands in the complex process of perceptual speech assessment and analysis, and make…
Imitation and speech: commonalities within Broca's area.
Kühn, Simone; Brass, Marcel; Gallinat, Jürgen
2013-11-01
The so-called embodiment of communication has attracted considerable interest. Recently a growing number of studies have proposed a link between Broca's area's involvement in action processing and its involvement in speech. The present quantitative meta-analysis set out to test whether neuroimaging studies on imitation and overt speech show overlap within the inferior frontal gyrus. By means of activation likelihood estimation (ALE), we investigated concurrence of brain regions activated by object-free hand imitation studies as well as overt speech studies including simple syllable and more complex word production. We found direct overlap between imitation and speech in bilateral pars opercularis (BA 44) within Broca's area. Subtraction analyses revealed no unique localization for either speech or imitation. To verify the potential of ALE subtraction analysis to detect unique involvement within Broca's area, we contrasted the results of a meta-analysis on motor inhibition and imitation and found separable regions involved in imitation. This is the first meta-analysis to compare the neural correlates of imitation and overt speech. The results are in line with the proposed evolutionary roots of speech in imitation.
Fisher, Wayne W; Rodriguez, Nicole M; Owen, Todd M
2013-01-01
A functional analysis showed that a 14-year-old boy with Asperger syndrome displayed perseverative speech (or "restricted interests") reinforced by attention. To promote appropriate speech in a turn-taking format, we implemented differential reinforcement (DR) of nonperseverative speech and DR of on-topic speech within a multiple schedule with stimuli that signaled the contingencies in effect and who was to select the topic. Both treatments reduced perseverative speech, but only DR of on-topic speech increased appropriate turn taking during conversation. Treatment effects were maintained when implemented by family members and novel therapists. © Society for the Experimental Analysis of Behavior.
Poole, Matthew L; Brodtmann, Amy; Darby, David; Vogel, Adam P
2017-04-14
Our purpose was to create a comprehensive review of speech impairment in frontotemporal dementia (FTD), primary progressive aphasia (PPA), and progressive apraxia of speech in order to identify the most effective measures for diagnosis and monitoring, and to elucidate associations between speech and neuroimaging. Speech and neuroimaging data described in studies of FTD and PPA were systematically reviewed. A meta-analysis was conducted for speech measures that were used consistently in multiple studies. The methods and nomenclature used to describe speech in these disorders varied between studies. Our meta-analysis identified 3 speech measures which differentiate variants or healthy control-group participants (e.g., nonfluent and logopenic variants of PPA from all other groups, behavioral-variant FTD from a control group). Deficits within the frontal-lobe speech networks are linked to motor speech profiles of the nonfluent variant of PPA and progressive apraxia of speech. Motor speech impairment is rarely reported in semantic and logopenic variants of PPA. Limited data are available on motor speech impairment in the behavioral variant of FTD. Our review identified several measures of speech which may assist with diagnosis and classification, and consolidated the brain-behavior associations relating to speech in FTD, PPA, and progressive apraxia of speech.
Automated speech understanding: the next generation
NASA Astrophysics Data System (ADS)
Picone, J.; Ebel, W. J.; Deshmukh, N.
1995-04-01
Modern speech understanding systems merge interdisciplinary technologies from Signal Processing, Pattern Recognition, Natural Language, and Linguistics into a unified statistical framework. These systems, which have applications in a wide range of signal processing problems, represent a revolution in Digital Signal Processing (DSP). Once a field dominated by vector-oriented processors and linear algebra-based mathematics, the current generation of DSP-based systems rely on sophisticated statistical models implemented using a complex software paradigm. Such systems are now capable of understanding continuous speech input for vocabularies of several thousand words in operational environments. The current generation of deployed systems, based on small vocabularies of isolated words, will soon be replaced by a new technology offering natural language access to vast information resources such as the Internet, and provide completely automated voice interfaces for mundane tasks such as travel planning and directory assistance.
Duenna-An experimental language teaching application
NASA Astrophysics Data System (ADS)
Horváth, Balázs Zsigmond; Blaske, Bence; Szabó, Anita
The presented TTS (text-to-speech) application is an auxiliary tool for language teaching. It utilizes computer-generated voices to simulate dialogs representing different grammatical problems or speech contexts. The software can produce as many example dialogs as required to enhance the language learning experience, serving curriculum representation, grammar contextualization, and pronunciation practice at the same time. It is designed for regular use in the language classroom, and students readily write materials for listening-comprehension tasks with it. A pilot study involving 26 students (divided into control and trial groups) practicing for their school-leaving exam indicates that computer-generated voices are also adequate for recreating audio course-book materials. The voices used engaged the students as effectively as if they were listening to recorded human speech.
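The application's core loop, rendering a scripted two-voice dialog through a TTS engine, can be sketched as follows. pyttsx3 is used purely as a stand-in engine; the abstract does not name the synthesis back end, and the dialog text is invented:

```python
import pyttsx3

dialog = [
    ("A", "Excuse me, could you tell me the way to the station?"),
    ("B", "Of course. Go straight ahead and turn left at the bank."),
    ("A", "Thank you very much!"),
]

engine = pyttsx3.init()
voices = engine.getProperty("voices")           # system-installed voices
for speaker, line in dialog:
    # Assign a distinct voice per speaker (assumes at least two are installed).
    idx = 0 if speaker == "A" else min(1, len(voices) - 1)
    engine.setProperty("voice", voices[idx].id)
    engine.say(line)
engine.runAndWait()                             # speaks the queued dialog
```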
Multimedia Materials for Language and Literacy Learning.
ERIC Educational Resources Information Center
Hallett, Terry L.
1999-01-01
Introduces educators to inexpensive, commercially-available CD-ROM software that combines speech, text, graphics, sound, video, animation, and special effects that may be incorporated into classroom activities for both normally developing and language learning disabled children. Discusses three types of multimedia CD-ROM products: (1) virtual…
Picoloto, Luana Altran; Cardoso, Ana Cláudia Vieira; Cerqueira, Amanda Venuti; Oliveira, Cristiane Moço Canhetti de
2017-12-07
To verify the effect of delayed auditory feedback on the speech fluency of individuals who stutter, with and without central auditory processing disorders. The participants were twenty individuals who stutter, aged 7 to 17 years, divided into two groups: the Stuttering Group with Auditory Processing Disorders (SGAPD), 10 individuals with central auditory processing disorders, and the Stuttering Group (SG), 10 individuals without central auditory processing disorders. The procedures were fluency assessment with non-altered auditory feedback (NAF) and delayed auditory feedback (DAF), and assessment of stuttering severity and central auditory processing (CAP). Phono Tools software was used to introduce a delay of 100 milliseconds in the auditory feedback. The Wilcoxon signed-rank test was used in the intragroup analysis and the Mann-Whitney test in the intergroup analysis. DAF caused a statistically significant reduction in the SG: in the frequency score of stuttering-like disfluencies in the Stuttering Severity Instrument analysis, in the number of blocks and repetitions of monosyllabic words, and in the duration of stuttering-like disfluencies. DAF did not cause statistically significant effects on the fluency of the SGAPD, the individuals who stutter with auditory processing disorders. The effect of delayed auditory feedback on the speech fluency of individuals who stutter thus differed between the two groups: fluency improved only in individuals without an auditory processing disorder.
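The 100 ms delay at the heart of DAF is a simple signal operation. The sketch below applies it offline to an array of samples; a real system such as Phono Tools streams microphone input to the headphones in real time:

```python
import numpy as np

def delay_feedback(x, fs, delay_ms=100.0):
    """Return the signal delayed by delay_ms: silence is prepended so playback
    lags the speaker's voice, as in delayed auditory feedback."""
    pad = np.zeros(int(round(fs * delay_ms / 1000.0)), dtype=x.dtype)
    return np.concatenate([pad, x])

fs = 44100
speech = np.random.default_rng(0).standard_normal(fs)   # 1 s placeholder signal
delayed = delay_feedback(speech, fs)
print(len(delayed) - len(speech), "samples of delay")   # 4410 samples = 100 ms
```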
Direct interaction with an assistive robot for individuals with chronic stroke.
Kmetz, Brandon; Markham, Heather; Brewer, Bambi R
2011-01-01
Many robotic systems have been developed to provide assistance to individuals with disabilities. Most of these systems require the individual to interact with the robot via a joystick or keypad, though some utilize techniques such as speech recognition or selection of objects with a laser pointer. In this paper, we describe a prototype system using a novel method of interaction with an assistive robot. A touch-sensitive skin enables the user to directly guide a robotic arm to a desired position. When the skin is released, the robot remains fixed in position. The target population for this system is individuals with hemiparesis due to chronic stroke. The system can be used as a substitute for the paretic arm and hand in bimanual tasks such as holding a jar while removing the lid. This paper describes the hardware and software of the prototype system, which includes a robotic arm, the touch-sensitive skin, a hook-style prehensor, and weight compensation and speech recognition software.
Di Berardino, F; Tognola, G; Paglialonga, A; Alpini, D; Grandori, F; Cesarani, A
2010-08-01
To assess whether different compact disk recording protocols, used to prepare speech test material, affect the reliability and comparability of speech audiometry testing. We conducted acoustic analysis of compact disks used in clinical practice, to determine whether speech material had been recorded using similar procedures. To assess the impact of different recording procedures on speech test outcomes, normal hearing subjects were tested using differently prepared compact disks, and their psychometric curves compared. Acoustic analysis revealed that speech material had been recorded using different protocols. The major difference was the gain between the levels at which the speech material and the calibration signal had been recorded. Although correct calibration of the audiometer was performed for each compact disk before testing, speech recognition thresholds and maximum intelligibility thresholds differed significantly between compact disks (p < 0.05), and were influenced by the gain between the recording level of the speech material and the calibration signal. To ensure the reliability and comparability of speech test outcomes obtained using different compact disks, it is recommended to check for possible differences in the recording gains used to prepare the compact disks, and then to compensate for any differences before testing.
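The recommended check reduces to comparing, per disk, the level gap between speech material and calibration signal, then compensating. A sketch using RMS levels as a stand-in for the study's measurement procedure, with synthetic signals:

```python
import numpy as np

def rms_db(x):
    """RMS level in dB (relative, uncalibrated)."""
    return 20 * np.log10(np.sqrt(np.mean(np.square(x))))

def recording_gain_db(speech, calibration):
    """Level gap between speech material and calibration signal on one disk."""
    return rms_db(speech) - rms_db(calibration)

rng = np.random.default_rng(5)
t = np.arange(44100) / 44100.0
cal = 0.1 * np.sin(2 * np.pi * 1000 * t)           # shared 1 kHz calibration tone
speech_a = 0.05 * rng.standard_normal(t.size)      # disk A speech material
speech_b = 0.10 * rng.standard_normal(t.size)      # disk B, recorded hotter
diff = recording_gain_db(speech_a, cal) - recording_gain_db(speech_b, cal)
speech_b_matched = speech_b * 10 ** (diff / 20)    # compensate before testing
print(f"inter-disk gain difference: {diff:.1f} dB")
```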
Speech processing using maximum likelihood continuity mapping
Hogden, John E.
2000-01-01
Speech processing is obtained that, given a probabilistic mapping between static speech sounds and pseudo-articulator positions, allows sequences of speech sounds to be mapped to smooth sequences of pseudo-articulator positions. In addition, a method for learning a probabilistic mapping between static speech sounds and pseudo-articulator position is described. The method for learning the mapping between static speech sounds and pseudo-articulator position uses a set of training data composed only of speech sounds. The said speech processing can be applied to various speech analysis tasks, including speech recognition, speaker recognition, speech coding, speech synthesis, and voice mimicry.
Vocal Parameters of Elderly Female Choir Singers
Aquino, Fernanda Salvatico de; Ferreira, Léslie Piccolotto
2015-01-01
Introduction Due to increased life expectancy among the population, studying the vocal parameters of the elderly is key to promoting vocal health in old age. Objective This study aims to analyze the speech range profile of elderly female choir singers according to age group. Method Twenty-five elderly female choristers from the Choir of the Messianic Church of São Paulo participated in the study, with ages between 63 and 82 years and an average of 71 years (standard deviation 5.22). They were divided into two groups: G1, aged 63 to 71 years, and G2, aged 72 to 82 years. Each participant was asked to count from 20 to 30 at weak, medium, strong, and very strong intensities. Their speech was registered with the software Vocalgrama, which allows evaluation of the speech range profile. The minimum and maximum frequency and intensity parameters and the spoken-voice range were then submitted to descriptive analysis. Results The mean minimum and maximum frequencies were, respectively, 134.82–349.96 Hz for G1 and 137.28–348.59 Hz for G2; the mean minimum and maximum intensities were, respectively, 40.28–95.50 dB for G1 and 40.63–94.35 dB for G2; the vocal range used in speech was 215.14 Hz for G1 and 211.30 Hz for G2. Conclusion The minimum and maximum frequencies, maximum intensity, and vocal range presented differences in favor of the younger elderly group. PMID:26722341
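The speech range profile boils down to minimum/maximum fundamental frequency and intensity over a recording. A sketch using librosa's pyin tracker and RMS energy as stand-ins for the commercial Vocalgrama software; the dB calibration offset is an assumption:

```python
import numpy as np
import librosa

# librosa.ex downloads a short LibriSpeech excerpt on first use (needs network).
y, sr = librosa.load(librosa.ex("libri1"), duration=10)
f0, voiced_flag, voiced_prob = librosa.pyin(y, fmin=65, fmax=500, sr=sr)
rms = librosa.feature.rms(y=y)[0]
intensity_db = 20 * np.log10(rms + 1e-10) + 94   # +94 dB: assumed calibration

print(f"F0 range: {np.nanmin(f0):.0f}-{np.nanmax(f0):.0f} Hz")
print(f"intensity range: {intensity_db.min():.0f}-{intensity_db.max():.0f} dB")
```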
Speed Accuracy Tradeoffs in Human Speech Production
2017-05-01
for considering Fitts’ law in the domain of speech production is elucidated. Methodological challenges in applying Fitts-style analysis are addressed… order to assess whether articulatory kinematics conform to Fitts’ law. A second, associated goal is to address the methodological challenges inherent in… performing Fitts-style analysis on rtMRI data of speech production. Methodological challenges include segmenting continuous speech into specific motor
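For reference, the Fitts' law relation invoked above, with illustrative (not fitted) coefficients:

```python
import math

def fitts(distance, width, a=0.05, b=0.12):
    """Movement time under Fitts' law, MT = a + b * ID, where the index of
    difficulty is ID = log2(2D / W). Coefficients a, b here are illustrative."""
    ident = math.log2(2 * distance / width)
    return a + b * ident, ident

mt, ident = fitts(distance=10.0, width=2.0)   # e.g., mm of articulator travel
print(f"ID = {ident:.2f} bits, predicted MT = {mt * 1000:.0f} ms")
```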
Law and Order Comes to Cyberspace (and) Filtering the Net.
ERIC Educational Resources Information Center
Diamond, Edwin; And Others
1995-01-01
Examines five legal questions concerning the Internet: constitutional protection in cyberspace from defamatory speech, evasion of laws, accountability for offensive expression, pornography, and digital theft. A sidebar discusses the employment of user-rated Internet sites and software to scan and filter offensive material. (JMV)
Sperry Univac speech communications technology
NASA Technical Reports Server (NTRS)
Medress, Mark F.
1977-01-01
Technology and systems for effective verbal communication with computers were developed. A continuous speech recognition system for verbal input, a word spotting system to locate key words in conversational speech, prosodic tools to aid speech analysis, and a prerecorded voice response system for speech output are described.
Program to Diagnose Probability of Aspiration Pneumonia in Patients with Ischemic Stroke
Pinto, Gisele; Zétola, Viviane; Lange, Marcos; Gomes, Guilherme; Nunes, Maria Cristina; Hirata, Gisela; Lagos-Guimarães, Hellen Nataly
2014-01-01
Introduction Stroke is a major cause of death and disability worldwide, with a strong economic and social impact. Approximately 40% of patients show motor, language, and swallowing disorders after stroke. Objective To evaluate the use of software to infer the probability of pneumonia in patients with ischemic stroke. Methods Prospective and cross-sectional study conducted in a university hospital from March 2010 to August 2012. After confirmation of ischemic stroke by computed axial tomography, a clinical and flexible endoscopic evaluation of swallowing was performed within 72 hours of onset of symptoms. All patients received speech therapy poststroke, and the data were subsequently analyzed by the software. The patients were given medical treatment and speech therapy for 3 months. Results The study examined 52 patients with a mean age of 62.05 ± 13.88 years, of whom 23 (44.2%) were women. Of the 52 patients, only 3 (5.7%) had a probability of pneumonia between 80 and 100% as identified by the software. Of all patients, 32 (61.7%) had a pneumonia probability between 0 and 19%, 5 (9.5%) between 20 and 49%, 3 (5.8%) between 50 and 79%, and 12 (23.0%) between 80 and 100%. Conclusion The computer program indicates the probability of a patient having aspiration pneumonia after ischemic stroke. PMID:25992100
Lincoln, Michelle; Hines, Monique; Fairweather, Craig; Ramsden, Robyn; Martinovich, Julia
2015-01-01
The objective of this study was to investigate stakeholders’ views on the feasibility and acceptability of a pilot speech pathology teletherapy program for children attending schools in rural New South Wales, Australia. Nine children received speech pathology sessions delivered via Adobe Connect® web-conferencing software. During semi-structured interviews, school principals (n = 3), therapy facilitators (n = 7), and parents (n = 6) described factors that promoted or threatened the program’s feasibility and acceptability. Themes were categorized according to whether they related to (a) the use of technology; (b) the school-based nature of the program; or (c) the combination of using technology with a school-based program. Despite frequent reports of difficulties with technology, teletherapy delivery of speech pathology services in schools was highly acceptable to stakeholders. However, the use of technology within a school environment increased the complexities of service delivery. Service providers should pay careful attention to planning processes and lines of communication in order to promote efficiency and acceptability of teletherapy programs. PMID:25945230
Hetzroni, Orit E; Tannous, Juman
2004-04-01
This study investigated the use of computer-based intervention for enhancing communication functions of children with autism. The software program was developed based on daily life activities in the areas of play, food, and hygiene. The following variables were investigated: delayed echolalia, immediate echolalia, irrelevant speech, relevant speech, and communicative initiations. Multiple-baseline design across settings was used to examine the effects of the exposure of five children with autism to activities in a structured and controlled simulated environment on the communication manifested in their natural environment. Results indicated that after exposure to the simulations, all children produced fewer sentences with delayed and irrelevant speech. Most of the children engaged in fewer sentences involving immediate echolalia and increased the number of communication intentions and the amount of relevant speech they produced. Results indicated that after practicing in a controlled and structured setting that provided the children with opportunities to interact in play, food, and hygiene activities, the children were able to transfer their knowledge to the natural classroom environment. Implications and future research directions are discussed.
Discourse Analysis and Language Learning [Summary of a Symposium].
ERIC Educational Resources Information Center
Hatch, Evelyn
1981-01-01
A symposium on discourse analysis and language learning is summarized. Discourse analysis can be divided into six fields of research: syntax, the amount of syntactic organization required for different types of discourse, large speech events, intra-sentential cohesion in text, speech acts, and unequal power discourse. Research on speech events and…
Automated analysis of free speech predicts psychosis onset in high-risk youths
Bedi, Gillinder; Carrillo, Facundo; Cecchi, Guillermo A; Slezak, Diego Fernández; Sigman, Mariano; Mota, Natália B; Ribeiro, Sidarta; Javitt, Daniel C; Copelli, Mauro; Corcoran, Cheryl M
2015-01-01
Background/Objectives: Psychiatry lacks the objective clinical tests routinely used in other specializations. Novel computerized methods to characterize complex behaviors such as speech could be used to identify and predict psychiatric illness in individuals. Aims: In this proof-of-principle study, our aim was to test automated speech analyses combined with Machine Learning to predict later psychosis onset in youths at clinical high-risk (CHR) for psychosis. Methods: Thirty-four CHR youths (11 females) had baseline interviews and were assessed quarterly for up to 2.5 years; five transitioned to psychosis. Using automated analysis, transcripts of interviews were evaluated for semantic and syntactic features predicting later psychosis onset. Speech features were fed into a convex hull classification algorithm with leave-one-subject-out cross-validation to assess their predictive value for psychosis outcome. The canonical correlation between the speech features and prodromal symptom ratings was computed. Results: Derived speech features included a Latent Semantic Analysis measure of semantic coherence and two syntactic markers of speech complexity: maximum phrase length and use of determiners (e.g., which). These speech features predicted later psychosis development with 100% accuracy, outperforming classification from clinical interviews. Speech features were significantly correlated with prodromal symptoms. Conclusions: Findings support the utility of automated speech analysis to measure subtle, clinically relevant mental state changes in emergent psychosis. Recent developments in computer science, including natural language processing, could provide the foundation for future development of objective clinical tests for psychiatry. PMID:27336038
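The semantic-coherence feature can be approximated with off-the-shelf tools: embed consecutive sentences via TF-IDF plus truncated SVD (a standard realization of Latent Semantic Analysis) and average adjacent-sentence cosine similarities. The sentences and component count below are invented; the study's classifier and clinical data are beyond this sketch:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

sentences = [
    "I went to the store this morning.",
    "The store was out of bread.",
    "So I bought rice instead.",
    "My cousin plays guitar on Sundays.",   # an abrupt topic shift
]
tfidf = TfidfVectorizer().fit_transform(sentences)
lsa = TruncatedSVD(n_components=2, random_state=0).fit_transform(tfidf)
# First-order coherence: cosine similarity between adjacent sentence vectors.
sims = [cosine_similarity(lsa[i:i + 1], lsa[i + 1:i + 2])[0, 0]
        for i in range(len(sentences) - 1)]
print(f"mean first-order coherence: {np.mean(sims):.2f}")
```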
Using speech recognition to enhance the Tongue Drive System functionality in computer access.
Huo, Xueliang; Ghovanloo, Maysam
2011-01-01
Tongue Drive System (TDS) is a wireless tongue-operated assistive technology (AT) that can enable people with severe physical disabilities to access computers and drive powered wheelchairs using their volitional tongue movements. TDS offers six discrete commands, simultaneously available to the users, for pointing and typing as substitutes for the mouse and keyboard in computer access, respectively. To enhance TDS performance in typing, we have added a microphone, an audio codec, and a wireless audio link to its readily available 3-axial magnetic sensor array, and combined it with commercially available speech recognition software, Dragon NaturallySpeaking, which is regarded as one of the most efficient text-entry methods. Our preliminary evaluations indicate that the combined TDS and speech recognition technologies can provide end users with significantly higher performance than either technology alone, particularly in completing tasks that require both pointing and text entry, such as web surfing.
NASA Astrophysics Data System (ADS)
Lightstone, P. C.; Davidson, W. M.
1982-04-01
The military detection assessment laboratory houses an experimental field system which assesses different alarm indicators such as fence disturbance sensors, MILES cables, and microwave Racons. A speech synthesis board was purchased that could be interfaced, through a computer, to an alarm logger, making verbal acknowledgement of alarms possible. Different products and different types of voice synthesis were analyzed before a linear predictive coding device produced by Telesensory Speech Systems of Palo Alto, California was chosen. This device, called the Speech 1000 Board, has a dedicated 8085 processor. A multiplexer card was designed and the Sp 1000 interfaced through the card to a TMS 990/100M Texas Instruments microcomputer. It was also necessary to design the software with the capability of recognizing and flagging an alarm on any one of 32 possible lines. The experimental field system was then packaged with a dc power supply, LED indicators, speakers, and switches, and deployed in the field, where it performed reliably.
Hengst, Julie A; Frame, Simone R; Neuman-Stritzel, Tiffany; Gannaway, Rachel
2005-02-01
Reported speech, wherein one quotes or paraphrases the speech of another, has been studied extensively as a set of linguistic and discourse practices. Researchers agree that reported speech is pervasive, found across languages, and used in diverse contexts. However, to date, there have been no studies of the use of reported speech among individuals with aphasia. Grounded in an interactional sociolinguistic perspective, the study presented here documents and analyzes the use of reported speech by 7 adults with mild to moderately severe aphasia and their routine communication partners. Each of the 7 pairs was videotaped in 4 everyday activities at home or around the community, yielding over 27 hr of conversational interaction for analysis. A coding scheme was developed that identified 5 types of explicitly marked reported speech: direct, indirect, projected, indexed, and undecided. Analysis of the data documented reported speech as a common discourse practice used successfully by the individuals with aphasia and their communication partners. All participants produced reported speech at least once, and across all observations the target pairs produced 400 reported speech episodes (RSEs), 149 by individuals with aphasia and 251 by their communication partners. For all participants, direct and indirect forms were the most prevalent (70% of RSEs). Situated discourse analysis of specific episodes of reported speech used by 3 of the pairs provides detailed portraits of the diverse interactional, referential, social, and discourse functions of reported speech and explores ways that the pairs used reported speech to successfully frame talk despite their ongoing management of aphasia.
Biological Impact of Music and Software-Based Auditory Training
ERIC Educational Resources Information Center
Kraus, Nina
2012-01-01
Auditory-based communication skills are developed at a young age and are maintained throughout our lives. However, some individuals--both young and old--encounter difficulties in achieving or maintaining communication proficiency. Biological signals arising from hearing sounds relate to real-life communication skills such as listening to speech in…
Scenario-Based Spoken Interaction with Virtual Agents
ERIC Educational Resources Information Center
Morton, Hazel; Jack, Mervyn A.
2005-01-01
This paper describes a CALL approach which integrates software for speaker independent continuous speech recognition with embodied virtual agents and virtual worlds to create an immersive environment in which learners can converse in the target language in contextualised scenarios. The result is a self-access learning package: SPELL (Spoken…
Supporting Struggling Readers in Secondary School Science Classes
ERIC Educational Resources Information Center
Roberts, Kelly D.; Takahashi, Kiriko; Park, Hye-Jin; Stodden, Robert A.
2012-01-01
Many secondary school students struggle to read complex expository text such as science textbooks. This article provides step-by-step guidance on how to foster expository reading for struggling readers in secondary school science classes. Two strategies are introduced: Text-to-Speech (TTS) Software as a reading compensatory strategy and the…
Advances in EPG for Treatment and Research: An Illustrative Case Study
ERIC Educational Resources Information Center
Scobbie, James M.; Wood, Sara E.; Wrench, Alan A.
2004-01-01
Electropalatography (EPG), a technique which reveals tongue-palate contact patterns over time, is a highly effective tool for speech research. We report here on recent developments by Articulate Instruments Ltd. These include hardware for Windows-based computers, backwardly compatible (with Reading EPG3) software systems for clinical intervention…
Scientific bases of human-machine communication by voice.
Schafer, R W
1995-01-01
The scientific bases for human-machine communication by voice are in the fields of psychology, linguistics, acoustics, signal processing, computer science, and integrated circuit technology. The purpose of this paper is to highlight the basic scientific and technological issues in human-machine communication by voice and to point out areas of future research opportunity. The discussion is organized around the following major issues in implementing human-machine voice communication systems: (i) hardware/software implementation of the system, (ii) speech synthesis for voice output, (iii) speech recognition and understanding for voice input, and (iv) usability factors related to how humans interact with machines. PMID:7479802
Qi, Beier; Liu, Bo; Liu, Sha; Liu, Haihong; Dong, Ruijuan; Zhang, Ning; Gong, Shusheng
2011-05-01
To study the effect of cochlear electrode coverage and different insertion regions on speech recognition, especially tone perception, in cochlear implant (CI) users whose native language is Mandarin Chinese, seven test conditions were set with the fitting software. All conditions were created by switching respective channels on or off in order to simulate different insertion positions. Mandarin CI users then received four speech tests: a vowel identification test, a consonant identification test, a tone identification test (male speaker), and the Mandarin HINT test (SRS) in quiet and noise. Across the test conditions, the average vowel identification score differed significantly, from 56% to 91% (rank sum test, P < 0.05), and the average consonant identification score differed significantly, from 72% to 85% (ANOVA, P < 0.05). The average tone identification score did not differ significantly (ANOVA, P > 0.05); however, the more channels activated, the higher the scores obtained, from 68% to 81%. This study shows that there is a correlation between insertion depth and speech recognition. Because all parts of the basilar membrane can help CI users to improve their speech recognition ability, it is very important to enhance the verbal communication ability and social interaction ability of CI users by increasing insertion depth and actively stimulating the apical region of the cochlea.
An Analysis of The Parameters Used In Speech ABR Assessment Protocols.
Sanfins, Milaine D; Hatzopoulos, Stavros; Donadon, Caroline; Diniz, Thais A; Borges, Leticia R; Skarzynski, Piotr H; Colella-Santos, Maria Francisca
2018-04-01
The aim of this study was to assess the parameters of choice, such as duration, intensity, rate, polarity, number of sweeps, window length, stimulated ear, fundamental frequency, first formant, and second formant, from previously published speech ABR studies. To identify candidate articles, five databases were assessed using the following keyword descriptors: speech ABR, ABR-speech, speech auditory brainstem response, auditory evoked potential to speech, speech-evoked brainstem response, and complex sounds. The search identified 1288 articles published between 2005 and 2015. After filtering the total number of papers according to the inclusion and exclusion criteria, 21 studies were selected. Analyzing the protocol details used in 21 studies suggested that there is no consensus to date on a speech-ABR protocol and that the parameters of analysis used are quite variable between studies. This inhibits the wider generalization and extrapolation of data across languages and studies.
Perceptual analysis of speech following traumatic brain injury in childhood.
Cahill, Louise M; Murdoch, Bruce E; Theodoros, Deborah G
2002-05-01
To investigate perceptually the speech dimensions, oromotor function, and speech intelligibility of a group of individuals with traumatic brain injury (TBI) acquired in childhood. The speech of 24 children with TBI was analysed perceptually and compared with that of a group of non-neurologically impaired children matched for age and sex. The 16 dysarthric TBI subjects were significantly less intelligible than the control subjects, and demonstrated significant impairment in 12 of the 33 speech dimensions rated. In addition, the eight non-dysarthric TBI subjects were significantly impaired in many areas of oromotor function on the Frenchay Dysarthria Assessment, indicating some degree of pre-clinical speech impairment. The results of the perceptual analysis are discussed in terms of the possible underlying pathophysiological bases of the deviant speech features identified, and the need for a comprehensive instrumental assessment, to more accurately determine the level of breakdown in the speech production mechanism in children following TBI.
Three Factors Are Critical in Order to Synthesize Intelligible Noise-Vocoded Japanese Speech
Kishida, Takuya; Nakajima, Yoshitaka; Ueda, Kazuo; Remijn, Gerard B.
2016-01-01
Factor analysis (principal component analysis followed by varimax rotation) had shown that 3 common factors appear across 20 critical-band power fluctuations derived from spoken sentences of eight different languages [Ueda et al. (2010). Fechner Day 2010, Padua]. The present study investigated the contributions of such power-fluctuation factors to speech intelligibility. The method of factor analysis was modified to obtain factors suitable for resynthesizing speech sounds as 20-critical-band noise-vocoded speech. The resynthesized speech sounds were used for an intelligibility test. The modification of factor analysis ensured that the resynthesized speech sounds were not accompanied by a steady background noise caused by the data reduction procedure. Spoken sentences of British English, Japanese, and Mandarin Chinese were subjected to this modified analysis. Confirming the earlier analysis, indeed 3–4 factors were common to these languages. The number of power-fluctuation factors needed to make noise-vocoded speech intelligible was then examined. Critical-band power fluctuations of the Japanese spoken sentences were resynthesized from the obtained factors, resulting in noise-vocoded-speech stimuli, and the intelligibility of these speech stimuli was tested by 12 native Japanese speakers. Japanese mora (syllable-like phonological unit) identification performances were measured when the number of factors was 1–9. Statistically significant improvement in intelligibility was observed when the number of factors was increased stepwise up to 6. The 12 listeners identified 92.1% of the morae correctly on average in the 6-factor condition. The intelligibility improved sharply when the number of factors changed from 2 to 3. In this step, the cumulative contribution ratio of factors improved only by 10.6%, from 37.3 to 47.9%, but the average mora identification leaped from 6.9 to 69.2%. The results indicated that, if the number of factors is 3 or more, elementary linguistic information is preserved in such noise-vocoded speech. PMID:27199790
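As a concrete illustration of the resynthesis idea, here is a minimal noise-vocoder sketch in Python. It uses log-spaced band edges as a stand-in for true critical bands and omits the factor-analysis step entirely; the parameters and names are assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(x, fs, n_bands=20, lo=100.0, seed=0):
    """Replace each band's fine structure with noise, keeping its envelope."""
    hi = min(8000.0, 0.45 * fs)                  # keep band edges below Nyquist
    edges = np.geomspace(lo, hi, n_bands + 1)    # stand-in for critical-band edges
    noise = np.random.default_rng(seed).standard_normal(len(x))
    out = np.zeros(len(x))
    for f1, f2 in zip(edges[:-1], edges[1:]):
        sos = butter(4, [f1, f2], btype="bandpass", fs=fs, output="sos")
        env = np.abs(hilbert(sosfiltfilt(sos, x)))   # band envelope of the speech
        out += env * sosfiltfilt(sos, noise)         # modulate band-limited noise
    return out / (np.max(np.abs(out)) + 1e-12)
```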
Trends in computer hardware and software.
Frankenfeld, F M
1993-04-01
Previously identified and current trends in the development of computer systems and in the use of computers for health care applications are reviewed. Trends identified in a 1982 article were increasing miniaturization and archival ability, increasing software costs, increasing software independence, user empowerment through new software technologies, shorter computer-system life cycles, and more rapid development and support of pharmaceutical services. Most of these trends continue today. Current trends in hardware and software include the increasing use of reduced instruction-set computing, migration to the UNIX operating system, the development of large software libraries, microprocessor-based smart terminals that allow remote validation of data, speech synthesis and recognition, application generators, fourth-generation languages, computer-aided software engineering, object-oriented technologies, and artificial intelligence. Current trends specific to pharmacy and hospitals are the withdrawal of vendors of hospital information systems from the pharmacy market, improved linkage of information systems within hospitals, and increased regulation by government. The computer industry and its products continue to undergo dynamic change. Software development continues to lag behind hardware, and its high cost is offsetting the savings provided by hardware.
Paats, A; Alumäe, T; Meister, E; Fridolin, I
2018-04-30
The aim of this study was to retrospectively analyze the influence of different acoustic and language models in order to determine the most important effects on the clinical performance of an Estonian-language, non-commercial, radiology-oriented automatic speech recognition (ASR) system. An ASR system was developed for the Estonian language in the radiology domain by utilizing open-source software components (Kaldi toolkit, Thrax). The ASR system was trained with real radiology text reports and dictations collected during the development phases. The final version of the ASR system was tested by 11 radiologists who dictated 219 reports in total, in a spontaneous manner in a real clinical environment. The audio files collected in the final phase were used to measure the performance of different versions of the ASR system retrospectively. ASR system versions were evaluated by word error rate (WER) for each speaker and modality, and by the WER difference between the first and the last version of the ASR system. The total average WER for the final version throughout all material improved from 18.4% for the first version (v1) to 5.8% for the last version (v8), which corresponds to a relative improvement of 68.5%. WER improvement was strongly related to modality and radiologist. In summary, the performance of the final ASR system version was close to optimal, delivering similar results across all modalities and being independent of the user, the complexity of the radiology reports, user experience, and speech characteristics.
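WER, the evaluation metric used above, is conventionally computed as the word-level Levenshtein distance between reference and hypothesis, divided by the reference length. A self-contained Python sketch (function name mine):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate = (substitutions + insertions + deletions) / ref words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between the first i ref words and j hyp words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# e.g. wer("chest x ray is normal", "chest x ray normal") == 0.2
```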
Analysis of False Starts in Spontaneous Speech.
ERIC Educational Resources Information Center
O'Shaughnessy, Douglas
A primary difference between spontaneous speech and read speech concerns the use of false starts, where a speaker interrupts the flow of speech to restart his or her utterance. A study examined the acoustic aspects of such restarts in a widely-used speech database, examining approximately 1000 utterances, about 10% of which contained a restart.…
Stansfield, Jois; Armstrong, Linda
2016-07-01
Background: Following a content analysis of the first 10 years of the UK professional journal Speech, this study was conducted to survey the published work of the speech (and language) therapy profession in the 20 years following the unification of two separate professional bodies into the College of Speech Therapists. Aims: To better understand the development of the speech (and language) therapy profession in the UK in order to support the development of an online history of the speech and language therapy profession in the UK. Methods: The 40 issues of the professional journal of the College of Speech Therapists published between 1946 and 1965 (Speech, and later Speech Pathology and Therapy) were examined using content analysis, and the content was compared with that of the same journal as it appeared from 1935 to the end of the Second World War (1945). Results: Many aspects of the journal and its authored papers were retained from the earlier years, for example, the range of authors' professions, their location (mainly in the UK), their number of contributions and the length of papers. Changes and developments included the balance of original to republished papers, the description and discussion of new professional issues, and an extended range of client groups/disorders. Conclusions: The journal and its articles reflect the growing maturity of the newly unified profession of speech therapy and give an indication both of the expanding depth of knowledge available to speech therapists and of the rapidly increasing breadth of their work over this period. © 2016 Royal College of Speech and Language Therapists.
Contributions of speech science to the technology of man-machine voice interactions
NASA Technical Reports Server (NTRS)
Lea, Wayne A.
1977-01-01
Research in speech understanding was reviewed. Plans which include prosodics research, phonological rules for speech understanding systems, and continued interdisciplinary phonetics research are discussed. Improved acoustic phonetic analysis capabilities in speech recognizers are suggested.
Rethinking Protocol Analysis from a Cultural Perspective.
ERIC Educational Resources Information Center
Smagorinsky, Peter
2001-01-01
Outlines a cultural-historical activity theory (CHAT) perspective that accounts for protocol analysis along three key dimensions: the relationship between thinking and speech from a representational standpoint; the social role of speech in research methodology; and the influence of speech on thinking and data collection. (Author/VWL)
Cleft audit protocol for speech (CAPS-A): a comprehensive training package for speech analysis.
Sell, D; John, A; Harding-Bell, A; Sweeney, T; Hegarty, F; Freeman, J
2009-01-01
The previous literature has largely focused on speech analysis systems and ignored process issues, such as the nature of adequate speech samples, data acquisition, recording and playback. Although there has been recognition of the need for training on tools used in speech analysis associated with cleft palate, little attention has been paid to this issue. To design, execute, and evaluate a training programme for speech and language therapists on the systematic and reliable use of the Cleft Audit Protocol for Speech-Augmented (CAPS-A), addressing issues of standardized speech samples, data acquisition, recording, playback, and listening guidelines. Thirty-six specialist speech and language therapists undertook the training programme over four days. This consisted of two days' training on the CAPS-A tool followed by a third day, making independent ratings and transcriptions on ten new cases which had been previously recorded during routine audit data collection. This task was repeated on day 4, a minimum of one month later. Ratings were made using the CAPS-A record form with the CAPS-A definition table. An analysis was made of the speech and language therapists' CAPS-A ratings at occasion 1 and occasion 2 and the intra- and inter-rater reliability calculated. Trained therapists showed consistency in individual judgements on specific sections of the tool. Intraclass correlation coefficients were calculated for each section with good agreement on eight of 13 sections. There were only fair levels of agreement on anterior oral cleft speech characteristics, non-cleft errors/immaturities and voice. This was explained, at least in part, by their low prevalence which affects the calculation of the intraclass correlation coefficient statistic. Speech and language therapists benefited from training on the CAPS-A, focusing on specific aspects of speech using definitions of parameters and scalar points, in order to apply the tool systematically and reliably. Ratings are enhanced by ensuring a high degree of attention to the nature of the data, standardizing the speech sample, data acquisition, the listening process together with the use of high-quality recording and playback equipment. In addition, a method is proposed for maintaining listening skills following training as part of an individual's continuing education.
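Agreement on scalar ratings of the kind reported here is commonly quantified with intraclass correlation coefficients. Below is a sketch of one standard variant, ICC(2,1) (two-way random effects, absolute agreement, single rater), offered as an illustration of the statistic rather than the CAPS-A analysis code.

```python
import numpy as np

def icc_2_1(scores):
    """ICC(2,1) from a scores matrix of shape (n_subjects, k_raters)."""
    n, k = scores.shape
    grand = scores.mean()
    ms_rows = k * np.sum((scores.mean(axis=1) - grand) ** 2) / (n - 1)  # subjects
    ms_cols = n * np.sum((scores.mean(axis=0) - grand) ** 2) / (k - 1)  # raters
    sse = np.sum((scores - scores.mean(axis=1, keepdims=True)
                  - scores.mean(axis=0, keepdims=True) + grand) ** 2)
    ms_err = sse / ((n - 1) * (k - 1))
    # Shrout & Fleiss (1979) formula for two-way random effects, single rater
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err
                                 + k * (ms_cols - ms_err) / n)
```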
Alpermann, Anke; Huber, Walter; Natke, Ulrich; Willmes, Klaus
2010-09-01
Improved fluency after stuttering therapy is usually measured by the percentage of stuttered syllables. However, outcome studies rarely evaluate the use of trained speech patterns that speakers use to manage stuttering. This study investigated whether the modified time interval analysis can distinguish between trained speech patterns, fluent speech, and stuttered speech. Seventeen German experts on stuttering judged a speech sample on two occasions. Speakers of the sample were stuttering adults, who were not undergoing therapy, as well as participants in a fluency shaping and a stuttering modification therapy. Results showed satisfactory inter-judge and intra-judge agreement above 80%. Intervals with trained speech patterns were identified as consistently as stuttered and fluent intervals. We discuss limitations of the study, as well as implications of our findings for the development of training for identification of trained speech patterns and future outcome studies. The reader will be able to (a) explain different methods to measure the use of trained speech patterns, (b) evaluate whether German experts are able to discriminate intervals with trained speech patterns reliably from fluent and stuttered intervals and (c) describe how the measurement of trained speech patterns can contribute to outcome studies.
McCreery, Ryan W.; Alexander, Joshua; Brennan, Marc A.; Hoover, Brenda; Kopun, Judy; Stelmachowicz, Patricia G.
2014-01-01
Objective The primary goal of nonlinear frequency compression (NFC) and other frequency lowering strategies is to increase the audibility of high-frequency sounds that are not otherwise audible with conventional hearing-aid processing due to the degree of hearing loss, limited hearing aid bandwidth or a combination of both factors. The aim of the current study was to compare estimates of speech audibility processed by NFC to improvements in speech recognition for a group of children and adults with high-frequency hearing loss. Design Monosyllabic word recognition was measured in noise for twenty-four adults and twelve children with mild to severe sensorineural hearing loss. Stimuli were amplified based on each listener’s audiogram with conventional processing (CP) with amplitude compression or with NFC and presented under headphones using a software-based hearing aid simulator. A modification of the speech intelligibility index (SII) was used to estimate audibility of information in frequency-lowered bands. The mean improvement in SII was compared to the mean improvement in speech recognition. Results All but two listeners experienced improvements in speech recognition with NFC compared to CP, consistent with the small increase in audibility that was estimated using the modification of the SII. Children and adults had similar improvements in speech recognition with NFC. Conclusion Word recognition with NFC was higher than CP for children and adults with mild to severe hearing loss. The average improvement in speech recognition with NFC (7%) was consistent with the modified SII, which indicated that listeners experienced an increase in audibility with NFC compared to CP. Further studies are necessary to determine if changes in audibility with NFC are related to speech recognition with NFC for listeners with greater degrees of hearing loss, with a greater variety of compression settings, and using auditory training. PMID:24535558
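The SII logic referred to above can be illustrated compactly: weight each band's audible fraction of the speech dynamic range by a band importance function. The sketch below is the textbook simplification with placeholder inputs, not the ANSI S3.5 procedure or the authors' modification.

```python
import numpy as np

def sii(speech_spectrum_db, noise_spectrum_db, band_importance):
    """Simplified speech intelligibility index; one value per frequency band."""
    snr = np.asarray(speech_spectrum_db) - np.asarray(noise_spectrum_db)
    # speech spans roughly a 30-dB dynamic range around its long-term level
    audibility = np.clip((snr + 15.0) / 30.0, 0.0, 1.0)
    w = np.asarray(band_importance)
    return float(np.sum(w * audibility) / np.sum(w))
```

Frequency lowering raises the index by moving otherwise inaudible high-frequency bands into regions where the audible fraction is nonzero.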
A common functional neural network for overt production of speech and gesture.
Marstaller, L; Burianová, H
2015-01-22
The perception of co-speech gestures, i.e., hand movements that co-occur with speech, has been investigated by several studies. The results show that the perception of co-speech gestures engages a core set of frontal, temporal, and parietal areas. However, no study has yet investigated the neural processes underlying the production of co-speech gestures. Specifically, it remains an open question whether Broca's area is central to the coordination of speech and gestures as has been suggested previously. The objective of this study was to use functional magnetic resonance imaging to (i) investigate the regional activations underlying overt production of speech, gestures, and co-speech gestures, and (ii) examine functional connectivity with Broca's area. We hypothesized that co-speech gesture production would activate frontal, temporal, and parietal regions that are similar to areas previously found during co-speech gesture perception and that both speech and gesture as well as co-speech gesture production would engage a neural network connected to Broca's area. Whole-brain analysis confirmed our hypothesis and showed that co-speech gesturing did engage brain areas that form part of networks known to subserve language and gesture. Functional connectivity analysis further revealed a functional network connected to Broca's area that is common to speech, gesture, and co-speech gesture production. This network consists of brain areas that play essential roles in motor control, suggesting that the coordination of speech and gesture is mediated by a shared motor control network. Our findings thus lend support to the idea that speech can influence co-speech gesture production on a motoric level. Copyright © 2014 IBRO. Published by Elsevier Ltd. All rights reserved.
Digital Data Collection and Analysis: Application for Clinical Practice
ERIC Educational Resources Information Center
Ingram, Kelly; Bunta, Ferenc; Ingram, David
2004-01-01
Technology for digital speech recording and speech analysis is now readily available for all clinicians who use a computer. This article discusses some advantages of moving from analog to digital recordings and outlines basic recording procedures. The purpose of this article is to familiarize speech-language pathologists with computerized audio…
Using Commercial-Off-The-Shelf Speech Recognition Software for Conning U.S. Warships
2003-06-01
Communication for Scientists and Engineers: A "Computer Model" in the Basic Course.
ERIC Educational Resources Information Center
Haynes, W. Lance
Successful speech should rest not on prepared notes and outlines but on genuine oral discourse based on "data" fed into the "software" of the computer that already exists within each person. Writing cannot speak for itself, nor can it continually adjust itself to accommodate diverse response. Moreover, no matter how skillfully…
Multi-microphone adaptive array augmented with visual cueing.
Gibson, Paul L; Hedin, Dan S; Davies-Venn, Evelyn E; Nelson, Peggy; Kramer, Kevin
2012-01-01
We present the development of an audiovisual array that enables hearing aid users to converse with multiple speakers in reverberant environments with significant speech babble noise, where their hearing aids do not function well. The system concept consists of a smartphone, a smartphone accessory, and a smartphone software application. The smartphone accessory concept is a multi-microphone audiovisual array in a form factor that allows attachment to the back of the smartphone. The accessory will also contain a low-power radio by which it can transmit audio signals to compatible hearing aids. The smartphone software application concept will use the smartphone's built-in camera to acquire images and perform real-time face detection using the built-in face detection support of the smartphone. The audiovisual beamforming algorithm uses the location of talking targets to improve the signal-to-noise ratio and consequently improve the user's speech intelligibility. Since the proposed array system leverages a handheld consumer electronic device, it will be portable and low cost. A PC-based experimental system was developed to demonstrate the feasibility of an audiovisual multi-microphone array, and these results are presented.
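The core beamforming idea, steering an array toward a visually detected talker, can be sketched as a delay-and-sum beamformer. The geometry, names, and frequency-domain fractional delay below are illustrative assumptions, not the described system's algorithm.

```python
import numpy as np

def delay_and_sum(mics, fs, mic_x, angle_deg, c=343.0):
    """mics: (n_mics, n_samples) array; mic_x: positions (m) along a line array."""
    # relative arrival delays for a plane wave from the given direction
    delays = np.asarray(mic_x) * np.sin(np.radians(angle_deg)) / c
    n = mics.shape[1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    out = np.zeros(n)
    for sig, tau in zip(mics, delays):
        # apply a fractional-sample time shift as a linear phase in frequency
        out += np.fft.irfft(np.fft.rfft(sig) * np.exp(2j * np.pi * freqs * tau), n)
    return out / mics.shape[0]
```

In the described concept, the steering angle would come from the face detector rather than being supplied by hand.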
Music and hearing aids--an introduction.
Chasin, Marshall
2012-09-01
Modern digital hearing aids have provided improved fidelity over those of earlier decades for speech. The same however cannot be said for music. Most modern hearing aids have a limitation of their "front end," which comprises the analog-to-digital (A/D) converter. For a number of reasons, the spectral nature of music as an input to a hearing aid is beyond the optimal operating conditions of the "front end" components. Amplified music tends to be of rather poor fidelity. Once the music signal is distorted, no amount of software manipulation that occurs later in the circuitry can improve things. The solution is not a software issue. Some characteristics of music that make it difficult to transduce without significant distortion include an increased sound level relative to that of speech, and the crest factor: the difference in dB between the instantaneous peak of a signal and its RMS value. Clinical strategies and technical innovations have helped to improve the fidelity of amplified music, and these include a reduction of the level of the input that is presented to the A/D converter.
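The crest factor mentioned above is straightforward to compute; a small Python example follows (speech is often quoted near a 12 dB crest factor, with live music higher, though exact figures vary).

```python
import numpy as np

def crest_factor_db(x):
    """Instantaneous peak relative to RMS, in dB."""
    rms = np.sqrt(np.mean(np.square(x)))
    return 20.0 * np.log10(np.max(np.abs(x)) / rms)

# A pure sine has a crest factor of about 3 dB:
t = np.linspace(0.0, 1.0, 48000, endpoint=False)
print(round(crest_factor_db(np.sin(2 * np.pi * 440 * t)), 1))  # ~3.0
```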
Nakai, Yasushi; Takiguchi, Tetsuya; Matsui, Gakuyo; Yamaoka, Noriko; Takada, Satoshi
2017-10-01
Abnormal prosody is often evident in the voice intonations of individuals with autism spectrum disorders. We compared a machine-learning-based voice analysis with human hearing judgments made by 10 speech therapists for classifying children with autism spectrum disorders (n = 30) and typical development (n = 51). Using stimuli limited to single-word utterances, machine-learning-based voice analysis was superior to speech therapist judgments. There was a significantly higher true-positive than false-negative rate for machine-learning-based voice analysis but not for speech therapists. Results are discussed in terms of some artificiality of clinician judgments based on single-word utterances, and the objectivity machine-learning-based voice analysis adds to judging abnormal prosody.
Speech and gait in Parkinson's disease: When rhythm matters.
Ricciardi, Lucia; Ebreo, Michela; Graziosi, Adriana; Barbuto, Marianna; Sorbera, Chiara; Morgante, Letterio; Morgante, Francesca
2016-11-01
Speech disturbances in Parkinson's disease (PD) are heterogeneous, ranging from hypokinetic to hyperkinetic types. Repetitive speech disorder has been demonstrated in more advanced disease stages and has been considered the speech equivalent of freezing of gait (FOG). We aimed to verify a possible relationship between speech and FOG in patients with PD. Forty-three consecutive PD patients and 20 healthy control subjects underwent standardized speech evaluation using the Italian version of the Dysarthria Profile (DP), for its motor component, and subsets of the Battery for the Analysis of the Aphasic Deficit (BADA), for its procedural component. DP is a scale composed of 7 sub-sections assessing different features of speech; the rate/prosody section of DP includes items investigating the presence of repetitive speech disorder. Severity of FOG was evaluated with the new freezing of gait questionnaire (NFGQ). PD patients performed worse on the DP and BADA compared with healthy controls; patients with FOG or with Hoehn-Yahr >2 reported lower scores on the articulation, intelligibility, and rate/prosody sections of the DP and on the semantic verbal fluency test. Logistic regression analysis showed that only age and rate/prosody scores were significantly associated with FOG in PD. Multiple regression analysis showed that only the severity of FOG was associated with the rate/prosody score. Our data demonstrate that repetitive speech disorder is related to FOG, is associated with advanced disease stages, and is independent of disease duration. Speech dysfluency represents a disorder of motor speech control, possibly sharing pathophysiological mechanisms with FOG. Copyright © 2016 Elsevier Ltd. All rights reserved.
The somatotopy of speech: Phonation and articulation in the human motor cortex
Brown, Steven; Laird, Angela R.; Pfordresher, Peter Q.; Thelen, Sarah M.; Turkeltaub, Peter; Liotti, Mario
2010-01-01
A sizable literature on the neuroimaging of speech production has reliably shown activations in the orofacial region of the primary motor cortex. These activations have invariably been interpreted as reflecting “mouth” functioning and thus articulation. We used functional magnetic resonance imaging to compare an overt speech task with tongue movement, lip movement, and vowel phonation. The results showed that the strongest motor activation for speech was the somatotopic larynx area of the motor cortex, thus reflecting the significant contribution of phonation to speech production. In order to analyze further the phonatory component of speech, we performed a voxel-based meta-analysis of neuroimaging studies of syllable-singing (11 studies) and compared the results with a previously-published meta-analysis of oral reading (11 studies), showing again a strong overlap in the larynx motor area. Overall, these findings highlight the under-recognized presence of phonation in imaging studies of speech production, and support the role of the larynx motor cortex in mediating the “melodicity” of speech. PMID:19162389
Huh, Young Eun; Park, Jongkyu; Suh, Mee Kyung; Lee, Sang Eun; Kim, Jumin; Jeong, Yuri; Kim, Hee-Tae; Cho, Jin Whan
2015-08-01
In Parkinson variant of multiple system atrophy (MSA-P), patterns of early speech impairment and their distinguishing features from Parkinson's disease (PD) require further exploration. Here, we compared speech data among patients with early-stage MSA-P, PD, and healthy subjects using quantitative acoustic and perceptual analyses. Variables were analyzed for men and women in view of gender-specific features of speech. Acoustic analysis revealed that male patients with MSA-P exhibited more profound speech abnormalities than those with PD, regarding increased voice pitch, prolonged pause time, and reduced speech rate. This might be due to widespread pathology of MSA-P in nigrostriatal or extra-striatal structures related to speech production. Although several perceptual measures were mildly impaired in MSA-P and PD patients, none of these parameters showed a significant difference between patient groups. Detailed speech analysis using acoustic measures may help distinguish between MSA-P and PD early in the disease process. Copyright © 2015 Elsevier Inc. All rights reserved.
De Smet, Hyo Jung; Catsman-Berrevoets, Coriene; Aarsen, Femke; Verhoeven, Jo; Mariën, Peter; Paquier, Philippe F
2012-09-01
Mutism and Subsequent Dysarthria (MSD) and the Posterior Fossa Syndrome (PFS) have become well-recognized clinical entities which may develop after resection of cerebellar tumours. However, speech characteristics following a period of mutism have not been documented in much detail. This study carried out a perceptual speech analysis in 24 children and adolescents (of whom 12 became mute in the immediate postoperative phase) 1-12.2 years after cerebellar tumour resection. The most prominent speech deficits in this study were distorted vowels, slow rate, voice tremor, and monopitch. Factors influencing long-term speech disturbances are presence or absence of postoperative PFS, the localisation of the surgical lesion and the type of adjuvant treatment. Long-term speech deficits may be present up to 12 years post-surgery. The speech deficits found in children and adolescents with cerebellar lesions following cerebellar tumour surgery do not necessarily resemble adult speech characteristics of ataxic dysarthria. Copyright © 2012 European Paediatric Neurology Society. Published by Elsevier Ltd. All rights reserved.
Meeuws, Matthias; Pascoal, David; Bermejo, Iñigo; Artaso, Miguel; De Ceulaer, Geert; Govaerts, Paul J
2017-07-01
The software application FOX ('Fitting to Outcome eXpert') is an intelligent agent to assist in the programming of cochlear implant (CI) processors. The current version utilizes a mixture of deterministic and probabilistic logic which is able to improve over time through a learning effect. This study aimed to assess whether this learning capacity yields measurable improvements in speech understanding. A retrospective study was performed on 25 consecutive CI recipients with a median CI use experience of 10 years who came for their annual CI follow-up fitting session. All subjects were assessed by means of speech audiometry with open-set monosyllables at 40, 55, 70, and 85 dB SPL in quiet with their home MAP. Other psychoacoustic tests were executed depending on the audiologist's clinical judgment. The home MAP and the corresponding test results were entered into FOX. If FOX suggested MAP changes, they were implemented and another speech audiometry was performed with the new MAP. FOX suggested MAP changes in 21 subjects (84%). The within-subject comparison showed a significant median improvement of 10, 3, 1, and 7% at 40, 55, 70, and 85 dB SPL, respectively. All but two subjects showed an instantaneous improvement in their mean speech audiometric score. Persons with long-term CI use who received a FOX-assisted CI fitting at least 6 months earlier displayed improved speech understanding after MAP modifications recommended by the current version of FOX. This can be explained only by intrinsic improvements in FOX's algorithms, as they have resulted from learning. This learning is an inherent feature of artificial intelligence, and it may yield measurable benefit in speech understanding even in long-term CI recipients.
Yunusova, Yana; Graham, Naida L.; Shellikeri, Sanjana; Phuong, Kent; Kulkarni, Madhura; Rochon, Elizabeth; Tang-Wai, David F.; Chow, Tiffany W.; Black, Sandra E.; Zinman, Lorne H.; Green, Jordan R.
2016-01-01
Objective: This study examines reading aloud in patients with amyotrophic lateral sclerosis (ALS) and those with frontotemporal dementia (FTD) in order to determine whether differences in patterns of speaking and pausing exist between patients with primary motor vs. primary cognitive-linguistic deficits, and in contrast to healthy controls. Design: 136 participants were included in the study: 33 controls, 85 patients with ALS, and 18 patients with either the behavioural variant of FTD (FTD-BV) or progressive nonfluent aphasia (FTD-PNFA). Participants with ALS were further divided into 4 non-overlapping subgroups (mild, respiratory, bulbar (with oral-motor deficit), and bulbar-respiratory) based on the presence and severity of motor bulbar or respiratory signs. All participants read a passage aloud. Custom-made software was used to perform speech and pause analyses, and this provided measures of speaking and articulatory rates, duration of speech, and number and duration of pauses. These measures were statistically compared in different subgroups of patients. Results: The results revealed clear differences between patient groups and healthy controls on the passage reading task. A speech-based motor function measure (i.e., articulatory rate) was able to distinguish patients with bulbar ALS or FTD-PNFA from those with respiratory ALS or FTD-BV. Distinguishing the disordered groups proved challenging based on the pausing measures. Conclusions and Relevance: This study demonstrated the use of speech measures in the identification of those with an oral-motor deficit, and showed the usefulness of performing a relatively simple reading test to assess speech versus pause behaviors across the ALS-FTD disease continuum. The findings also suggest that motor speech assessment should be performed as part of the diagnostic workup for patients with FTD. PMID:26789001
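Speech and pause measures of the kind described can be approximated from the audio alone with a simple energy threshold. The sketch below is a generic segmentation recipe with assumed parameters, not the custom software used in the study.

```python
import numpy as np

def pause_stats(x, fs, frame_ms=25.0, thresh_db=-35.0, min_pause_s=0.25):
    """Return durations (s) of pauses longer than min_pause_s."""
    frame = int(fs * frame_ms / 1000.0)
    n = len(x) // frame
    frames = x[: n * frame].reshape(n, frame)
    level = 10.0 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
    silent = level < level.max() + thresh_db       # dB relative to loudest frame
    pauses, run = [], 0
    for s in list(silent) + [False]:               # sentinel flushes the last run
        if s:
            run += 1
        else:
            if run * frame / fs >= min_pause_s:
                pauses.append(run * frame / fs)
            run = 0
    return pauses
# speaking rate can then be words / (total duration - sum(pauses))
```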
Speed-Accuracy Tradeoffs in Speech Production
2017-06-01
A theoretical framework for considering Fitts' law in the domain of speech production is elucidated, and the question of whether articulatory kinematics conform to Fitts' law is examined. A second, associated goal is to address the methodological challenges inherent in performing Fitts-style analysis on rtMRI (real-time magnetic resonance imaging) data of speech production. Methodological challenges include segmenting continuous speech into specific motor tasks and defining key…
Rhetorical and Linguistic Analysis of Bush's Second Inaugural Speech
ERIC Educational Resources Information Center
Sameer, Imad Hayif
2017-01-01
This study attempts to analyze Bush's second inaugural speech. It aims at investigating the use of linguistic strategies in it. It draws on two models, the first being Aristotle's and the second that of Atkinson (1984), to draw attention to linguistic strategies. The analysis shows that Bush's second inaugural speech is successful…
The Effects of Macroglossia on Speech: A Case Study
ERIC Educational Resources Information Center
Mekonnen, Abebayehu Messele
2012-01-01
This article presents a case study of speech production in a 14-year-old Amharic-speaking boy. The boy had developed secondary macroglossia, related to a disturbance of growth hormones, following a history of normal speech development. Perceptual analysis combined with acoustic analysis and static palatography is used to investigate the specific…
A Fantasy Theme Analysis of Nixon's "Checkers" Speech.
ERIC Educational Resources Information Center
Wells, William T.
1996-01-01
Applies fantasy theme analysis to Richard Nixon's "Checkers" speech. States that three major themes emerge: Nixon as Moral Model, Nixon as the American Dream, and Nixon as Patriot. Points out that each issue responds to allegations of dishonesty that were leveled against him at the time. Argues that Nixon's speech was accepted and…
Articulatory Mediation of Speech Perception: A Causal Analysis of Multi-Modal Imaging Data
ERIC Educational Resources Information Center
Gow, David W., Jr.; Segawa, Jennifer A.
2009-01-01
The inherent confound between the organization of articulation and the acoustic-phonetic structure of the speech signal makes it exceptionally difficult to evaluate the competing claims of motor and acoustic-phonetic accounts of how listeners recognize coarticulated speech. Here we use Granger causation analysis of high spatiotemporal resolution…
Martínez, Angela; Felizzola Donado, Carlos Alberto; Matallana Eslava, Diana Lucía
2015-01-01
Patients with schizophrenia and those with the linguistic variants of Frontotemporal Dementia (FTD) share some language characteristics, such as lexical access difficulties and disordered speech with disruptions, many pauses, interruptions, and reformulations. For schizophrenia patients this reflects a difficulty with affect expression, while for FTD patients it reflects a linguistic deficit. This study analyzed a series of cases assessed in both the memory clinic and the Mental Health Unit of HUSI-PUJ (Hospital Universitario San Ignacio), with additional language assessment (discourse analysis and acoustic analysis), and presents distinctive features of FTD in its linguistic variants and of schizophrenia that can guide the specialist in finding early markers for a differential diagnosis. In patients with the language variants of FTD, 100% of cases showed difficulty understanding complex linguistic structures, as well as marked speech fluency problems. In patients with schizophrenia, there were significant alterations in the expression of the suprasegmental elements of speech, as well as disruptions in discourse. We present how in-depth language assessment allows some of the rules for the speech and prosody analysis of patients with dementia and schizophrenia to be reassessed, and we suggest how elements of speech are useful in guiding the diagnosis and correlate with functional compromise in the psychiatrist's everyday practice. Copyright © 2014 Asociación Colombiana de Psiquiatría. Publicado por Elsevier España. All rights reserved.
Speech disorders did not correlate with age at onset of Parkinson's disease.
Dias, Alice Estevo; Barbosa, Maira Tonidandel; Limongi, João Carlos Papaterra; Barbosa, Egberto Reis
2016-02-01
Speech disorders are common manifestations of Parkinson's disease. Objective: To compare speech articulation in patients according to age at onset of the disease. Methods: Fifty patients were divided into two groups: Group I consisted of 30 patients with age at onset between 40 and 55 years; Group II consisted of 20 patients with age at onset after 65 years. All patients were evaluated based on Unified Parkinson's Disease Rating Scale scores, the Hoehn and Yahr scale, and speech evaluation by perceptual and acoustic analysis. Results: There was no statistically significant difference between the two groups regarding neurological involvement and speech characteristics. Correlation analysis indicated differences in speech articulation in relation to staging and axial scores of rigidity and bradykinesia for middle and late onset. Conclusions: Impairment of speech articulation did not correlate with age at onset of disease, but was positively related to disease duration and higher scores in both groups.
Power Spectral Density Error Analysis of Spectral Subtraction Type of Speech Enhancement Methods
NASA Astrophysics Data System (ADS)
Händel, Peter
2006-12-01
A theoretical framework for analysis of speech enhancement algorithms is introduced for performance assessment of spectral subtraction type of methods. The quality of the enhanced speech is related to physical quantities of the speech and noise (such as stationarity time and spectral flatness), as well as to design variables of the noise suppressor. The derived theoretical results are compared with the outcome of subjective listening tests as well as successful design strategies, performed by independent research groups.
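For readers unfamiliar with the method family analyzed here, a textbook magnitude spectral subtraction sketch follows. The frame sizes, noise floor parameter, and overlap-add structure are generic choices, not the paper's design variables.

```python
import numpy as np

def spectral_subtract(x, noise, frame=512, hop=256, beta=0.02):
    """Subtract an estimated noise power spectrum frame-by-frame."""
    win = np.hanning(frame)
    # average noise power spectrum from a noise-only segment
    noise_psd = np.mean(np.abs(np.fft.rfft(
        noise[: (len(noise) // frame) * frame].reshape(-1, frame) * win)) ** 2,
        axis=0)
    out = np.zeros(len(x))
    for start in range(0, len(x) - frame, hop):
        spec = np.fft.rfft(x[start:start + frame] * win)
        # spectral floor (beta) limits musical-noise artifacts
        mag2 = np.maximum(np.abs(spec) ** 2 - noise_psd, beta * noise_psd)
        out[start:start + frame] += np.fft.irfft(
            np.sqrt(mag2) * np.exp(1j * np.angle(spec)), frame)
    return out
```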
Speech in 10-Year-Olds Born With Cleft Lip and Palate: What Do Peers Say?
Nyberg, Jill; Havstam, Christina
2016-09-01
The aim of this study was to explore how 10-year-olds, in their own words, describe speech and communicative participation in children born with unilateral cleft lip and palate, whether they perceive signs of velopharyngeal insufficiency (VPI) and articulation errors of different degrees, and if so, which terminology they use. Methods/Participants: Nineteen 10-year-olds participated in three focus group interviews in which they listened to 10 to 12 speech samples with different types of cleft speech characteristics, as assessed by speech and language pathologists (SLPs), and described what they heard. The interviews were transcribed and analyzed with qualitative content analysis. The analysis resulted in three interlinked categories encompassing different aspects of speech, personality, and social implications: descriptions of speech, thoughts on causes and consequences, and emotional reactions and associations. Each category contains four subcategories exemplified with quotes from the children's statements. More pronounced signs of VPI were perceived but referred to in terms relevant to 10-year-olds. Articulatory difficulties, even minor ones, were noted. Peers reflected on the risk of teasing and bullying and on how children with impaired speech might experience their situation. The SLPs and peers did not agree on minor signs of VPI, but they were unanimous in their analysis of clinically normal and more severely impaired speech. Articulatory impairments may be more important to treat than minor signs of VPI, based on what peers say.
NASA Astrophysics Data System (ADS)
Yildirim, Serdar; Montanari, Simona; Andersen, Elaine; Narayanan, Shrikanth S.
2003-10-01
Understanding the fine details of children's speech and gestural characteristics helps, among other things, in creating natural computer interfaces. We analyze the acoustic, lexical/non-lexical, and spoken/gestural discourse characteristics of young children's speech using audio-video data gathered with a Wizard of Oz technique from 4 to 6 year old children engaged in resolving a series of age-appropriate cognitive challenges. Fundamental and formant frequencies exhibited greater variation between subjects, consistent with previous results on read speech [Lee et al., J. Acoust. Soc. Am. 105, 1455-1468 (1999)]. Also, our analysis showed that, in a given bandwidth, the phonemic information contained in the speech of young children is significantly less than that of older children and adults. To enable an integrated analysis, a multi-track annotation board was constructed using the ANVIL tool kit [M. Kipp, Eurospeech 1367-1370 (2001)]. Along with speech transcriptions and acoustic analysis, non-lexical and discourse characteristics and children's gestures (facial expressions, body movements, hand/head movements) were annotated in a synchronized multilayer system. Initial results showed that younger children rely more on gestures to emphasize their verbal assertions. Younger children use non-lexical speech (e.g., um, huh) associated with frustration and pondering/reflecting more frequently than older ones. Younger children also repair more with humans than with computers.
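Formant frequencies like those analyzed above are commonly estimated with linear predictive coding (LPC): the angles of the LPC polynomial's complex roots map to formant candidates. A minimal sketch with assumed order and thresholds, not the study's analysis code:

```python
import numpy as np

def formants(x, fs, order=12):
    """Rough formant candidates (Hz) for one windowed speech frame."""
    x = x * np.hamming(len(x))
    # autocorrelation method: solve the normal equations for LPC coefficients
    r = np.correlate(x, x, mode="full")[len(x) - 1: len(x) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1: order + 1])
    # roots of A(z) = 1 - a1 z^-1 - ... - ap z^-p, upper half plane only
    roots = np.roots(np.concatenate(([1.0], -a)))
    roots = roots[np.imag(roots) > 0]
    freqs = np.angle(roots) * fs / (2 * np.pi)
    return np.sort(freqs[freqs > 90])   # drop near-DC roots
```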
Using Discursis to enhance the qualitative analysis of hospital pharmacist-patient interactions
Barras, Michael A.; Angus, Daniel J.
2018-01-01
Introduction: Pharmacist-patient communication during medication counselling has been successfully investigated using Communication Accommodation Theory (CAT). Communication researchers in other healthcare professions have utilised Discursis software as an adjunct to their manual qualitative analysis processes. Discursis provides a visual, chronological representation of communication exchanges and identifies patterns of interactant engagement. Aim: The aim of this study was to describe how Discursis software was used to enhance previously conducted qualitative analysis of pharmacist-patient interactions (by visualising pharmacist-patient speech patterns, episodes of engagement, and identifying CAT strategies employed by pharmacists within these episodes). Methods: Visual plots from 48 transcribed audio recordings of pharmacist-patient exchanges were generated by Discursis. Representative plots were selected to show moderate-to-high and low-level speaker engagement. Details of engagement were investigated for pharmacist application of CAT strategies (approximation, interpretability, discourse management, emotional expression, and interpersonal control). Results: Discursis plots allowed for identification of distinct patterns occurring within pharmacist-patient exchanges. Moderate-to-high pharmacist-patient engagement was characterised by multiple off-diagonal squares, while alternating single-coloured squares depicted low engagement. Engagement episodes were associated with multiple CAT strategies such as discourse management (open-ended questions). Patterns reflecting pharmacist or patient speaker dominance were dependent on clinical setting. Discussion and conclusions: Discursis analysis of pharmacist-patient interactions, a novel application of the technology in health communication, was found to be an effective visualisation tool to pinpoint episodes for CAT analysis. Discursis has numerous practical and theoretical applications for future health communication research and training. Researchers can use the software to support qualitative analysis where large data sets can be quickly reviewed to identify key areas for concentrated analysis. Because Discursis plots are easily generated from audio-recorded transcripts, they are well suited as teaching tools for both students and practitioners to assess and develop their communication skills. PMID:29787568
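Discursis-style plots are, at heart, recurrence plots over utterance similarity. A rough Python sketch of the idea, with TF-IDF cosine similarity standing in for Discursis's proprietary conceptual coding; names and colours are illustrative:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def recurrence_plot(utterances, speakers):
    """Grey off-diagonal cells = content echoed across turns; diagonal = turns."""
    sim = cosine_similarity(TfidfVectorizer().fit_transform(utterances))
    plt.imshow(sim, cmap="Greys", origin="lower")
    for i, s in enumerate(speakers):   # colour the diagonal by speaker
        plt.plot(i, i, "s",
                 color="tab:blue" if s == "pharmacist" else "tab:red")
    plt.xlabel("utterance")
    plt.ylabel("utterance")
    plt.show()
```

Runs of dark off-diagonal squares then correspond to the "episodes of engagement" described above.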
Lohmander, A; Willadsen, E; Persson, C; Henningsson, G; Bowden, M; Hutters, B
2009-07-01
To present the methodology for speech assessment in the Scandcleft project and discuss issues from a pilot study. Description of the methodology and a blinded test for speech assessment. Speech samples and instructions for data collection and analysis for comparisons of speech outcomes across the five included languages were developed and tested. Participants and materials: Randomly selected video recordings of 10 5-year-old children from each language (n = 50) were included in the project. Speech material consisted of test consonants in single words, connected speech, and syllable chains with nasal consonants. Five experienced speech and language pathologists participated as observers. Narrow phonetic transcription of test consonants was translated into cleft speech characteristics, with ordinal scale rating of resonance and perceived velopharyngeal closure (VPC). A velopharyngeal composite score (VPC-sum) was extrapolated from the raw data. Intra-rater agreement comparisons were performed. The range of intra-rater agreement for the consonant analysis was 53% to 89%; for hypernasality on high vowels in single words it was 20% to 80%; and the agreement between the VPC-sum and the overall rating of VPC was 78%. Pooling data from speakers of different languages in the same trial and comparing speech outcomes across trials seems possible if the assessment of speech concerns consonants and is confined to speech units that are phonetically similar across languages. Agreed conventions and rules are important. A composite variable for perceptual assessment of velopharyngeal function during speech seems usable, whereas the method for hypernasality evaluation requires further testing.
Spectral analysis method and sample generation for real time visualization of speech
NASA Astrophysics Data System (ADS)
Hobohm, Klaus
A method for translating speech signals into optical models, characterized by high sound discrimination and learnability and designed to give deaf persons feedback for controlling their own speech, is presented. Important properties of speech production and perception processes, and of the organs involved in these mechanisms, are recalled in order to define requirements for speech visualization. It is established that the spectral representation must adequately reflect the time, frequency, and amplitude resolution of hearing, and that continuous variations of the acoustic parameters of the speech signal must be depicted by continuous variations of the images. A color table was developed for dynamic illustration, and sonograms were generated with five spectral analysis methods, including Fourier transformation and linear predictive coding. For evaluating sonogram quality, test persons had to recognize consonant/vowel/consonant words; an optimized analysis method was achieved with a fast Fourier transformation and a postprocessor. A hardware concept for a real-time speech visualization system, based on multiprocessor technology in a personal computer, is presented.
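The analysis stage described above maps naturally onto standard tooling. A minimal Python sketch of sonogram generation with a Gaussian-windowed short-time Fourier transform, assuming a mono recording in a hypothetical file "speech.wav"; the frame sizes and the decibel mapping are illustrative choices, not those of the original system:

# Sketch of sonogram generation with a Gaussian-windowed STFT,
# loosely following the spectral-analysis stage described above.
import numpy as np
from scipy.io import wavfile
from scipy.signal import spectrogram

fs, x = wavfile.read("speech.wav")   # hypothetical mono input file
x = x.astype(np.float64)

# Gaussian window (std in samples); 25 ms frames, 10 ms hop.
nperseg = int(0.025 * fs)
f, t, Sxx = spectrogram(x, fs=fs,
                        window=("gaussian", nperseg / 6),
                        nperseg=nperseg,
                        noverlap=nperseg - int(0.010 * fs))

# Log power would feed the color table for dynamic display.
S_db = 10 * np.log10(Sxx + 1e-12)
print(S_db.shape)   # (frequency bins, time frames)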
Student Speech and the Internet: A Legal Analysis
ERIC Educational Resources Information Center
Graca, Thomas J.; Stader, David L.
2007-01-01
This article lays the foundation of American First Amendment jurisprudence in public schools and examines recent cases relating to student Internet speech. Particular emphasis is placed on the ability of schools to regulate student off-campus Internet speech. School authorities who wish to regulate nonthreatening off-campus speech in the…
The astronaut and the banana peel: An EVA retriever scenario
NASA Technical Reports Server (NTRS)
Shapiro, Daniel G.
1989-01-01
To prepare for the problem of accidents in Space Station activities, the Extravehicular Activity Retriever (EVAR) robot is being constructed, whose purpose is to retrieve astronauts and tools that float free of the Space Station. Advanced Decision Systems is at the beginning of a project to develop research software capable of guiding EVAR through the retrieval process. This involves addressing problems in machine vision, dexterous manipulation, real time construction of programs via speech input, and reactive execution of plans despite the mishaps and unexpected conditions that arise in uncontrolled domains. The problem analysis phase of this work is presented. An EVAR scenario is used to elucidate major domain and technical problems. An overview of the technical approach to prototyping an EVAR system is also presented.
ERIC Educational Resources Information Center
Alpermann, Anke; Huber, Walter; Natke, Ulrich; Willmes, Klaus
2010-01-01
Improved fluency after stuttering therapy is usually measured by the percentage of stuttered syllables. However, outcome studies rarely evaluate the use of trained speech patterns that speakers use to manage stuttering. This study investigated whether the modified time interval analysis can distinguish between trained speech patterns, fluent…
A Task Analysis for Teaching the Organization of an Informative Speech.
ERIC Educational Resources Information Center
Parks, Arlie Muller
The purpose of this paper is to demonstrate a task analysis of the objectives needed to organize an effective information-giving speech. A hierarchical structure of the behaviors needed to deliver a well-organized extemporaneous information-giving speech is presented, with some behaviors as subtasks for the unit objective and the others as…
Haderlein, Tino; Döllinger, Michael; Matoušek, Václav; Nöth, Elmar
2016-10-01
Automatic voice assessment is often performed using sustained vowels. In contrast, speech analysis of read-out texts can be applied to voice and speech assessment. Automatic speech recognition and prosodic analysis were used to find regression formulae between automatic and perceptual assessment of four voice and four speech criteria. The regression was trained with 21 men and 62 women (average age 49.2 years) and tested with another set of 24 men and 49 women (48.3 years), all suffering from chronic hoarseness. They read the text 'Der Nordwind und die Sonne' ('The North Wind and the Sun'). Five voice and speech therapists evaluated the data on 5-point Likert scales. Ten prosodic and recognition accuracy measures (features) were identified which describe all the examined criteria. Inter-rater correlation within the expert group was between r = 0.63 for the criterion 'match of breath and sense units' and r = 0.87 for the overall voice quality. Human-machine correlation was between r = 0.40 for the match of breath and sense units and r = 0.82 for intelligibility. The perceptual ratings of different criteria were highly correlated with each other. Likewise, the feature sets modeling the criteria were very similar. The automatic method is suitable for assessing chronic hoarseness in general and for subgroups of functional and organic dysphonia. In its current version, it is almost as reliable as a randomly picked rater from a group of voice and speech therapists.
Using Speech Recognition to Enhance the Tongue Drive System Functionality in Computer Access
Huo, Xueliang; Ghovanloo, Maysam
2013-01-01
Tongue Drive System (TDS) is a wireless tongue operated assistive technology (AT), which can enable people with severe physical disabilities to access computers and drive powered wheelchairs using their volitional tongue movements. TDS offers six discrete commands, simultaneously available to the users, for pointing and typing as a substitute for mouse and keyboard in computer access, respectively. To enhance the TDS performance in typing, we have added a microphone, an audio codec, and a wireless audio link to its readily available 3-axial magnetic sensor array, and combined it with commercially available speech recognition software, Dragon NaturallySpeaking, which is regarded as one of the most efficient ways to enter text. Our preliminary evaluations indicate that the combined TDS and speech recognition technologies can provide end users with significantly higher performance than using each technology alone, particularly in completing tasks that require both pointing and text entry, such as web surfing. PMID:22255801
Open source OCR framework using mobile devices
NASA Astrophysics Data System (ADS)
Zhou, Steven Zhiying; Gilani, Syed Omer; Winkler, Stefan
2008-02-01
Mobile phones have evolved from passive one-to-one communication devices into powerful handheld computing devices. Today most new mobile phones are capable of capturing images, recording video, browsing the internet, and much more. Exciting new social applications are emerging on the mobile landscape, such as business card readers, sign detectors, and translators. These applications help people quickly gather information in digital format and interpret it without needing to carry laptops or tablet PCs. However, despite these advancements, very little open source software is available for mobile phones. For instance, there are currently many open source OCR engines for the desktop platform but, to our knowledge, none available on the mobile platform. Keeping this in perspective, we propose a complete text detection and recognition system with speech synthesis ability, using existing desktop technology. In this work we developed a complete OCR framework with subsystems from the open source desktop community. This includes a popular open source OCR engine named Tesseract for text detection and recognition, and the Flite speech synthesis module for adding text-to-speech ability.
The design of an adaptive predictive coder using a single-chip digital signal processor
NASA Astrophysics Data System (ADS)
Randolph, M. A.
1985-01-01
A speech coding processor architecture design study has been performed in which the Texas Instruments TMS32010 was selected from among three commercially available digital signal processing integrated circuits and evaluated in an implementation study of real-time Adaptive Predictive Coding (APC). The TMS32010 was compared with the AT&T Bell Laboratories DSP I and the Nippon Electric Co. µPD7720 and was found to be most suitable for a single-chip implementation of APC. A preliminary system design based on the TMS32010 has been performed, and several of the hardware and software design issues are discussed. Particular attention was paid to the design of an external memory controller which permits rapid sequential access of external RAM. As a result, it has been determined that a compact hardware implementation of the APC algorithm is feasible based on the TMS32010. Originator-supplied keywords include: vocoders, speech compression, adaptive predictive coding, digital signal processing microcomputers, speech processor architectures, and special purpose processors.
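The short-term predictor at the heart of APC can be illustrated with the textbook autocorrelation method. This is a generic LPC sketch in Python, not the TMS32010 implementation studied above; the frame length, order, and toy signal are assumptions:

# Autocorrelation-method LPC: solve the normal equations R a = r
# for one speech frame; the residual is what APC actually quantizes.
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc(frame, order=10):
    w = frame * np.hamming(len(frame))
    r = np.correlate(w, w, mode="full")[len(w) - 1:]
    # Symmetric Toeplitz system: first column and row are both r[0..p-1].
    return solve_toeplitz((r[:order], r[:order]), r[1:order + 1])

fs = 8000
t = np.arange(240) / fs                       # 30 ms frame
frame = np.sin(2 * np.pi * 150 * t) + 0.1 * np.random.default_rng(0).standard_normal(240)
a = lpc(frame)
pred = np.convolve(frame, np.concatenate(([0.0], a)))[:len(frame)]
residual = frame - pred                       # signal coded by the adaptive quantizer
print(residual.var() / frame.var())           # well below 1: prediction gain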
A phone-assistive device based on Bluetooth technology for cochlear implant users.
Qian, Haifeng; Loizou, Philipos C; Dorman, Michael F
2003-09-01
Hearing-impaired people, and particularly hearing-aid and cochlear-implant users, often have difficulty communicating over the telephone. The intelligibility of telephone speech is considerably lower than the intelligibility of face-to-face speech. This is partly because of lack of visual cues, limited telephone bandwidth, and background noise. In addition, cellphones may cause interference with the hearing aid or cochlear implant. To address these problems that hearing-impaired people experience with telephones, this paper proposes a wireless phone adapter that can be used to route the audio signal directly to the hearing aid or cochlear implant processor. This adapter is based on Bluetooth technology. The favorable features of this new wireless technology make the adapter superior to traditional assistive listening devices. A hardware prototype was built and software programs were written to implement the headset profile in the Bluetooth specification. Three cochlear implant users were tested with the proposed phone-adapter and reported good speech quality.
Speech outcomes in Cantonese patients after glossectomy.
Wong, Ripley Kit; Poon, Esther Sok-Man; Woo, Cynthia Yuen-Man; Chan, Sabina Ching-Shun; Wong, Elsa Siu-Ping; Chu, Ada Wai-Sze
2007-08-01
We sought to determine the major factors affecting speech production of Cantonese-speaking glossectomized patients. The error pattern was analyzed. Forty-one Cantonese-speaking subjects who had undergone glossectomy ≥6 months previously were recruited. Speech production evaluation included (1) phonetic error analysis in nonsense syllables; (2) speech intelligibility in sentences evaluated by naive listeners; (3) overall speech intelligibility in conversation evaluated by experienced speech therapists. Patients receiving adjuvant radiotherapy had significantly poorer segmental and connected speech production. Total or subtotal glossectomy also resulted in poor speech outcomes. Patients having free flap reconstruction showed the best speech outcomes. Patients without lymph node metastasis had significantly better speech scores when compared with patients with lymph node metastasis. Initial consonant production had the worst scores, while vowel production was the least affected. Speech outcomes of Cantonese-speaking glossectomized patients depended on the severity of the disease. Initial consonants had the greatest effect on speech intelligibility.
Double Fourier analysis for Emotion Identification in Voiced Speech
NASA Astrophysics Data System (ADS)
Sierra-Sosa, D.; Bastidas, M.; Ortiz P., D.; Quintero, O. L.
2016-04-01
We propose a novel analysis alternative, based on two Fourier transforms, for emotion recognition from speech. Fourier analysis allows one to display and synthesize different signals in terms of power spectral density distributions. A spectrogram of the voice signal is obtained by performing a short-time Fourier transform with Gaussian windows; this spectrogram portrays frequency-related features, such as vocal tract resonances and quasi-periodic excitations during voiced sounds. Emotions induce such characteristics in speech, which become apparent in the spectrogram's time-frequency distribution. The signal's time-frequency representation is then treated as an image and processed through a 2-dimensional Fourier transform in order to perform spatial Fourier analysis on it. Finally, features related to emotions in voiced speech are extracted and presented.
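A minimal Python sketch of the two-stage transform, under assumed window sizes: a Gaussian-windowed spectrogram whose log-magnitude image is passed through a 2-D FFT to obtain spatial-frequency features:

# Double Fourier analysis: STFT spectrogram, then a 2-D FFT of the
# spectrogram treated as an image, as outlined in the abstract above.
import numpy as np
from scipy.signal import spectrogram

def double_fourier(x, fs):
    nperseg = int(0.030 * fs)
    f, t, Sxx = spectrogram(x, fs=fs,
                            window=("gaussian", nperseg / 6),
                            nperseg=nperseg)
    img = np.log(Sxx + 1e-12)                 # time-frequency image
    F2 = np.fft.fftshift(np.fft.fft2(img))
    return np.abs(F2)                         # spatial-frequency magnitudes as features

fs = 16000
x = np.sin(2 * np.pi * 220 * np.arange(fs) / fs)   # toy 1 s "voiced" tone
print(double_fourier(x, fs).shape)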
Kong, Anthony Pak-Hin; Law, Sam-Po; Kwan, Connie Ching-Yin; Lai, Christy; Lam, Vivian
2014-01-01
Gestures are commonly used together with spoken language in human communication. One major limitation of gesture investigations in the existing literature lies in the fact that the coding of forms and functions of gestures has not been clearly differentiated. This paper first described a recently developed Database of Speech and GEsture (DoSaGE) based on independent annotation of gesture forms and functions among 119 neurologically unimpaired right-handed native speakers of Cantonese (divided into three age and two education levels), and presented findings of an investigation examining how gesture use was related to age and linguistic performance. Consideration of these two factors, for which normative data are currently very limited or lacking in the literature, is relevant and necessary when one evaluates gesture employment among individuals with and without language impairment. Three speech tasks, including monologue of a personally important event, sequential description, and story-telling, were used for elicitation. The EUDICO Linguistic ANnotator (ELAN) software was used to independently annotate each participant’s linguistic information of the transcript, forms of gestures used, and the function for each gesture. About one-third of the subjects did not use any co-verbal gestures. While the majority of gestures were non-content-carrying, which functioned mainly for reinforcing speech intonation or controlling speech flow, the content-carrying ones were used to enhance speech content. Furthermore, individuals who are younger or linguistically more proficient tended to use fewer gestures, suggesting that normal speakers gesture differently as a function of age and linguistic performance. PMID:25667563
Comparison of Two Music Training Approaches on Music and Speech Perception in Cochlear Implant Users
Fuller, Christina D.; Galvin, John J.; Maat, Bert; Başkent, Deniz; Free, Rolien H.
2018-01-01
In normal-hearing (NH) adults, long-term music training may benefit music and speech perception, even when listening to spectro-temporally degraded signals as experienced by cochlear implant (CI) users. In this study, we compared two different music training approaches in CI users and their effects on speech and music perception, as it remains unclear which approach to music training might be best. The approaches differed in terms of music exercises and social interaction. For the pitch/timbre group, melodic contour identification (MCI) training was performed using computer software. For the music therapy group, training involved face-to-face group exercises (rhythm perception, musical speech perception, music perception, singing, vocal emotion identification, and music improvisation). For the control group, training involved group nonmusic activities (e.g., writing, cooking, and woodworking). Training consisted of weekly 2-hr sessions over a 6-week period. Speech intelligibility in quiet and noise, vocal emotion identification, MCI, and quality of life (QoL) were measured before and after training. The different training approaches appeared to offer different benefits for music and speech perception. Training effects were observed within-domain (better MCI performance for the pitch/timbre group), with little cross-domain transfer of music training (emotion identification significantly improved for the music therapy group). While training had no significant effect on QoL, the music therapy group reported better perceptual skills across training sessions. These results suggest that more extensive and intensive training approaches that combine pitch training with the social aspects of music therapy may further benefit CI users. PMID:29621947
Bertti, Poliana; Tejada, Julian; Martins, Ana Paula Pinheiro; Dal-Cól, Maria Luiza Cleto; Terra, Vera Cristina; de Oliveira, José Antônio Cortes; Velasco, Tonicarlo Rodrigues; Sakamoto, Américo Ceiki; Garcia-Cairasco, Norberto
2014-09-01
Epileptic syndromes and seizures are the expression of complex brain systems. Because no analysis of complexity has been applied to epileptic seizure semiology, our goal was to apply neuroethology and graph analysis to the study of the complexity of behavioral manifestations of epileptic seizures in human frontal lobe epilepsy (FLE) and temporal lobe epilepsy (TLE). We analyzed the video recordings of 120 seizures of 18 patients with FLE and 28 seizures of 28 patients with TLE. All patients were seizure-free >1 year after surgery (Engel Class I). All patients' behavioral sequences were coded by means of a glossary containing all behaviors and analyzed neuroethologically (Ethomatic software). The same series were used for graph analysis (CYTOSCAPE). Behaviors, displayed as nodes, were connected by edges to other nodes according to their temporal sequence of appearance. Using neuroethology analysis, we confirmed data in the literature, such as, in FLE: brief/frequent seizures, complex motor behaviors, head and eye version, unilateral/bilateral tonic posturing, speech arrest, vocalization, and rapid postictal recovery; and in TLE: presence of epigastric aura, lateralized dystonias, impairment of consciousness/speech during ictal and postictal periods, and development of secondary generalization. The graph analysis metrics for FLE and TLE confirmed the data from the flowcharts. However, because of the algorithms we used, they highlighted more powerfully the connectivity and complex associations among behaviors in a quite selective manner, depending on the origin of the seizures. The algorithms we used are commonly employed to track brain connectivity from EEG and MRI sources, which makes our study very promising for future studies of complexity in this field.
Identification of speech transients using variable frame rate analysis and wavelet packets.
Rasetshwane, Daniel M; Boston, J Robert; Li, Ching-Chung
2006-01-01
Speech transients are important cues for identifying and discriminating speech sounds. Yoo et al. and Tantibundhit et al. were successful in identifying speech transients and, by emphasizing them, improving the intelligibility of speech in noise. However, their methods are computationally intensive and unsuitable for real-time applications. This paper presents a method to identify and emphasize speech transients that combines subband decomposition by the wavelet packet transform with variable frame rate (VFR) analysis and unvoiced consonant detection. The VFR analysis is applied to each wavelet packet to define a transitivity function that describes the extent to which the wavelet coefficients of that packet are changing. Unvoiced consonant detection is used to identify unvoiced consonant intervals, and the transitivity function is amplified during these intervals. The wavelet coefficients are multiplied by the transitivity function for that packet, amplifying the coefficients localized at times when they are changing and attenuating coefficients at times when they are steady. An inverse transform of the modified wavelet packet coefficients produces a signal corresponding to speech transients similar to the transients identified by Yoo et al. and Tantibundhit et al. A preliminary implementation of the algorithm runs more efficiently than these earlier methods.
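The decomposition-plus-transitivity idea can be sketched with the PyWavelets package; the transitivity measure below (framewise change of coefficient energy) is a simplified stand-in for the paper's VFR analysis, and the wavelet, level, and frame length are assumptions:

# Wavelet-packet decomposition with a simple "transitivity" measure:
# frame-to-frame change of coefficient magnitude within each packet.
import numpy as np
import pywt

def packet_transitivity(x, wavelet="db4", level=3, frame=64):
    wp = pywt.WaveletPacket(data=x, wavelet=wavelet, maxlevel=level)
    out = {}
    for node in wp.get_level(level, order="freq"):
        c = np.abs(node.data)
        n = len(c) // frame
        env = c[:n * frame].reshape(n, frame).mean(axis=1)   # framewise energy
        out[node.path] = np.abs(np.diff(env))   # large values mark transient regions
    return out

x = np.random.default_rng(1).standard_normal(8000)
trans = packet_transitivity(x)
print({k: round(float(v.max()), 3) for k, v in trans.items()})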
Recovering With Acquired Apraxia of Speech: The First 2 Years.
Haley, Katarina L; Shafer, Jennifer N; Harmon, Tyson G; Jacks, Adam
2016-12-01
This study was intended to document speech recovery for 1 person with acquired apraxia of speech quantitatively and on the basis of her lived experience. The second author sustained a traumatic brain injury that resulted in acquired apraxia of speech. Over a 2-year period, she documented her recovery through 22 video-recorded monologues. We analyzed these monologues using a combination of auditory perceptual, acoustic, and qualitative methods. Recovery was evident for all quantitative variables examined. For speech sound production, the recovery was most prominent during the first 3 months, but slower improvement was evident for many months. Measures of speaking rate, fluency, and prosody changed more gradually throughout the entire period. A qualitative analysis of topics addressed in the monologues was consistent with the quantitative speech recovery and indicated a subjective dynamic relationship between accuracy and rate, an observation that several factors made speech sound production variable, and a persisting need for cognitive effort while speaking. Speech features improved over an extended time, but the recovery trajectories differed, indicating dynamic reorganization of the underlying speech production system. The relationship among speech dimensions should be examined in other cases and in population samples. The combination of quantitative and qualitative analysis methods offers advantages for understanding clinically relevant aspects of recovery.
Pathological speech signal analysis and classification using empirical mode decomposition.
Kaleem, Muhammad; Ghoraani, Behnaz; Guergachi, Aziz; Krishnan, Sridhar
2013-07-01
Automated classification of normal and pathological speech signals can provide an objective and accurate mechanism for pathological speech diagnosis, and is an active area of research. A large part of this research is based on analysis of acoustic measures extracted from sustained vowels. However, sustained vowels do not reflect real-world attributes of voice as effectively as continuous speech, which can take into account important attributes of speech such as rapid voice onset and termination, changes in voice frequency and amplitude, and sudden discontinuities in speech. This paper presents a methodology based on empirical mode decomposition (EMD) for classification of continuous normal and pathological speech signals obtained from a well-known database. EMD is used to decompose randomly chosen portions of speech signals into intrinsic mode functions, which are then analyzed to extract meaningful temporal and spectral features, including true instantaneous features which can capture discriminative information in signals hidden at local time-scales. A total of six features are extracted, and a linear classifier is used with the feature vector to classify continuous speech portions obtained from a database consisting of 51 normal and 161 pathological speakers. A classification accuracy of 95.7% is obtained, thus demonstrating the effectiveness of the methodology.
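A sketch of the decomposition and feature-extraction steps using the PyEMD package (an assumed implementation choice); the two features per intrinsic mode function below are illustrative, not the six features of the paper:

# Decompose a speech portion into intrinsic mode functions (IMFs) with EMD,
# then extract simple per-IMF energy and dominant-frequency features.
import numpy as np
from PyEMD import EMD   # pip install EMD-signal

def emd_features(x, fs):
    imfs = EMD().emd(x)
    feats = []
    for imf in imfs:
        energy = float(np.sum(imf ** 2))
        spec = np.abs(np.fft.rfft(imf))
        dom_freq = np.fft.rfftfreq(len(imf), 1.0 / fs)[int(np.argmax(spec))]
        feats.extend([energy, dom_freq])
    return np.array(feats)

fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 120 * t) + 0.3 * np.sin(2 * np.pi * 900 * t)
print(emd_features(x, fs)[:6])   # feature vector fed to a linear classifier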
Buklina, S B; Batalov, A I; Smirnov, A S; Poddubskaya, A A; Pitskhelauri, D I; Kobyakov, G L; Zhukov, V Yu; Goryaynov, S A; Kulikov, A S; Ogurtsova, A A; Golanov, A V; Varyukhina, M D; Pronin, I N
There are no studies on the application of functional MRI (fMRI) for long-term monitoring of the condition of patients after resection of frontal and temporal lobe tumors. The purpose of this study was to correlate, using fMRI, reorganization of the speech system with the dynamics of speech disorders in patients with left hemisphere gliomas before surgery and in the early and late postoperative periods. A total of 20 patients with left hemisphere gliomas were dynamically monitored using fMRI and comprehensive neuropsychological testing. The tumor was located in the frontal lobe in 12 patients and in the temporal lobe in 8 patients. Fifteen patients underwent primary surgery; 5 patients had repeated surgery. Sixteen patients had WHO Grade II and Grade III gliomas; the others had WHO Grade IV gliomas. Nineteen patients were examined preoperatively; 20 patients were examined at different times after surgery. Speech functions were assessed by Luria's test; the dominant hand was determined using the Annette questionnaire; a family history of left-handedness was investigated. Functional MRI was performed on an HDtx 3.0 T scanner, with BrainWavePA 2.0 software used for fMRI data processing (Z > 7, p < 0.001 for all calculations). In patients with extensive tumors and recurrent tumors, activation of right-sided homologues of the speech areas could be detected even before surgery, but in most patients the activation was detected 3 months or more after surgery. Therefore, reorganization of the speech system took time. Activation of right-sided homologues of the speech areas remained in all patients for up to a year. Simultaneous activation of right-sided homologues of both speech areas, the Broca's and Wernicke's areas, was detected more often in patients with frontal lobe tumors than in those with temporal lobe tumors. No additional activation foci in the left hemisphere were found at the thresholds used to process the fMRI data. Recovery of speech function, to a certain degree, occurred in all patients, but no clear correlation with the fMRI data was found. Complex fMRI and neuropsychological studies in 20 patients after resection of frontal and temporal lobe tumors revealed individual features of speech system reorganization within a one-year follow-up. Probably, activation of right-sided homologues of the speech areas in the presence of left hemisphere tumors depends not only on the severity of the speech disorder but also reflects individual involvement of the right hemisphere in enabling speech function. This is supported by right-sided activation, according to the fMRI data, in right-handed patients without aphasia and, conversely, the lack of activation of right-sided homologues of the speech areas in several patients with severe postoperative speech disorders during the entire follow-up period.
Zeremdini, Jihen; Ben Messaoud, Mohamed Anouar; Bouzid, Aicha
2015-09-01
Thanks to their auditory system, humans have the ability to easily separate mixed speech and to form perceptual representations of the constituent sources in an acoustic mixture. Researchers have long attempted to build computer models of such high-level functions of the auditory system, yet the segregation of mixed speech remains a very challenging problem. In our case, we are interested in approaches that address monaural speech segregation. For this purpose, we study in this paper computational auditory scene analysis (CASA) to segregate speech from monaural mixtures. CASA is the reproduction of the source organization achieved by listeners. It is based on two main stages: segmentation and grouping. In this work, we have presented and compared several studies that have used CASA for speech separation and recognition.
The Segmentation Problem in the Study of Impromptu Speech.
ERIC Educational Resources Information Center
Loman, Bengt
A fundamental problem in the study of spontaneous speech is how to segment it for analysis. The segments should be relevant for the study of linguistic structures, speech planning, speech production, or communication strategies. Operational rules for segmentation should consider a wide variety of criteria and be hierarchically ordered. This is…
Dimensions of Early Speech Sound Disorders: A Factor Analytic Study
ERIC Educational Resources Information Center
Lewis, Barbara A.; Freebairn, Lisa A.; Hansen, Amy J.; Stein, Catherine M.; Shriberg, Lawrence D.; Iyengar, Sudha K.; Taylor, H. Gerry
2006-01-01
The goal of this study was to classify children with speech sound disorders (SSD) empirically, using factor analytic techniques. Participants were 3-7-year olds enrolled in speech/language therapy (N=185). Factor analysis of an extensive battery of speech and language measures provided support for two distinct factors, representing the skill…
Breathing-Impaired Speech after Brain Haemorrhage: A Case Study
ERIC Educational Resources Information Center
Heselwood, Barry
2007-01-01
Results are presented from an auditory and acoustic analysis of the speech of an adult male with impaired prosody and articulation due to brain haemorrhage. They show marked effects on phonation, speech rate and articulator velocity, and a speech rhythm disrupted by "intrusive" stresses. These effects are discussed in relation to the speaker's…
ERIC Educational Resources Information Center
Lowit, Anja; Kuschmann, Anja
2012-01-01
Purpose: The autosegmental-metrical (AM) framework represents an established methodology for intonational analysis in unimpaired speaker populations but has found little application in describing intonation in motor speech disorders (MSDs). This study compared the intonation patterns of unimpaired participants (CON) and those with Parkinson's…
Design and performance of an analysis-by-synthesis class of predictive speech coders
NASA Technical Reports Server (NTRS)
Rose, Richard C.; Barnwell, Thomas P., III
1990-01-01
The performance of a broad class of analysis-by-synthesis linear predictive speech coders is quantified experimentally. The class of coders includes a number of well-known techniques as well as a very large number of speech coders which have not been named or studied. A general formulation for deriving the parametric representation used in all of the coders in the class is presented. A new coder, named the self-excited vocoder, is discussed because of its good performance with low complexity, and because of the insight this coder gives to analysis-by-synthesis coders in general. The results of a study comparing the performances of different members of this class are presented. The study takes the form of a series of formal subjective and objective speech quality tests performed on selected coders. The results of this study lead to some interesting and important observations concerning the controlling parameters for analysis-by-synthesis speech coders.
An analysis of the masking of speech by competing speech using self-report data.
Agus, Trevor R; Akeroyd, Michael A; Noble, William; Bhullar, Navjot
2009-01-01
Many of the items in the "Speech, Spatial, and Qualities of Hearing" scale questionnaire [S. Gatehouse and W. Noble, Int. J. Audiol. 43, 85-99 (2004)] are concerned with speech understanding in a variety of backgrounds, both speech and nonspeech. To study if this self-report data reflected informational masking, previously collected data on 414 people were analyzed. The lowest scores (greatest difficulties) were found for the two items in which there were two speech targets, with successively higher scores for competing speech (six items), energetic masking (one item), and no masking (three items). The results suggest significant masking by competing speech in everyday listening situations.
McKnight, Lindsay M; O'Malley-Keighran, Mary-Pat; Carroll, Clare
2016-11-01
There is evidence indicating that parent training programmes including interaction coaching of parents of children with autism spectrum disorders (ASD) can increase parental responsiveness, promote language development and social interaction skills in children with ASD. However, there is a lack of research exploring precisely how healthcare professionals use language in interaction coaching. To identify the speech acts of healthcare professionals during individual video-recorded interaction coaching sessions of a Hanen-influenced parent training programme with parents of children with ASD. This retrospective study used speech act analysis. Healthcare professional participants included two speech-language therapists and one occupational therapist. Sixteen videos were transcribed and a speech act analysis was conducted to identify the form and functions of the language used by the healthcare professionals. Descriptive statistics provided frequencies and percentages for the different speech acts used across the 16 videos. Six types of speech acts used by the healthcare professionals during coaching sessions were identified. These speech acts were, in order of frequency: Instructing, Modelling, Suggesting, Commanding, Commending and Affirming. The healthcare professionals were found to tailor their interaction coaching to the learning needs of the parents. A pattern was observed in which more direct speech acts were used in instances where indirect speech acts did not achieve the intended response. The study provides an insight into the nature of interaction coaching provided by healthcare professionals during a parent training programme. It identifies the types of language used during interaction coaching. It also highlights additional important aspects of interaction coaching such as the ability of healthcare professionals to adjust the directness of the coaching in order to achieve the intended parental response to the child's interaction. The findings may be used to increase the awareness of healthcare professionals about the types of speech acts used during interaction coaching as well as the manner in which coaching sessions are conducted.
Incidence of speech recognition errors in the emergency department.
Goss, Foster R; Zhou, Li; Weiner, Scott G
2016-09-01
Physician use of computerized speech recognition (SR) technology has risen in recent years due to its ease of use and efficiency at the point of care. However, error rates between 10 and 23% have been observed, raising concern about the number of errors entered into the permanent medical record, their impact on quality of care, and the medical liability that may arise. Our aim was to determine the incidence and types of SR errors introduced by this technology in the emergency department (ED). The setting was a Level 1 emergency department with 42,000 visits/year in a tertiary academic teaching hospital. A random sample of 100 notes dictated by attending emergency physicians (EPs) using SR software was collected from the ED electronic health record between January and June 2012. Two board-certified EPs annotated the notes and conducted error analysis independently. An existing classification schema was adopted to classify errors into eight error types. Critical errors deemed to potentially impact patient care were identified. There were 128 errors in total, or 1.3 errors per note, and 14.8% (n=19) of errors were judged to be critical. 71% of notes contained errors, and 15% contained one or more critical errors. Annunciation errors were most frequent at 53.9% (n=69), followed by deletions at 18.0% (n=23) and added words at 11.7% (n=15). Nonsense errors, homonyms, and spelling errors were present in 10.9% (n=14), 4.7% (n=6), and 0.8% (n=1) of notes, respectively. There were no suffix or dictionary errors. Inter-annotator agreement was 97.8%. This is the first attempt at classifying speech recognition errors in dictated emergency department notes. Speech recognition errors occur commonly, with annunciation errors being the most frequent. Error rates were comparable to, if not lower than, previous studies. 15% of errors were deemed critical, potentially leading to miscommunication that could affect patient care.
Core, Cynthia; Brown, Janean W; Larsen, Michael D; Mahshie, James
2014-01-01
The objectives of this research were to determine whether an adapted version of the Hybrid Visual Habituation procedure could be used to assess perception of phonetic and prosodic features of speech (vowel height, lexical stress, and intonation) in individual pre-school-age children who use cochlear implants. Nine children ranging in age from 3;4 to 5;5 participated in this study. Children were prelingually deaf, used cochlear implants, and had no other known disabilities. Children received two speech feature tests using an adaptation of the Hybrid Visual Habituation procedure. Based on results from a Bayesian linear regression analysis, seven of the nine children demonstrated perception of at least one speech feature using this procedure. At least one child demonstrated perception of each speech feature using this assessment procedure. An adapted version of the Hybrid Visual Habituation procedure with an appropriate statistical analysis provides a way to assess phonetic and prosodic aspects of speech in pre-school-age children who use cochlear implants.
Space station interior noise analysis program
NASA Technical Reports Server (NTRS)
Stusnick, E.; Burn, M.
1987-01-01
Documentation is provided for a microcomputer program which was developed to evaluate the effect of the vibroacoustic environment on speech communication inside a space station. The program, entitled Space Station Interior Noise Analysis Program (SSINAP), combines a Statistical Energy Analysis (SEA) prediction of sound and vibration levels within the space station with a speech intelligibility model based on the Modulation Transfer Function and the Speech Transmission Index (MTF/STI). The SEA model provides an effective analysis tool for predicting the acoustic environment based on a proposed space station design. The MTF/STI model provides a method for evaluating speech communication in the relatively reverberant and potentially noisy environments that are likely to occur in space stations. The combination of these two models provides a powerful analysis tool for optimizing the acoustic design of space stations from the point of view of speech communication. The mathematical algorithms used in SSINAP to implement the SEA and MTF/STI models are presented. An appendix provides an explanation of the operation of the program along with details of the program structure and code.
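The MTF/STI half of this pipeline rests on a standard conversion from modulation index to a transmission index. A simplified Python sketch, with placeholder modulation indices and flat band weights rather than the values SSINAP would use:

# Simplified Speech Transmission Index (STI) from modulation indices:
# apparent SNR = 10*log10(m/(1-m)), clipped to +/-15 dB, mapped to [0,1],
# then averaged over bands with weights.
import numpy as np

def sti_from_modulation(m, weights):
    m = np.clip(np.asarray(m, dtype=float), 1e-6, 1 - 1e-6)
    snr = 10 * np.log10(m / (1 - m))   # apparent SNR per band
    snr = np.clip(snr, -15.0, 15.0)
    ti = (snr + 15.0) / 30.0           # transmission index per band
    return float(np.dot(weights, ti))

# Placeholder modulation indices for 7 octave bands, flat illustrative weights.
m = [0.9, 0.85, 0.8, 0.7, 0.6, 0.55, 0.5]
w = np.full(7, 1 / 7)
print(round(sti_from_modulation(m, w), 3))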
Rhythmic patterning in Malaysian and Singapore English.
Tan, Rachel Siew Kuang; Low, Ee-Ling
2014-06-01
Previous work on the rhythm of Malaysian English has been based on impressionistic observations. This paper uses acoustic analysis to measure the rhythmic patterns of Malaysian English. Recordings of the read speech and spontaneous speech of 10 Malaysian English speakers were analyzed and compared with recordings of an equivalent sample of Singaporean English speakers. The analysis was done using two rhythmic indexes, the PVI and VarcoV. It was found that although the rhythm of the read speech of the Singaporean speakers was syllable-based, as described by previous studies, the rhythm of the Malaysian speakers was even more syllable-based. Analysis of syllables in specific utterances showed that Malaysian speakers did not reduce vowels as much as Singaporean speakers. Results for the spontaneous speech confirmed the findings for the read speech; that is, the same rhythmic patterning was found that normally triggers vowel reduction.
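Both indexes have compact definitions. A Python sketch of the normalised Pairwise Variability Index (nPVI) and VarcoV over a sequence of vocalic interval durations; the durations below are toy values:

# Rhythm metrics used in the study: nPVI (pairwise variability of successive
# vocalic durations, normalised for rate) and VarcoV (rate-normalised
# standard deviation of vocalic durations).
import numpy as np

def npvi(durations):
    d = np.asarray(durations, dtype=float)
    pair = np.abs(np.diff(d)) / ((d[:-1] + d[1:]) / 2.0)
    return 100.0 * pair.mean()

def varco_v(durations):
    d = np.asarray(durations, dtype=float)
    return 100.0 * d.std() / d.mean()

vowels_ms = [80, 120, 60, 140, 90, 110]   # toy vocalic interval durations
print(npvi(vowels_ms), varco_v(vowels_ms))
# Higher values indicate more stress-based rhythm; lower values indicate
# more syllable-based rhythm, the contrast drawn above.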
Hlavnička, Jan; Čmejla, Roman; Tykalová, Tereza; Šonka, Karel; Růžička, Evžen; Rusz, Jan
2017-02-02
For generations, the evaluation of speech abnormalities in neurodegenerative disorders such as Parkinson's disease (PD) has been limited to perceptual tests or user-controlled laboratory analyses based upon rather small samples of human vocalizations. Our study introduces a fully automated method that yields significant features related to respiratory deficits, dysphonia, imprecise articulation, and dysrhythmia from acoustic microphone data of natural connected speech for predicting early and distinctive patterns of neurodegeneration. We compared speech recordings of 50 subjects with rapid eye movement sleep behaviour disorder (RBD), 30 newly diagnosed, untreated PD patients, and 50 healthy controls, and showed that subliminal parkinsonian speech deficits can be reliably captured even in RBD patients, who are at high risk of developing PD or other synucleinopathies. Thus, automated vocal analysis should soon be able to contribute to screening and diagnostic procedures for prodromal parkinsonian neurodegeneration in natural environments.
De Jonge-Hoekstra, Lisette; Van der Steen, Steffie; Van Geert, Paul; Cox, Ralf F A
2016-01-01
As children learn, they use their speech to express words and their hands to gesture. This study investigates the interplay between real-time gestures and speech as children construct cognitive understanding during a hands-on science task. Twelve children (6 boys, 6 girls) from kindergarten (n = 5) and first grade (n = 7) participated in this study. Each verbal utterance and gesture during the task was coded on a complexity scale derived from dynamic skill theory. To explore the interplay between speech and gestures, we applied a cross recurrence quantification analysis (CRQA) to the two coupled time series of the skill levels of verbalizations and gestures. The analysis focused on (1) the temporal relation between gestures and speech, (2) the relative strength and direction of the interaction between gestures and speech, (3) the relative strength and direction between gestures and speech for different levels of understanding, and (4) relations between CRQA measures and other child characteristics. The results show that older and younger children differ in the (temporal) asymmetry in the gestures-speech interaction. For younger children, the balance leans more toward gestures leading speech in time, while for older children the balance leans more toward speech leading gestures. Secondly, at the group level, speech attracts gestures in a more dynamically stable fashion than vice versa, and this asymmetry between gestures and speech extends to lower and higher understanding levels. Yet, for older children, the mutual coupling between gestures and speech is more dynamically stable at the higher understanding levels. Gestures and speech are more synchronized in time as children get older. A higher score on schools' language tests is related to speech attracting gestures more rigidly and to more asymmetry between gestures and speech, but only for the less difficult understanding levels. A higher score on math or past science tasks is related to less asymmetry between gestures and speech. The picture that emerges from our analyses suggests that the relation between gestures, speech, and cognition is more complex than previously thought. We suggest that temporal differences and asymmetry in influence between gestures and speech arise from the simultaneous coordination of synergies.
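The central object of CRQA, the cross recurrence plot, is simple to construct for coded series like the skill levels here. The sketch below uses toy codes, and the lead/lag counts are a crude proxy for the study's asymmetry measures:

# Cross recurrence plot for two coded time series (e.g., skill levels of
# gestures and verbalizations): R[i, j] = 1 when the gesture level at
# time i matches the speech level at time j.
import numpy as np

def cross_recurrence(a, b):
    a, b = np.asarray(a), np.asarray(b)
    return (a[:, None] == b[None, :]).astype(int)

gestures = [1, 1, 2, 2, 3, 3, 3, 2]   # toy skill-level codes
speech   = [1, 2, 2, 3, 3, 3, 2, 2]
R = cross_recurrence(gestures, speech)
rr = R.mean()                          # recurrence rate
# Mass above vs. below the main diagonal hints at which series tends
# to lead the other in time (upper triangle: earlier gesture matches
# later speech, i.e. gestures leading).
lead_gesture = np.triu(R, k=1).sum()
lead_speech = np.tril(R, k=-1).sum()
print(rr, lead_gesture, lead_speech)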
Real time speech formant analyzer and display
Holland, George E.; Struve, Walter S.; Homer, John F.
1987-01-01
A speech analyzer for interpretation of sound includes a sound input which converts the sound into a signal representing the sound. The signal is passed through a plurality of frequency pass filters to derive a plurality of frequency formants. These formants are converted to voltage signals by frequency-to-voltage converters and then are prepared for visual display in continuous real time. Parameters from the inputted sound are also derived and displayed. The display may then be interpreted by the user. The preferred embodiment includes a microprocessor which is interfaced with a television set for displaying of the sound formants. The microprocessor software enables the sound analyzer to present a variety of display modes for interpretive and therapeutic use by the user.
The Effect of Background Traffic Packet Size to VoIP Speech Quality
NASA Astrophysics Data System (ADS)
Triyason, Tuul; Kanthamanon, Prasert; Warasup, Kittipong; Yamsaengsung, Siam; Supattatham, Montri
VoIP is gaining acceptance in the corporate world, especially in small and medium-sized businesses that want to save costs to gain an advantage over their competitors. Good voice quality is one of the challenging tasks in a deployment plan, because VoIP voice quality is affected by packet loss and jitter delay. In this paper, we study the effect of background traffic packet size on voice quality. The background traffic was generated by the Bricks software, and speech quality was assessed by MOS. The obtained results show an interesting relationship between voice quality and the number of TCP packets and their size. With the same amount of data, smaller packets affect voice quality more than larger packets.
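MOS measurements of this kind are often approximated analytically. A Python sketch of the ITU-T G.107 E-model mapping from an R-factor to MOS; the delay and loss penalties below are simplified, illustrative forms, not the assessment method used in the paper:

# Simplified E-model: compute an R-factor penalised by delay and packet
# loss, then map R to an estimated MOS (ITU-T G.107 mapping).
def mos_from_r(r):
    if r < 0:
        return 1.0
    if r > 100:
        return 4.5
    return 1.0 + 0.035 * r + 7e-6 * r * (r - 60.0) * (100.0 - r)

def r_factor(delay_ms, loss_pct):
    r = 93.2                                   # default transmission rating
    # Common simplified delay impairment Id.
    r -= 0.024 * delay_ms + max(0.0, 0.11 * (delay_ms - 177.3))
    # Illustrative, codec-dependent loss penalty (placeholder constants).
    r -= 30.0 * (loss_pct / (loss_pct + 15.0))
    return r

for loss in (0.0, 1.0, 5.0):
    print(loss, round(mos_from_r(r_factor(80.0, loss)), 2))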
Acoustic Analysis of the Voiced-Voiceless Distinction in Dutch Tracheoesophageal Speech
ERIC Educational Resources Information Center
Jongmans, Petra; Wempe, Ton G.; van Tinteren, Harm; Hilgers, Frans J. M.; Pols, Louis C. W.; van As-Brooks, Corina J.
2010-01-01
Purpose: Confusions between voiced and voiceless plosives and voiced and voiceless fricatives are common in Dutch tracheoesophageal (TE) speech. This study investigates (a) which acoustic measures are found to convey a correct voicing contrast in TE speech and (b) whether different measures are found in TE speech than in normal laryngeal (NL)…
ERIC Educational Resources Information Center
Haskins Labs., New Haven, CT.
This collection on speech research presents a number of reports of experiments conducted on neurological, physiological, and phonological questions, using electronic equipment for analysis. The neurological experiments cover auditory and phonetic processes in speech perception, auditory storage, ear asymmetry in dichotic listening, auditory…
Association of Orofacial Muscle Activity and Movement during Changes in Speech Rate and Intensity
ERIC Educational Resources Information Center
McClean, Michael D.; Tasko, Stephen M.
2003-01-01
Understanding how orofacial muscle activity and movement covary across changes in speech rate and intensity has implications for the neural control of speech production and the use of clinical procedures that manipulate speech prosody. The present study involved a correlation analysis relating average lower-lip and jaw-muscle activity to lip and…
Women's Speech/Men's Speech: Does Forensic Training Make a Difference?
ERIC Educational Resources Information Center
Larson, Suzanne; Vreeland, Amy L.
A study of cross examination speeches of males and females was conducted to determine gender differences in intercollegiate debate. The theory base for gender differences in speech is closely tied to the analysis of dyadic conversation. It is based on the belief that women are less forceful and dominant in cross examination, and will exhibit…
Intelligent acoustic data fusion technique for information security analysis
NASA Astrophysics Data System (ADS)
Jiang, Ying; Tang, Yize; Lu, Wenda; Wang, Zhongfeng; Wang, Zepeng; Zhang, Luming
2017-08-01
Tone is an essential component of word formation in all tonal languages, and it plays an important role in the transmission of information in speech communication. Therefore, the study of tone characteristics can be applied to the security analysis of acoustic signals by means of language identification, etc. In speech processing, fundamental frequency (F0) is often viewed by speech synthesis researchers as representing tone. However, regular F0 values may lead to low naturalness in synthesized speech. Moreover, F0 and tone are not linguistically equivalent; F0 is just one representation of a tone. Therefore, the electroglottography (EGG) signal was collected for a deeper study of tone characteristics. In this paper, focusing on the Northern Kam language, which has nine tonal contours and five level tone types, we first collected EGG and speech signals from six native male speakers of the Northern Kam language, and then obtained the clustering distributions of the tone curves. After summarizing the main characteristics of the tones of Northern Kam, we analyzed the relationship between EGG and speech signal parameters, laying the foundation for further security analysis of acoustic signals.
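F0 contours of the kind clustered into tone curves can be extracted from EGG frames with a basic autocorrelation pitch tracker. A rough Python sketch, with assumed frame sizes, search range, and voicing threshold:

# Framewise F0 estimation from an EGG (or speech) signal by autocorrelation.
import numpy as np

def f0_track(x, fs, frame_ms=40, hop_ms=10, fmin=60, fmax=400):
    n, hop = int(fs * frame_ms / 1000), int(fs * hop_ms / 1000)
    lo, hi = int(fs / fmax), int(fs / fmin)   # lag search range
    f0 = []
    for start in range(0, len(x) - n, hop):
        fr = x[start:start + n] - np.mean(x[start:start + n])
        ac = np.correlate(fr, fr, mode="full")[n - 1:]
        lag = lo + int(np.argmax(ac[lo:hi]))
        f0.append(fs / lag if ac[lag] > 0.3 * ac[0] else 0.0)   # 0 = unvoiced
    return np.array(f0)

fs = 16000
t = np.arange(fs) / fs
egg = np.sin(2 * np.pi * 180 * t)   # toy "EGG" with a flat 180 Hz F0
print(f0_track(egg, fs)[:5])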
[Effect of speech estimation on social anxiety].
Shirotsuki, Kentaro; Sasagawa, Satoko; Nomura, Shinobu
2009-02-01
This study investigates the effect of speech estimation on social anxiety to further understanding of this characteristic of Social Anxiety Disorder (SAD). In the first study, we developed the Speech Estimation Scale (SES) to assess negative estimation before giving a speech which has been reported to be the most fearful social situation in SAD. Undergraduate students (n = 306) completed a set of questionnaires, which consisted of the Short Fear of Negative Evaluation Scale (SFNE), the Social Interaction Anxiety Scale (SIAS), the Social Phobia Scale (SPS), and the SES. Exploratory factor analysis showed an adequate one-factor structure with eight items. Further analysis indicated that the SES had good reliability and validity. In the second study, undergraduate students (n = 315) completed the SFNE, SIAS, SPS, SES, and the Self-reported Depression Scale (SDS). The results of path analysis showed that fear of negative evaluation from others (FNE) predicted social anxiety, and speech estimation mediated the relationship between FNE and social anxiety. These results suggest that speech estimation might maintain SAD symptoms, and could be used as a specific target for cognitive intervention in SAD.
Speech transformations based on a sinusoidal representation
NASA Astrophysics Data System (ADS)
Quatieri, T. E.; McAulay, R. J.
1986-05-01
A new speech analysis/synthesis technique is presented which provides the basis for a general class of speech transformations including time-scale modification, frequency scaling, and pitch modification. These modifications can be performed with a time-varying change, permitting continuous adjustment of a speaker's fundamental frequency and rate of articulation. The method is based on a sinusoidal representation of the speech production mechanism that has been shown to produce synthetic speech that preserves the waveform shape and is essentially perceptually indistinguishable from the original. Although the analysis/synthesis system was originally designed for single-speaker signals, it is equally capable of recovering and modifying nonspeech signals such as music, multiple speakers, marine biologic sounds, and speech in the presence of interference such as noise and musical backgrounds.
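The analysis stage of a sinusoidal representation reduces to per-frame spectral peak picking. A heavily simplified Python sketch; the published system adds frame-to-frame track matching and parameter interpolation omitted here:

# Per-frame sinusoidal analysis: pick spectral peaks and record
# amplitude/frequency/phase triples for an oscillator-bank resynthesis.
import numpy as np
from scipy.signal import find_peaks

def analyze_frame(frame, fs, n_peaks=20):
    w = np.hanning(len(frame))
    spec = np.fft.rfft(frame * w)
    mag = np.abs(spec)
    idx, _ = find_peaks(mag)
    idx = idx[np.argsort(mag[idx])[::-1][:n_peaks]]   # strongest peaks
    freqs = idx * fs / len(frame)
    return mag[idx], freqs, np.angle(spec[idx])

fs = 8000
t = np.arange(int(0.032 * fs)) / fs
frame = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)
amps, freqs, phases = analyze_frame(frame, fs)
# Time-scale or pitch modification amounts to re-spacing frames or
# scaling freqs before summing cos(2*pi*f*t + phase) per frame.
print(np.round(freqs[:2]))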
Relationship between Speech Production and Perception in People Who Stutter.
Lu, Chunming; Long, Yuhang; Zheng, Lifen; Shi, Guang; Liu, Li; Ding, Guosheng; Howell, Peter
2016-01-01
Speech production difficulties are apparent in people who stutter (PWS). PWS also have difficulties in speech perception compared to controls. It is unclear whether the speech perception difficulties in PWS are independent of, or related to, their speech production difficulties. To investigate this issue, functional MRI data were collected on 13 PWS and 13 controls whilst the participants performed a speech production task and a speech perception task. PWS performed poorer than controls in the perception task and the poorer performance was associated with a functional activity difference in the left anterior insula (part of the speech motor area) compared to controls. PWS also showed a functional activity difference in this and the surrounding area [left inferior frontal cortex (IFC)/anterior insula] in the production task compared to controls. Conjunction analysis showed that the functional activity differences between PWS and controls in the left IFC/anterior insula coincided across the perception and production tasks. Furthermore, Granger Causality Analysis on the resting-state fMRI data of the participants showed that the causal connection from the left IFC/anterior insula to an area in the left primary auditory cortex (Heschl's gyrus) differed significantly between PWS and controls. The strength of this connection correlated significantly with performance in the perception task. These results suggest that speech perception difficulties in PWS are associated with anomalous functional activity in the speech motor area, and the altered functional connectivity from this area to the auditory area plays a role in the speech perception difficulties of PWS.
Comparison of formant detection methods used in speech processing applications
NASA Astrophysics Data System (ADS)
Belean, Bogdan
2013-11-01
The paper describes time-frequency representations of the speech signal together with the significance of formants in speech processing applications. Speech formants can be used in emotion recognition, speaker sex discrimination, or the diagnosis of different neurological diseases. Taking into account these various applications of formant detection, two methods for detecting formants are presented. In the first, the poles resulting from a complex analysis of the LPC coefficients are used for formant detection. The second approach uses the Kalman filter for formant prediction along the speech signal. Results are presented for both approaches on real-life speech spectrograms. A comparison of the features of the proposed methods is also performed, in order to establish which method is more suitable for different speech processing applications.
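The first method (formants from the poles of the LPC polynomial) can be sketched compactly in Python; the order and plausibility thresholds below are common textbook choices, not necessarily those of the paper:

# Formant estimation from the roots of the LPC polynomial: pole angles
# give candidate formant frequencies, pole radii give bandwidths.
import numpy as np
from scipy.linalg import solve_toeplitz

def formants(frame, fs, order=12):
    w = frame * np.hamming(len(frame))
    r = np.correlate(w, w, mode="full")[len(w) - 1:]
    a = solve_toeplitz((r[:order], r[:order]), r[1:order + 1])
    roots = np.roots(np.concatenate(([1.0], -a)))   # A(z) = 1 - sum a_k z^-k
    roots = roots[np.imag(roots) > 0]               # keep one of each conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)
    bw = -fs / np.pi * np.log(np.abs(roots))        # bandwidth from pole radius
    keep = (freqs > 90) & (bw < 400)                # assumed plausibility filter
    return np.sort(freqs[keep])

fs = 8000
t = np.arange(240) / fs
vowel = np.sin(2 * np.pi * 700 * t) + np.sin(2 * np.pi * 1200 * t)  # toy two-formant proxy
print(np.round(formants(vowel, fs)))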
ERIC Educational Resources Information Center
Miyamoto, Karen A.
2005-01-01
A pretest-posttest experimental design was utilized to determine the efficacy of the Yuba Method on inaccurate elementary singers. Testing of pitch accuracy was analyzed using the Sona-Speech Model 3600 software program. Inaccurate singers (N=168) from a population of 320 fourth, fifth, and sixth grade students, were divided into three subgroups…
Using Speech Recognition Software to Improve Writing Skills
ERIC Educational Resources Information Center
Diaz, Felix
2014-01-01
Orthopedically impaired (OI) students face a formidable challenge during the writing process due to their limited or non-existing ability to use their hands to hold a pen or pencil or even to press the keys on a keyboard. While they may have a clear mental picture of what they want to write, the biggest hurdle comes well before having to tackle…
Federal Register 2010, 2011, 2012, 2013, 2014
2013-11-05
... with speech or hearing impairments may access this telephone number via TTY by calling the toll-free... management of the software, and data collection for Annual Performance Reports (APRs) and the Annual Homeless Assessment Report (AHAR). Information about HMIS is available at www.hud.gov and www.onecpd.info . The...
Assessment of Indoor Route-finding Technology for People with Visual Impairment
Kalia, Amy A.; Legge, Gordon E.; Roy, Rudrava; Ogale, Advait
2010-01-01
This study investigated navigation with route instructions generated by digital-map software and synthetic speech. Participants, either visually impaired or sighted wearing blindfolds, successfully located rooms in an unfamiliar building. Users with visual impairment demonstrated better route-finding performance when the technology provided distance information in number of steps rather than walking time or number of feet. PMID:21869851
"Once upon a Time There Was a Mouse": Children's Technology-Mediated Storytelling in Preschool Class
ERIC Educational Resources Information Center
Skantz Åberg, Ewa; Lantz-Andersson, Annika; Pramling, Niklas
2014-01-01
With the current expansion of digital tools, the media used for narration is changing, challenging traditional literacies in educational settings. The present study explores what kind of activities emerge when six-year-old children in a preschool class write a digital story, using a word processor and speech-synthesised feedback computer software.…
ERIC Educational Resources Information Center
McCulley, Yvette K.
2012-01-01
The problem: The increasingly competitive global economy demands literate, educated workers. Both men and women experience the effects of education on employment rates and income. Racial and ethnic minorities, English language learners, and especially those with prison records are most deeply affected by the economic consequences of dropping out…
ERIC Educational Resources Information Center
Scott Instruments Corp., Denton, TX.
This project was designed to develop techniques for adding low-cost speech synthesis to educational software. Four tasks were identified for the study: (1) select a microcomputer with a built-in analog-to-digital converter that is currently being used in educational environments; (2) determine the feasibility of implementing expansion and playback…
ERIC Educational Resources Information Center
Overton, Sarah; Wren, Yvonne
2014-01-01
The ultimate aim of intervention for children with language impairment is an improvement in their functional language skills. Baseline and outcome measurement of this is often problematic however and practitioners commonly resort to using formal assessments that may not adequately reflect the child's competence. Language sampling,…
Speech graphs provide a quantitative measure of thought disorder in psychosis.
Mota, Natalia B; Vasconcelos, Nivaldo A P; Lemos, Nathalia; Pieretti, Ana C; Kinouchi, Osame; Cecchi, Guillermo A; Copelli, Mauro; Ribeiro, Sidarta
2012-01-01
Psychosis has various causes, including mania and schizophrenia. Since the differential diagnosis of psychosis is exclusively based on subjective assessments of oral interviews with patients, an objective quantification of the speech disturbances that characterize mania and schizophrenia is in order. In principle, such quantification could be achieved by the analysis of speech graphs. A graph represents a network with nodes connected by edges; in speech graphs, nodes correspond to words and edges correspond to semantic and grammatical relationships. To quantify speech differences related to psychosis, interviews with schizophrenics, manics and normal subjects were recorded and represented as graphs. Manics scored significantly higher than schizophrenics in ten graph measures. Psychopathological symptoms such as logorrhea, poor speech, and flight of thoughts were grasped by the analysis even when verbosity differences were discounted. Binary classifiers based on speech graph measures sorted schizophrenics from manics with up to 93.8% of sensitivity and 93.7% of specificity. In contrast, sorting based on the scores of two standard psychiatric scales (BPRS and PANSS) reached only 62.5% of sensitivity and specificity. The results demonstrate that alterations of the thought process manifested in the speech of psychotic patients can be objectively measured using graph-theoretical tools, developed to capture specific features of the normal and dysfunctional flow of thought, such as divergence and recurrence. The quantitative analysis of speech graphs is not redundant with standard psychometric scales but rather complementary, as it yields a very accurate sorting of schizophrenics and manics. Overall, the results point to automated psychiatric diagnosis based not on what is said, but on how it is said.
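The core graph construction is straightforward to sketch. The toy below links consecutive words of a transcript into a directed graph with networkx and reports a few generic graph measures; the edge definition and measure set in the study itself are richer, so treat this as an illustration of the representation, not a reimplementation.

```python
"""Word-graph representation of a speech transcript (illustrative sketch)."""
import networkx as nx

def speech_graph(transcript: str) -> nx.DiGraph:
    words = transcript.lower().split()
    g = nx.DiGraph()
    g.add_edges_from(zip(words, words[1:]))  # one edge per consecutive word pair
    return g

g = speech_graph("the dog chased the cat and the cat chased the dog")
lsc = max(nx.strongly_connected_components(g), key=len)
print("nodes:", g.number_of_nodes(),
      "edges:", g.number_of_edges(),
      "largest strongly connected component:", len(lsc),
      "density:", round(nx.density(g), 3))
```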
LaCroix, Arianna N; Diaz, Alvaro F; Rogalsky, Corianne
2015-01-01
The relationship between the neurobiology of speech and music has been investigated for more than a century. There remains no widespread agreement regarding how (or to what extent) music perception utilizes the neural circuitry that is engaged in speech processing, particularly at the cortical level. Prominent models such as Patel's Shared Syntactic Integration Resource Hypothesis (SSIRH) and Koelsch's neurocognitive model of music perception suggest a high degree of overlap, particularly in the frontal lobe, but also perhaps more distinct representations in the temporal lobe with hemispheric asymmetries. The present meta-analysis study used activation likelihood estimate analyses to identify the brain regions consistently activated for music as compared to speech across the functional neuroimaging (fMRI and PET) literature. Eighty music and 91 speech neuroimaging studies of healthy adult control subjects were analyzed. Peak activations reported in the music and speech studies were divided into four paradigm categories: passive listening, discrimination tasks, error/anomaly detection tasks and memory-related tasks. We then compared activation likelihood estimates within each category for music vs. speech, and each music condition with passive listening. We found that listening to music and to speech preferentially activate distinct temporo-parietal bilateral cortical networks. We also found music and speech to have shared resources in the left pars opercularis but speech-specific resources in the left pars triangularis. The extent to which music recruited speech-activated frontal resources was modulated by task. While there are certainly limitations to meta-analysis techniques particularly regarding sensitivity, this work suggests that the extent of shared resources between speech and music may be task-dependent and highlights the need to consider how task effects may be affecting conclusions regarding the neurobiology of speech and music.
The development and validation of the speech quality instrument.
Chen, Stephanie Y; Griffin, Brianna M; Mancuso, Dean; Shiau, Stephanie; DiMattia, Michelle; Cellum, Ilana; Harvey Boyd, Kelly; Prevoteau, Charlotte; Kohlberg, Gavriel D; Spitzer, Jaclyn B; Lalwani, Anil K
2017-12-08
Although speech perception tests are available to evaluate hearing, there is no standardized validated tool to quantify speech quality. The objective of this study is to develop a validated tool to measure quality of speech heard. Prospective instrument validation study of 35 normal hearing adults recruited at a tertiary referral center. Participants listened to 44 speech clips of male/female voices reciting the Rainbow Passage. Speech clips included original and manipulated excerpts capturing goal qualities such as mechanical and garbled. Listeners rated clips on a 10-point visual analog scale (VAS) of 18 characteristics (e.g. cartoonish, garbled). Skewed distribution analysis identified mean ratings in the upper and lower 2-point limits of the VAS (ratings of 8-10, 0-2, respectively); items with inconsistent responses were eliminated. The test was pruned to a final instrument of nine speech clips that clearly define qualities of interest: speech-like, male/female, cartoonish, echo-y, garbled, tinny, mechanical, rough, breathy, soothing, hoarse, like, pleasant, natural. Mean ratings were highest for original female clips (8.8) and lowest for not-speech manipulation (2.1). Factor analysis identified two subsets of characteristics: internal consistency demonstrated Cronbach's alpha of 0.95 and 0.82 per subset. Test-retest reliability of total scores was high, with an intraclass correlation coefficient of 0.76. The Speech Quality Instrument (SQI) is a concise, valid tool for assessing speech quality as an indicator for hearing performance. SQI may be a valuable outcome measure for cochlear implant recipients who, despite achieving excellent speech perception, often experience poor speech quality.
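As an illustration of the internal-consistency statistic reported above, the sketch below computes Cronbach's alpha from a listener-by-item matrix of VAS ratings; the ratings are randomly generated placeholders, not SQI data.

```python
"""Cronbach's alpha for a listener-by-item rating matrix (fabricated data)."""
import numpy as np

def cronbach_alpha(ratings: np.ndarray) -> float:
    """ratings: rows = listeners, columns = items (speech-quality scales)."""
    k = ratings.shape[1]
    item_vars = ratings.var(axis=0, ddof=1).sum()
    total_var = ratings.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

rng = np.random.default_rng(0)
base = rng.uniform(0, 10, size=(35, 1))                    # each listener's tendency
ratings = np.clip(base + rng.normal(0, 1, (35, 6)), 0, 10) # 35 listeners x 6 items
print(f"alpha = {cronbach_alpha(ratings):.2f}")
```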
Halim, Zahid; Abbas, Ghulam
2015-01-01
Sign language provides hearing and speech impaired individuals with an interface to communicate with other members of the society. Unfortunately, sign language is not understood by most people. For this, a gadget based on image processing and pattern recognition can provide a vital aid for detecting and translating sign language into a vocal language. This work presents a system for detecting and understanding sign language gestures with a custom-built software tool and later translating the gestures into a vocal language. For the purpose of recognizing a particular gesture, the system employs a Dynamic Time Warping (DTW) algorithm, and an off-the-shelf software tool is employed for vocal language generation. Microsoft® Kinect is the primary tool used to capture the video stream of a user. The proposed method is capable of successfully detecting gestures stored in the dictionary with an accuracy of 91%. The proposed system has the ability to define and add custom-made gestures. Based on an experiment in which 10 individuals with impairments used the system to communicate with 5 people with no disability, 87% agreed that the system was useful.
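A bare-bones version of the matching step named above: dynamic time warping between two feature sequences, with the DTW distance used to pick the nearest dictionary template. The gesture arrays are stand-ins; a real system would feed sequences of Kinect skeleton-joint coordinates.

```python
"""Minimal DTW distance between two gesture feature sequences (sketch)."""
import numpy as np

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])      # local distance
            cost[i, j] = d + min(cost[i - 1, j],         # insertion
                                 cost[i, j - 1],         # deletion
                                 cost[i - 1, j - 1])     # match
    return cost[n, m]

# Classify an observed gesture by its nearest stored template.
template = np.cumsum(np.ones((20, 3)), axis=0)        # stand-in dictionary gesture
observed = np.cumsum(np.ones((24, 3)) * 0.9, axis=0)  # stand-in captured gesture
print("DTW distance:", round(dtw_distance(template, observed), 2))
```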
Evaluation of the 'Fitting to Outcomes eXpert' (FOX®) with established cochlear implant users.
Buechner, Andreas; Vaerenberg, Bart; Gazibegovic, Dzemal; Brendel, Martina; De Ceulaer, Geert; Govaerts, Paul; Lenarz, Thomas
2015-01-01
To evaluate the possible impact of 'Fitting to Outcomes eXpert (FOX®)' on cochlear implant (CI) fitting in a clinic with extensive experience of fitting a range of CI systems, as a way to assess whether a software tool such as FOX is able to complement standard clinical procedures. Ten adult post-lingually deafened and unilateral long-term users of the Advanced Bionics™ CI system (Clarion CII or HiRes 90K™) underwent speech perception assessment with their current clinical program. One cycle 'iteration' of FOX optimization was performed and the program adjusted accordingly. After a month of using both clinical and FOX programs, a second iteration of FOX optimization was performed. Following this, the assessments were repeated without further acclimatization. FOX prescribed programming modifications in all subjects. Soundfield-aided thresholds were significantly lower for FOX than the clinical program. Group speech scores in noise were not significantly different between the two programs, but three individual subjects had improved speech scores with the FOX MAP, two had worse speech scores, and five were the same. FOX provided a standardized approach to fitting based on outcome measures rather than comfort alone. The results indicated that for this group of well-fitted patients, FOX improved outcomes in some individuals. There were significant changes, both better and worse, in individual speech perception scores but median scores remained unchanged. Soundfield-aided thresholds were significantly improved for the FOX group.
Elfmarková, Nela; Gajdoš, Martin; Mračková, Martina; Mekyska, Jiří; Mikl, Michal; Rektorová, Irena
2016-01-01
Impaired speech prosody is common in Parkinson's disease (PD). We assessed the impact of PD and levodopa on MRI resting-state functional connectivity (rs-FC) underlying speech prosody control. We studied 19 PD patients in the OFF and ON dopaminergic conditions and 15 age-matched healthy controls using functional MRI and seed partial least squares correlation (PLSC) analysis. In the PD group, we also correlated levodopa-induced rs-FC changes with the results of acoustic analysis. The PLSC analysis revealed a significant impact of PD but not of medication on the rs-FC strength of spatial correlation maps seeded by the anterior cingulate (p = 0.006), the right orofacial primary sensorimotor cortex (OF_SM1; p = 0.025) and the right caudate head (CN; p = 0.047). In the PD group, levodopa-induced changes in the CN and OF_SM1 connectivity strengths were related to changes in speech prosody. We demonstrated an impact of PD but not of levodopa on rs-FC within the brain networks related to speech prosody control. When only the PD patients were taken into account, an association between treatment-induced changes in speech prosody and changes in rs-FC within the associative striato-prefrontal and motor speech networks was found.
Detection of target phonemes in spontaneous and read speech.
Mehta, G; Cutler, A
1988-01-01
Although spontaneous speech occurs more frequently in most listeners' experience than read speech, laboratory studies of human speech recognition typically use carefully controlled materials read from a script. The phonological and prosodic characteristics of spontaneous and read speech differ considerably, however, which suggests that laboratory results may not generalise to the recognition of spontaneous speech. In the present study listeners were presented with both spontaneous and read speech materials, and their response time to detect word-initial target phonemes was measured. Responses were, overall, equally fast in each speech mode. However, analysis of effects previously reported in phoneme detection studies revealed significant differences between speech modes. In read speech but not in spontaneous speech, later targets were detected more rapidly than targets preceded by short words. In contrast, in spontaneous speech but not in read speech, targets were detected more rapidly in accented than in unaccented words and in strong than in weak syllables. An explanation for this pattern is offered in terms of characteristic prosodic differences between spontaneous and read speech. The results support claims from previous work that listeners pay great attention to prosodic information in the process of recognising speech.
NASA Astrophysics Data System (ADS)
Vassiliou, Marius S.; Sundareswaran, Venkataraman; Chen, S.; Behringer, Reinhold; Tam, Clement K.; Chan, M.; Bangayan, Phil T.; McGee, Joshua H.
2000-08-01
We describe new systems for improved integrated multimodal human-computer interaction and augmented reality for a diverse array of applications, including future advanced cockpits, tactical operations centers, and others. We have developed an integrated display system featuring: speech recognition of multiple concurrent users equipped with both standard air-coupled microphones and novel throat-coupled sensors (developed at Army Research Labs for increased noise immunity); lip reading for improving speech recognition accuracy in noisy environments; three-dimensional spatialized audio for improved display of warnings, alerts, and other information; wireless, coordinated handheld-PC control of a large display; real-time display of data and inferences from wireless integrated networked sensors with on-board signal processing and discrimination; gesture control with disambiguated point-and-speak capability; head- and eye-tracking coupled with speech recognition for 'look-and-speak' interaction; and integrated tetherless augmented reality on a wearable computer. The various interaction modalities (speech recognition, 3D audio, eye tracking, etc.) are implemented as 'modality servers' in an Internet-based client-server architecture. Each modality server encapsulates and exposes commercial and research software packages, presenting a socket network interface that is abstracted to a high-level interface, minimizing both vendor dependencies and required changes on the client side as the server's technology improves.
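The 'modality server' pattern is easy to sketch: wrap a recognizer behind a socket so clients speak a small line-based protocol instead of a vendor API. Everything here (the port, the protocol, the stub recognizer) is invented for illustration and is not the paper's actual interface.

```python
"""Toy 'modality server': a stub recognizer behind a socket interface."""
import socketserver

def recognize(audio_ref: str) -> str:
    return f"transcript-for:{audio_ref}"   # stand-in for a real recognizer call

class ModalityHandler(socketserver.StreamRequestHandler):
    def handle(self):
        for line in self.rfile:                      # one request per line
            request = line.decode().strip()
            self.wfile.write((recognize(request) + "\n").encode())

if __name__ == "__main__":
    with socketserver.TCPServer(("localhost", 9999), ModalityHandler) as srv:
        srv.serve_forever()   # clients connect, send a reference, read a result
```

The point of the indirection is the one the abstract names: the client depends only on the line protocol, so the package behind `recognize` can be swapped as the technology improves.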
Pilot Workload and Speech Analysis: A Preliminary Investigation
NASA Technical Reports Server (NTRS)
Bittner, Rachel M.; Begault, Durand R.; Christopher, Bonny R.
2013-01-01
Prior research has questioned the effectiveness of speech analysis to measure the stress, workload, truthfulness, or emotional state of a talker. The question remains regarding the utility of speech analysis for restricted vocabularies such as those used in aviation communications. A part-task experiment was conducted in which participants performed Air Traffic Control read-backs in different workload environments. Participants' subjective workload and the speech qualities of fundamental frequency (F0) and articulation rate were evaluated. A significant increase in subjective workload rating was found for high workload segments. F0 was found to be significantly higher during high workload, while articulation rates were found to be significantly slower. No correlation was found to exist between subjective workload and F0 or articulation rate.
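One common way to obtain an F0 measure of the kind used in such a study is autocorrelation over a voiced frame, as in the sketch below; the study's actual extraction method is not specified here, and the search range is an assumption.

```python
"""Autocorrelation-based F0 estimate for a single voiced frame (sketch)."""
import numpy as np

def estimate_f0(frame, fs, fmin=75.0, fmax=400.0):
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min, lag_max = int(fs / fmax), int(fs / fmin)
    lag = lag_min + np.argmax(ac[lag_min:lag_max])   # strongest periodicity
    return fs / lag

fs = 16000
t = np.arange(int(0.04 * fs)) / fs
frame = np.sin(2 * np.pi * 180 * t)      # synthetic 180 Hz 'voice'
print(f"F0 ~ {estimate_f0(frame, fs):.1f} Hz")
```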
Humes, Larry E.; Kidd, Gary R.; Lentz, Jennifer J.
2013-01-01
This study was designed to address individual differences in aided speech understanding among a relatively large group of older adults. The group of older adults consisted of 98 adults (50 female and 48 male) ranging in age from 60 to 86 (mean = 69.2). Hearing loss was typical for this age group and about 90% had not worn hearing aids. All subjects completed a battery of tests, including cognitive (6 measures), psychophysical (17 measures), and speech-understanding (9 measures), as well as the Speech, Spatial, and Qualities of Hearing (SSQ) self-report scale. Most of the speech-understanding measures made use of competing speech and the non-speech psychophysical measures were designed to tap phenomena thought to be relevant for the perception of speech in competing speech (e.g., stream segregation, modulation-detection interference). All measures of speech understanding were administered with spectral shaping applied to the speech stimuli to fully restore audibility through at least 4000 Hz. The measures used were demonstrated to be reliable in older adults and, when compared to a reference group of 28 young normal-hearing adults, age-group differences were observed on many of the measures. Principal-components factor analysis was applied successfully to reduce the number of independent and dependent (speech understanding) measures for a multiple-regression analysis. Doing so yielded one global cognitive-processing factor and five non-speech psychoacoustic factors (hearing loss, dichotic signal detection, multi-burst masking, stream segregation, and modulation detection) as potential predictors. To this set of six potential predictor variables were added subject age, Environmental Sound Identification (ESI), and performance on the text-recognition-threshold (TRT) task (a visual analog of interrupted speech recognition). These variables were used to successfully predict one global aided speech-understanding factor, accounting for about 60% of the variance. PMID:24098273
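The reduce-then-regress strategy can be sketched as follows: extract a few components from a large battery of correlated predictors, then regress the speech-understanding measure on them. The data are synthetic, and sklearn's PCA stands in for the principal-components factor analysis used in the study.

```python
"""Reduce-then-regress sketch: component extraction, then multiple regression."""
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
latent = rng.normal(size=(98, 3))          # 3 underlying 'abilities', 98 listeners
predictors = latent @ rng.normal(size=(3, 23)) + rng.normal(scale=0.5, size=(98, 23))
speech_score = latent @ np.array([0.8, 0.4, 0.1]) + rng.normal(scale=0.5, size=98)

factors = PCA(n_components=3).fit_transform(predictors)   # reduce 23 -> 3
model = LinearRegression().fit(factors, speech_score)     # multiple regression
print(f"variance accounted for: {model.score(factors, speech_score):.0%}")
```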
ERIC Educational Resources Information Center
Code, Chris; Tree, Jeremy; Ball, Martin
2011-01-01
We describe an analysis of speech errors on a confrontation naming task in a man with progressive speech degeneration of 10-year duration from Pick's disease. C.S. had a progressive non-fluent aphasia together with a motor speech impairment and early assessment indicated some naming impairments. There was also an absence of significant…
Improved Open-Microphone Speech Recognition
NASA Astrophysics Data System (ADS)
Abrash, Victor
2002-12-01
Many current and future NASA missions make extreme demands on mission personnel both in terms of work load and in performing under difficult environmental conditions. In situations where hands are impeded or needed for other tasks, eyes are busy attending to the environment, or tasks are sufficiently complex that ease of use of the interface becomes critical, spoken natural language dialog systems offer unique input and output modalities that can improve efficiency and safety. They also offer new capabilities that would not otherwise be available. For example, many NASA applications require astronauts to use computers in micro-gravity or while wearing space suits. Under these circumstances, command and control systems that allow users to issue commands or enter data in hands-and eyes-busy situations become critical. Speech recognition technology designed for current commercial applications limits the performance of the open-ended state-of-the-art dialog systems being developed at NASA. For example, today's recognition systems typically listen to user input only during short segments of the dialog, and user input outside of these short time windows is lost. Mistakes detecting the start and end times of user utterances can lead to mistakes in the recognition output, and the dialog system as a whole has no way to recover from this, or any other, recognition error. Systems also often require the user to signal when that user is going to speak, which is impractical in a hands-free environment, or only allow a system-initiated dialog requiring the user to speak immediately following a system prompt. In this project, SRI has developed software to enable speech recognition in a hands-free, open-microphone environment, eliminating the need for a push-to-talk button or other signaling mechanism. The software continuously captures a user's speech and makes it available to one or more recognizers. By constantly monitoring and storing the audio stream, it provides the spoken dialog manager extra flexibility to recognize the signal with no audio gaps between recognition requests, as well as to rerecognize portions of the signal, or to rerecognize speech with different grammars, acoustic models, recognizers, start times, and so on. SRI expects that this new open-mic functionality will enable NASA to develop better error-correction mechanisms for spoken dialog systems, and may also enable new interaction strategies.
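The enabling data structure for this open-microphone behavior is a buffer that continuously retains recent audio so any span can be fetched for recognition or re-recognition. The sketch below shows that idea in isolation; the sizes are illustrative, and a real system would fill the buffer from a microphone callback rather than from random samples.

```python
"""Ring-buffer audio store: continuously retain audio for (re)recognition."""
import numpy as np

class AudioRing:
    def __init__(self, seconds, fs):
        self.fs = fs
        self.buf = np.zeros(int(seconds * fs), dtype=np.float32)
        self.write_pos = 0            # total samples written so far

    def push(self, chunk):
        for s in chunk:               # simple (unoptimized) sample-wise write
            self.buf[self.write_pos % len(self.buf)] = s
            self.write_pos += 1

    def span(self, t0, t1):
        """Return samples between absolute times t0..t1 (seconds), if retained."""
        i0, i1 = int(t0 * self.fs), int(t1 * self.fs)
        if i0 < self.write_pos - len(self.buf):
            raise ValueError("requested span already overwritten")
        idx = np.arange(i0, i1) % len(self.buf)
        return self.buf[idx]

ring = AudioRing(seconds=2.0, fs=8000)
ring.push(np.random.randn(12000).astype(np.float32))  # 1.5 s of 'speech'
segment = ring.span(0.5, 1.0)          # re-fetch an earlier half-second
print(len(segment), "samples available for (re)recognition")
```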
Analysis of Factors Affecting System Performance in the ASpIRE Challenge
2015-12-13
Performance in the ASpIRE (Automatic Speech Recognition In Reverberant Environments) challenge is analyzed. In particular, the overall word error rate (WER) of the solver systems is analyzed as a function of room, distance between talker and microphone, and microphone type. Speech activity detection is also analyzed. This analysis will inform the design of future challenges and provide insight into the efficacy of current solutions addressing noisy reverberant speech.
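For reference, the WER metric analyzed in the challenge is the word-level edit distance between reference and hypothesis transcripts, normalized by reference length; a minimal implementation:

```python
"""Word error rate via edit distance: (subs + ins + dels) / reference length."""
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                                   # all deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                                   # all insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("turn the cabin lights off", "turn the cab in lights"))  # 0.6
```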
Wang, Kun-Ching
2015-01-14
The classification of emotional speech is mostly considered in speech-related research on human-computer interaction (HCI). In this paper, the purpose is to present a novel feature extraction based on multi-resolution texture image information (MRTII). The MRTII feature set is derived from multi-resolution texture analysis for the characterization and classification of different emotions in a speech signal. The motivation is that emotions have different intensity values in different frequency bands. In terms of human visual perception, the texture property at multiple resolutions of the emotional speech spectrogram should be a good feature set for emotion classification in speech. Furthermore, multi-resolution analysis of texture can give a clearer discrimination between emotions than uniform-resolution analysis. In order to provide high accuracy of emotional discrimination, especially in real life, an acoustic activity detection (AAD) algorithm must be applied in the MRTII-based feature extraction. Considering the presence of many blended emotions in real life, this paper makes use of two corpora of naturally-occurring dialogs recorded in real-life call centers. Compared with the traditional Mel-scale Frequency Cepstral Coefficients (MFCC) and state-of-the-art features, the MRTII features can also improve the correct classification rates of the proposed systems across different language databases. Experimental results show that the proposed MRTII-based feature information, inspired by human visual perception of the spectrogram image, can provide significant classification for real-life emotional recognition in speech.
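To make the spectrogram-as-texture idea concrete, the sketch below computes gray-level co-occurrence (GLCM) texture statistics on a quantized spectrogram image. This is a single-resolution simplification, not the paper's MRTII feature set; the signal is a synthetic stand-in, and graycomatrix/graycoprops assume scikit-image 0.19 or later.

```python
"""GLCM texture features on a quantized speech spectrogram (simplified sketch)."""
import numpy as np
from scipy.signal import spectrogram
from skimage.feature import graycomatrix, graycoprops  # scikit-image >= 0.19

fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * (300 + 200 * t) * t)           # synthetic 1 s 'utterance'

_, _, sxx = spectrogram(x, fs=fs, nperseg=256)
img = (255 * (sxx / sxx.max())).astype(np.uint8)      # quantize to an 8-bit image

glcm = graycomatrix(img, distances=[1], angles=[0], levels=256,
                    symmetric=True, normed=True)
features = [graycoprops(glcm, p)[0, 0] for p in ("contrast", "homogeneity")]
print("texture features:", np.round(features, 3))
```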
NASA Technical Reports Server (NTRS)
Sandor, A.; Moses, H. R.
2016-01-01
Currently on the International Space Station (ISS) and other space vehicles, Caution & Warning (C&W) alerts are represented with various auditory tones that correspond to the type of event. This system relies on the crew's ability to remember what each tone represents in a high stress, high workload environment when responding to the alert. Furthermore, crew receive training a year or more in advance of the mission, which makes remembering the semantic meaning of the alerts more difficult. The current system works for missions conducted close to Earth, where ground operators can assist as needed. On long duration missions, however, crews will need to work off-nominal events autonomously. There is evidence that speech alarms may be easier and faster to recognize, especially during an off-nominal event. The Information Presentation Directed Research Project (FY07-FY09) funded by the Human Research Program included several studies investigating C&W alerts. The studies evaluated tone alerts currently in use with NASA flight deck displays along with candidate speech alerts. A follow-on study used four types of speech alerts to investigate how quickly various types of auditory alerts with and without a speech component - either at the beginning or at the end of the tone - can be identified. Even though crew were familiar with the tone alerts from training or direct mission experience, alerts starting with a speech component were identified faster than alerts starting with a tone. The current study replicated the results from the previous study in a more rigorous experimental design to determine whether the candidate speech alarms are ready for transition to operations or whether more research is needed. Four types of alarms (caution, warning, fire, and depressurization) were presented to participants in both tone and speech formats, in laboratory settings and later in the Human Exploration Research Analog (HERA). In the laboratory study, the alerts were presented by software and participants were asked to identify each alert as quickly and as accurately as possible. Reaction time and accuracy were measured. Participants identified speech alerts significantly faster than tone alerts. The HERA study investigated the performance of participants in a flight-like environment. Participants were instructed to complete items on a task list and respond to C&W alerts as they occurred. Reaction time and accuracy were measured to determine whether the benefits of speech alarms are still present in an applied setting.
Speech Therapy Telepractice for Vocal Cord Dysfunction (VCD): MaineCare (Medicaid) Cost Savings
Towey, Michael P.
2012-01-01
This Brief Communication represents an analysis of the cost savings to MaineCare (also referred to as Medicaid) directly attributable to service provided via speech therapy telepractice. Seven female (primarily adolescent) MaineCare patients consecutively referred to Waldo County General Hospital (WCGH) with suspected diagnosis of Vocal Cord Dysfunction (VCD) were treated by speech therapy telepractice. Outcome data demonstrated a first month cost savings of $2376.72. The analysis additionally projected thousands of dollars of potential savings each month in reduced medical costs for this patient group as a result of successful treatment via speech therapy telepractice. The study suggests that without access to speech therapy telepractice for patients with VCD, the medical costs to MaineCare will be ongoing and significant. PMID:25945195
Fluency variation in adolescents.
Furquim de Andrade, Claudia Regina; de Oliveira Martins, Vanessa
2007-10-01
The Speech Fluency Profile of fluent adolescent speakers of Brazilian Portuguese was examined with respect to gender and neurolinguistic variations. Speech samples of 130 male and female adolescents, aged between 12;0 and 17;11 years, were gathered. They were analysed according to type of speech disruption, speech rate, and frequency of speech disruptions. Statistical analysis did not find significant differences between genders for the variables studied. However, regarding the phases of adolescence (early: 12;0-14;11 years; late: 15;0-17;11 years), statistical differences were observed for all of the variables. As for neurolinguistic maturation, a decrease in the number of speech disruptions and an increase in speech rate occurred during the final phase of adolescence, indicating that the maturation of the motor and linguistic processes exerted an influence over the fluency profile of speech.
An experiment with spectral analysis of emotional speech affected by orthodontic appliances
NASA Astrophysics Data System (ADS)
Přibil, Jiří; Přibilová, Anna; Ďuračková, Daniela
2012-11-01
The contribution describes the effect of fixed and removable orthodontic appliances on the spectral properties of emotional speech. Spectral changes were analyzed and evaluated by spectrograms and mean Welch's periodograms. This alternative approach to the standard listening test enables an objective comparison to be obtained, based on statistical analysis by ANOVA and hypothesis tests. The results of the analysis, performed on short sentences of a female speaker in four emotional states (joyous, sad, angry, and neutral), show that, first of all, the removable orthodontic appliance affects the spectrograms of produced speech.
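A minimal version of the spectral evaluation described above, assuming two recordings to compare: estimate mean Welch periodograms with scipy and report the average level difference. The 'recordings' here are synthetic placeholders.

```python
"""Mean Welch periodogram comparison between two recordings (sketch)."""
import numpy as np
from scipy.signal import welch

fs = 16000
rng = np.random.default_rng(2)
speech_a = rng.normal(size=fs)                   # placeholder recording A
speech_b = speech_a + 0.3 * rng.normal(size=fs)  # placeholder recording B

f, pxx_a = welch(speech_a, fs=fs, nperseg=1024)
_, pxx_b = welch(speech_b, fs=fs, nperseg=1024)
db_shift = 10 * np.log10(pxx_b.mean() / pxx_a.mean())
print(f"mean spectral level difference: {db_shift:.2f} dB")
```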
Correlational Analysis of Speech Intelligibility Tests and Metrics for Speech Transmission
2017-12-04
[Fragmented excerpt; figure-list and text residue.] Fig. 2: Diagram of a speech communication system (Letowski). Consonants contain mostly high-frequency (above 1500 Hz) speech energy, but this energy is relatively small in comparison to that of the whole speech signal (Letowski et al. 1993). The mid-frequency spectral region contains mostly vowel energy, while consonants are high-frequency sounds.
Reporting and Reacting: Concurrent Responses to Reported Speech.
ERIC Educational Resources Information Center
Holt, Elizabeth
2000-01-01
Uses conversation analysis to investigate reported speech in talk-in-interaction. Beginning with an examination of direct and indirect reported speech, the article highlights some of the design features of the former, and the sequential environments in which it occurs. (Author/VWL)
Symbolic Martyrdom: The Ultimate Apology.
ERIC Educational Resources Information Center
Burkholder, Thomas R.
1991-01-01
Examines the claim that "scaffold speeches" (speeches by individuals awaiting execution) form a discrete genre. Argues that they constitute a subgenre within the larger genre of apologia. Illustrates the subgenre through analysis of John Brown's final speech at his trial following the Harper's Ferry raid. (SR)
Two different phenomena in basic motor speech performance in premanifest Huntington disease.
Skodda, Sabine; Grönheit, Wenke; Lukas, Carsten; Bellenberg, Barbara; von Hein, Sarah M; Hoffmann, Rainer; Saft, Carsten
2016-03-09
Dysarthria is a common feature in Huntington disease (HD). The aim of this cross-sectional pilot study was the description and objective analysis of different speech parameters with special emphasis on the aspect of speech timing of connected speech and nonspeech verbal utterances in premanifest HD (preHD). A total of 28 preHD mutation carriers and 28 age- and sex-matched healthy speakers had to perform a reading task and several syllable repetition tasks. Results of computerized acoustic analysis of different variables for the measurement of speech rate and regularity were correlated with clinical measures and MRI-based brain atrophy assessment by voxel-based morphometry. An impaired capacity to steadily repeat single syllables with higher variations in preHD compared to healthy controls was found (variance 1: Cohen d = 1.46). Notably, speech rate was increased compared to controls and showed correlations to the volume of certain brain areas known to be involved in the sensory-motor speech networks (net speech rate: Cohen d = 1.19). Furthermore, speech rate showed correlations to disease burden score, probability of disease onset, the estimated years to onset, and clinical measures like the cognitive score. Measurement of speech rate and regularity might be helpful additional tools for the monitoring of subclinical functional disability in preHD. As one of the possible causes for higher performance in preHD, we discuss huntingtin-dependent temporarily advantageous development processes of the brain.
Speech perception and production in severe environments
NASA Astrophysics Data System (ADS)
Pisoni, David B.
1990-09-01
The goal was to acquire new knowledge about speech perception and production in severe environments such as high masking noise, increased cognitive load or sustained attentional demands. Changes were examined in speech production under these adverse conditions through acoustic analysis techniques. One set of studies focused on the effects of noise on speech production. The experiments in this group were designed to generate a database of speech obtained in noise and in quiet. A second set of experiments was designed to examine the effects of cognitive load on the acoustic-phonetic properties of speech. Talkers were required to carry out a demanding perceptual motor task while they read lists of test words. A final set of experiments explored the effects of vocal fatigue on the acoustic-phonetic properties of speech. Both cognitive load and vocal fatigue are present in many applications where speech recognition technology is used, yet their influence on speech production is poorly understood.
Clear speech and lexical competition in younger and older adult listeners.
Van Engen, Kristin J
2017-08-01
This study investigated whether clear speech reduces the cognitive demands of lexical competition by crossing speaking style with lexical difficulty. Younger and older adults identified more words in clear versus conversational speech and more easy words than hard words. An initial analysis suggested that the effect of lexical difficulty was reduced in clear speech, but more detailed analyses within each age group showed this interaction was significant only for older adults. The results also showed that both groups improved over the course of the task and that clear speech was particularly helpful for individuals with poorer hearing: for younger adults, clear speech eliminated hearing-related differences that affected performance on conversational speech. For older adults, clear speech was generally more helpful to listeners with poorer hearing. These results suggest that clear speech affords perceptual benefits to all listeners and, for older adults, mitigates the cognitive challenge associated with identifying words with many phonological neighbors.
[Acupuncture for aphasia: a retrospective analysis of clinical literature].
Tan, Jie; Zhang, Hong; Han, Guodong; Ai, Kun; Deng, Shifeng
2016-04-01
With the meta-analysis method, the clinical efficacy of acupuncture and other regular methods for aphasia was evaluated, and the acupoint selection for aphasia was explored. The acupuncture literature of clinical randomized controlled trials for aphasia published in the CNKI, WANFANG, VIP and CBM databases was searched; the statistical analysis of the clinical efficacy of acupuncture and other regular methods for aphasia was performed using the software RevMan 5.2 provided by the Cochrane Library. A Microsoft Excel file was established to perform the analysis of acupoint selection based on the frequency analysis method, so as to summarize the characteristics and rules. Totally 385 articles were retrieved, and 37 articles that met the inclusion criteria were included, involving 1260 patients in the acupuncture group and 1238 patients in the control group. The meta-analysis results showed OR = 3.82, 95% CI [3.01, 4.85]; the rhombus was located on the right side and the funnel plot was nearly symmetric, indicating that the treatment effect of the acupuncture group for aphasia was superior to the control group (Z = 11.04, P < 0.00001). The frequency-analysis results showed that the frequency of acupoints from top to bottom was Lianquan (CV 23), Tongli (HT 5), Yamen (GV 15), Jinjin (EX-HN 12), Yuye (EX-HN 13), Baihui (GV 20), Yuyan II, Yuyan I and Yuyan III. The frequency of meridians from top to bottom was the governor vessel, extra channels, conception vessel, heart meridian and large intestine meridian. It is concluded that the clinical efficacy of acupuncture combined with speech rehabilitation training and medication treatment for aphasia is superior to that of speech rehabilitation training and medication treatment alone. The clinical treatment for aphasia focuses on its local effect; the main acupoints are in the head and face, and the meridians are the governor vessel, extra channels and conception vessel.
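For readers unfamiliar with the pooling step, the sketch below shows a fixed-effect, inverse-variance combination of log odds ratios, the kind of computation RevMan carries out; the 2x2 counts are fabricated and are not the trials from this review.

```python
"""Fixed-effect, inverse-variance pooling of log odds ratios (fabricated data)."""
import numpy as np

# Each row: events/total in the acupuncture arm, events/total in the control arm.
studies = [(42, 50, 30, 50), (35, 40, 22, 41), (55, 63, 39, 60)]

log_ors, weights = [], []
for a_ev, a_n, c_ev, c_n in studies:
    a, b = a_ev, a_n - a_ev          # treatment responders / non-responders
    c, d = c_ev, c_n - c_ev          # control responders / non-responders
    log_ors.append(np.log(a * d / (b * c)))
    weights.append(1 / (1/a + 1/b + 1/c + 1/d))   # inverse variance of log OR

pooled = np.average(log_ors, weights=weights)
se = 1 / np.sqrt(np.sum(weights))
print(f"pooled OR = {np.exp(pooled):.2f}, "
      f"95% CI [{np.exp(pooled - 1.96*se):.2f}, {np.exp(pooled + 1.96*se):.2f}]")
```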
NASA Astrophysics Data System (ADS)
The present conference on the development status of communications systems in the context of electronic warfare gives attention to topics in spread spectrum code acquisition, digital speech technology, fiber-optics communications, free space optical communications, the networking of HF systems, and applications and evaluation methods for digital speech. Also treated are issues in local area network system design, coding techniques and applications, technology applications for HF systems, receiver technologies, software development status, channel simulation/prediction methods, C3 networking, spread spectrum networks, the improvement of communication efficiency and reliability through technical control methods, mobile radio systems, and adaptive antenna arrays. Finally, communications system cost analyses, spread spectrum performance, voice and image coding, switched networks, and microwave GaAs ICs are considered.
Applying Spatial Audio to Human Interfaces: 25 Years of NASA Experience
NASA Technical Reports Server (NTRS)
Begault, Durand R.; Wenzel, Elizabeth M.; Godfrey, Martine; Miller, Joel D.; Anderson, Mark R.
2010-01-01
From the perspective of human factors engineering, the inclusion of spatial audio within a human-machine interface is advantageous from several perspectives. Demonstrated benefits include the ability to monitor multiple streams of speech and non-speech warning tones using a 'cocktail party' advantage, and support for aurally-guided visual search. Other potential benefits include the spatial coordination and interaction of multimodal events, and the evaluation of new communication technologies and alerting systems using virtual simulation. Many of these technologies were developed at NASA Ames Research Center, beginning in 1985. This paper reviews examples and describes the advantages of spatial sound in NASA-related technologies, including space operations, aeronautics, and search and rescue. The work has involved hardware and software development as well as basic and applied research.
The evolution of speech: a comparative review.
Fitch
2000-07-01
The evolution of speech can be studied independently of the evolution of language, with the advantage that most aspects of speech acoustics, physiology and neural control are shared with animals, and thus open to empirical investigation. At least two changes were necessary prerequisites for modern human speech abilities: (1) modification of vocal tract morphology, and (2) development of vocal imitative ability. Despite an extensive literature, attempts to pinpoint the timing of these changes using fossil data have proven inconclusive. However, recent comparative data from nonhuman primates have shed light on the ancestral use of formants (a crucial cue in human speech) to identify individuals and gauge body size. Second, comparative analysis of the diverse vertebrates that have evolved vocal imitation (humans, cetaceans, seals and birds) provides several distinct, testable hypotheses about the adaptive function of vocal mimicry. These developments suggest that, for understanding the evolution of speech, comparative analysis of living species provides a viable alternative to fossil data. However, the neural basis for vocal mimicry and for mimesis in general remains unknown.
Lee, Yune-Sang; Turkeltaub, Peter; Granger, Richard; Raizada, Rajeev D S
2012-03-14
Although much effort has been directed toward understanding the neural basis of speech processing, the neural processes involved in the categorical perception of speech have been relatively less studied, and many questions remain open. In this functional magnetic resonance imaging (fMRI) study, we probed the cortical regions mediating categorical speech perception using an advanced brain-mapping technique, whole-brain multivariate pattern-based analysis (MVPA). Normal healthy human subjects (native English speakers) were scanned while they listened to 10 consonant-vowel syllables along the /ba/-/da/ continuum. Outside of the scanner, individuals' own category boundaries were measured to divide the fMRI data into /ba/ and /da/ conditions per subject. The whole-brain MVPA revealed that Broca's area and the left pre-supplementary motor area evoked distinct neural activity patterns between the two perceptual categories (/ba/ vs /da/). Broca's area was also found when the same analysis was applied to another dataset (Raizada and Poldrack, 2007), which previously yielded the supramarginal gyrus using a univariate adaptation-fMRI paradigm. The consistent MVPA findings from two independent datasets strongly indicate that Broca's area participates in categorical speech perception, with a possible role of translating speech signals into articulatory codes. The difference in results between univariate and multivariate pattern-based analyses of the same data suggest that processes in different cortical areas along the dorsal speech perception stream are distributed on different spatial scales.
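The pattern-classification core of an MVPA analysis can be sketched with scikit-learn: train a linear classifier to separate /ba/ from /da/ trials from multi-voxel patterns and report cross-validated accuracy. The data below are synthetic; real analyses operate on fMRI beta estimates within searchlights or regions of interest.

```python
"""MVPA-style pattern classification on synthetic 'voxel' data (sketch)."""
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
n_trials, n_voxels = 80, 200
signal = np.zeros(n_voxels)
signal[:10] = 0.8                                   # 10 informative 'voxels'
ba = rng.normal(size=(n_trials // 2, n_voxels)) + signal
da = rng.normal(size=(n_trials // 2, n_voxels)) - signal
X = np.vstack([ba, da])
y = np.array([0] * (n_trials // 2) + [1] * (n_trials // 2))

acc = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean()
print(f"cross-validated /ba/ vs /da/ accuracy: {acc:.2f}")
```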
ERIC Educational Resources Information Center
Frankel, Lois; Brownstein, Beth; Soiffer, Neil; Hansen, Eric
2016-01-01
The work described in this report is the first phase of a project to provide easy-to-use tools for authoring and rendering secondary-school algebra-level math expressions in synthesized speech that is useful for students with blindness or low vision. This report describes the initial development, software implementation, and evaluation of the…
Auditory-Motor Processing of Speech Sounds
Möttönen, Riikka; Dutton, Rebekah; Watkins, Kate E.
2013-01-01
The motor regions that control movements of the articulators activate during listening to speech and contribute to performance in demanding speech recognition and discrimination tasks. Whether the articulatory motor cortex modulates auditory processing of speech sounds is unknown. Here, we aimed to determine whether the articulatory motor cortex affects the auditory mechanisms underlying discrimination of speech sounds in the absence of demanding speech tasks. Using electroencephalography, we recorded responses to changes in sound sequences, while participants watched a silent video. We also disrupted the lip or the hand representation in left motor cortex using transcranial magnetic stimulation. Disruption of the lip representation suppressed responses to changes in speech sounds, but not piano tones. In contrast, disruption of the hand representation had no effect on responses to changes in speech sounds. These findings show that disruptions within, but not outside, the articulatory motor cortex impair automatic auditory discrimination of speech sounds. The findings provide evidence for the importance of auditory-motor processes in efficient neural analysis of speech sounds. PMID:22581846
STI: An objective measure for the performance of voice communication systems
NASA Astrophysics Data System (ADS)
Houtgast, T.; Steeneken, H. J. M.
1981-06-01
A measuring device was developed for determining the quality of speech communication systems. It comprises two parts, a signal source which replaces the talker, producing an artificial speech-like signal, and an analysis part which replaces the listener, by which the signal at the receiving end of the system under test is evaluated. Each single measurement results in an index (ranging from 0-100%) which indicates the effect of that communication system on speech intelligibility. The index is called STI (Speech Transmission Index). A careful design of the characteristics of the test signal and of the type of signal analysis makes the present approach widely applicable. It was verified experimentally that a given STI implies a given effect on speech intelligibility, irrespective of the nature of the actual disturbance (noise interference, band-pass limiting, peak clipping, etc.).
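A single-band sketch of the measurement principle, under simplifying assumptions: transmit intensity-modulated noise, measure the modulation depth m that survives the channel, convert it to an apparent signal-to-noise ratio, and map that onto a 0-1 index. The published STI averages such indices over several octave bands and modulation frequencies, which this toy omits.

```python
"""Single-band modulation-transfer sketch of the STI principle."""
import numpy as np

fs, fm = 8000, 4.0                                  # sample rate, modulation freq (Hz)
t = np.arange(10 * fs) / fs
rng = np.random.default_rng(4)
probe = rng.normal(size=t.size) * np.sqrt(1 + np.cos(2 * np.pi * fm * t))

received = probe + 0.8 * rng.normal(size=t.size)    # channel adds noise

def modulation_depth(x, fs, fm):
    env = x ** 2                                    # intensity envelope
    ref = np.exp(-2j * np.pi * fm * np.arange(env.size) / fs)
    return 2 * np.abs(env @ ref) / env.sum()        # depth of the fm component

m = modulation_depth(received, fs, fm)
snr_apparent = 10 * np.log10(m / (1 - m))           # reduced depth -> apparent SNR
ti = (np.clip(snr_apparent, -15, 15) + 15) / 30     # 0..1 transmission index
print(f"m = {m:.2f}, apparent SNR = {snr_apparent:.1f} dB, index = {ti:.2f}")
```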
ERIC Educational Resources Information Center
Al-Majali, Wala'
2015-01-01
This study is designed to explore the salient linguistic features of the political speeches of the ousted Arab presidents during the Arab Spring Revolution. The sample of the study is composed of seven political speeches delivered by the ousted Arab presidents during the period from December 2010 to December 2012. Three speeches were delivered by…
Strand, Edythe A; McCauley, Rebecca J; Weigand, Stephen D; Stoeckel, Ruth E; Baas, Becky S
2013-04-01
In this article, the authors report reliability and validity evidence for the Dynamic Evaluation of Motor Speech Skill (DEMSS), a new test that uses dynamic assessment to aid in the differential diagnosis of childhood apraxia of speech (CAS). Participants were 81 children between 36 and 79 months of age who were referred to the Mayo Clinic for diagnosis of speech sound disorders. Children were given the DEMSS and a standard speech and language test battery as part of routine evaluations. Subsequently, intrajudge, interjudge, and test-retest reliability were evaluated for a subset of participants. Construct validity was explored for all 81 participants through the use of agglomerative cluster analysis, sensitivity measures, and likelihood ratios. The mean percentage of agreement for 171 judgments was 89% for test-retest reliability, 89% for intrajudge reliability, and 91% for interjudge reliability. Agglomerative hierarchical cluster analysis showed that total DEMSS scores largely differentiated clusters of children with CAS vs. mild CAS vs. other speech disorders. Positive and negative likelihood ratios and measures of sensitivity and specificity suggested that the DEMSS does not overdiagnose CAS but sometimes fails to identify children with CAS. The value of the DEMSS in differential diagnosis of severe speech impairments was supported on the basis of evidence of reliability and validity.
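As a reminder of how the reported diagnostic measures relate, the snippet below computes sensitivity, specificity, and likelihood ratios from a 2x2 table; the counts are invented, not DEMSS data.

```python
"""Sensitivity, specificity, and likelihood ratios from a 2x2 table (invented counts)."""
tp, fn = 28, 12   # children with CAS: flagged / missed
tn, fp = 39, 2    # children without CAS: correctly cleared / over-flagged

sens = tp / (tp + fn)
spec = tn / (tn + fp)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
print(f"LR+ = {sens / (1 - spec):.1f}, LR- = {(1 - sens) / spec:.2f}")
```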
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bouchard, Kristofer E.; Conant, David F.; Anumanchipalli, Gopala K.
A complete neurobiological understanding of speech motor control requires determination of the relationship between simultaneously recorded neural activity and the kinematics of the lips, jaw, tongue, and larynx. Many speech articulators are internal to the vocal tract, and therefore simultaneously tracking the kinematics of all articulators is nontrivial, especially in the context of human electrophysiology recordings. Here, we describe a noninvasive, multi-modal imaging system to monitor vocal tract kinematics, demonstrate this system in six speakers during production of nine American English vowels, and provide new analysis of such data. Classification and regression analysis revealed considerable variability in the articulator-to-acoustic relationship across speakers. Non-negative matrix factorization extracted basis sets capturing vocal tract shapes, allowing for higher vowel classification accuracy than traditional methods. Statistical speech synthesis generated speech from vocal tract measurements, and we demonstrate perceptual identification. We demonstrate the capacity to predict lip kinematics from ventral sensorimotor cortical activity. These results demonstrate a multi-modal system to non-invasively monitor articulator kinematics during speech production, describe novel analytic methods for relating kinematic data to speech acoustics, and provide the first decoding of speech kinematics from electrocorticography. These advances will be critical for understanding the cortical basis of speech production and the creation of vocal prosthetics.
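The basis-set extraction step maps naturally onto off-the-shelf NMF, as in this sketch with scikit-learn; the 'vocal tract shape' matrix is randomly generated, standing in for the real articulator measurements.

```python
"""NMF basis extraction from stand-in vocal tract shape data (sketch)."""
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(5)
true_bases = rng.uniform(size=(4, 60))     # 4 latent tract shapes, 60 contour points
activations = rng.uniform(size=(500, 4))   # per-frame weights
shapes = activations @ true_bases          # 500 frames x 60 points, non-negative

model = NMF(n_components=4, init="nndsvda", max_iter=500)
w = model.fit_transform(shapes)            # recovered frame activations
h = model.components_                      # recovered basis shapes
err = np.linalg.norm(shapes - w @ h) / np.linalg.norm(shapes)
print(f"relative reconstruction error: {err:.3f}")
```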
Anumanchipalli, Gopala K.; Dichter, Benjamin; Chaisanguanthum, Kris S.; Johnson, Keith; Chang, Edward F.
2016-01-01
A complete neurobiological understanding of speech motor control requires determination of the relationship between simultaneously recorded neural activity and the kinematics of the lips, jaw, tongue, and larynx. Many speech articulators are internal to the vocal tract, and therefore simultaneously tracking the kinematics of all articulators is nontrivial—especially in the context of human electrophysiology recordings. Here, we describe a noninvasive, multi-modal imaging system to monitor vocal tract kinematics, demonstrate this system in six speakers during production of nine American English vowels, and provide new analysis of such data. Classification and regression analysis revealed considerable variability in the articulator-to-acoustic relationship across speakers. Non-negative matrix factorization extracted basis sets capturing vocal tract shapes allowing for higher vowel classification accuracy than traditional methods. Statistical speech synthesis generated speech from vocal tract measurements, and we demonstrate perceptual identification. We demonstrate the capacity to predict lip kinematics from ventral sensorimotor cortical activity. These results demonstrate a multi-modal system to non-invasively monitor articulator kinematics during speech production, describe novel analytic methods for relating kinematic data to speech acoustics, and provide the first decoding of speech kinematics from electrocorticography. These advances will be critical for understanding the cortical basis of speech production and the creation of vocal prosthetics. PMID:27019106
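The basis-shape extraction step described above is standard non-negative matrix factorization. A minimal sketch follows, assuming a frames-by-coordinates data matrix and an arbitrary component count (neither is specified by the abstract):

```python
# Sketch of extracting vocal tract "basis shapes" with NMF; the data
# layout (frames x articulator coordinates) and the component count
# are assumptions for illustration.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
frames = rng.random((500, 40))         # stand-in: 500 frames x 40 tracked coordinates

model = NMF(n_components=5, init="nndsvd", max_iter=500, random_state=0)
weights = model.fit_transform(frames)  # per-frame activation of each basis shape
shapes = model.components_             # 5 basis vocal tract shapes

# The low-dimensional weights can then feed a vowel classifier in
# place of raw articulator positions.
print(weights.shape, shapes.shape)     # (500, 5) (5, 40)
```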
Lamprecht-Dinnesen, A; Sick, U; Sandrieser, P; Illg, A; Lesinski-Schiedat, A; Döring, W H; Müller-Deile, J; Kiefer, J; Matthias, K; Wüst, A; Konradi, E; Riebandt, M; Matulat, P; Von Der Haar-Heise, S; Swart, J; Elixmann, K; Neumann, K; Hildmann, A; Coninx, F; Meyer, V; Gross, M; Kruse, E; Lenarz, T
2002-10-01
Since autumn 1998 the multicenter interdisciplinary study group "Test Materials for CI Children" has been compiling a uniform examination tool for evaluating speech and hearing development after cochlear implantation in childhood. After a review of the relevant literature, suitable materials were checked for practical applicability, modified, and provided with criteria for administration and discontinuation. For data acquisition, observation forms suitable for preparation of a PC version were developed. The evaluation set contains forms for master data with supplements relating to postoperative processes. The hearing tests assess supra-threshold hearing with loudness scaling for children, speech comprehension in quiet (Mainz and Göttingen Test for Speech Comprehension in Childhood), phonemic differentiation (Oldenburg Rhyme Test for Children), the central auditory processes of detection, discrimination, identification and recognition (modification of the "Frankfurt Functional Hearing Test for Children"), and audiovisual speech perception (Open Paragraph Tracking, Kiel Speech Track Program). The materials for speech and language development comprise phonetics-phonology, lexicon and semantics (LOGO Pronunciation Test), syntax and morphology (analysis of spontaneous speech), language comprehension (Reynell Scales), and communication and pragmatics (observation forms). The modified MAIS and MUSS questionnaires are integrated. The evaluation set serves quality assurance and permits factor analysis as well as controls for regularity through multicenter comparison of long-term developmental trends after cochlear implantation.
Lim, Hayoung A; Draper, Ellary
2011-01-01
This study compared a common form of the Applied Behavior Analysis Verbal Behavior (ABA VB) approach and music incorporated into the ABA VB method as part of developmental speech-language training in the speech production of children with Autism Spectrum Disorders (ASD). The study explored how the perception of musical patterns incorporated in ABA VB operants impacted the production of speech in children with ASD. Participants were 22 children with ASD, age range 3 to 5 years, who were verbal or preverbal with immediate echolalia. They were randomly assigned a set of target words for each of the 3 training conditions: (a) music-incorporated ABA VB, (b) speech (ABA VB), and (c) no training. Results showed that both music and speech training were effective for production of the four ABA verbal operants; however, the difference between music and speech training was not statistically significant. Results also indicated that music-incorporated ABA VB training was most effective in echoic production, and speech training was most effective in tact production. Music can be incorporated into the ABA VB training method, and musical stimuli can be used as successfully as ABA VB speech training to enhance functional verbal production in children with ASD.
Noh, Heil; Lee, Dong-Hee
2012-01-01
To identify the quantitative differences between Korean and English in long-term average speech spectra (LTASS). Twenty Korean speakers, who lived in the capital of Korea and spoke standard Korean as their first language, were compared with 20 native English speakers. For the Korean speakers, a passage from a novel and a passage from a leading newspaper article were chosen; for the English speakers, the Rainbow Passage was used. The speech was digitally recorded using a GenRad 1982 Precision Sound Level Meter and GoldWave® software and analyzed in MATLAB. There was no significant difference in the LTASS between the Korean subjects reading a news article or a novel. For male subjects, the LTASS of Korean speakers was significantly lower than that of English speakers above 1.6 kHz except at 4 kHz, and the difference was more than 5 dB, especially at higher frequencies. For women, the LTASS of Korean speakers showed significantly lower levels at 0.2, 0.5, 1, 1.25, 2, 2.5, 6.3, 8, and 10 kHz, but the differences were less than 5 dB. Compared with English speakers, the LTASS of Korean speakers showed significantly lower levels at frequencies above 2 kHz except at 4 kHz; the difference was less than 5 dB between 2 and 5 kHz but more than 5 dB above 6 kHz. To adjust the formula for fitting hearing aids for Koreans, our results based on the LTASS analysis suggest that one needs to raise the gain in high-frequency regions.
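For readers who want to reproduce an LTASS computation, a minimal sketch using Welch averaging follows; it stands in for the study's MATLAB routine, and the window length and stand-in signal are assumptions.

```python
# Minimal sketch of a long-term average speech spectrum (LTASS).
import numpy as np
from scipy.signal import welch

def ltass_db(x, fs, nperseg=4096):
    """Average power spectral density over an entire passage, in dB."""
    f, pxx = welch(x, fs=fs, nperseg=nperseg, noverlap=nperseg // 2)
    return f, 10.0 * np.log10(pxx + 1e-20)

fs = 44100
passage = np.random.randn(60 * fs)     # stand-in for one recorded passage
f, spectrum = ltass_db(passage, fs)
# Comparing two such spectra band by band (e.g., Korean vs. English
# talkers) yields the level differences reported in the abstract.
```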
Sapir, S; Canter, G J
1991-09-01
Using acoustic analysis techniques, Waldstein [J. Acoust. Soc. Am. 88, 2099-2114 (1990)] reported abnormal speech findings in postlingually deaf speakers. She interpreted her findings to suggest that auditory feedback is important in motor speech control. However, it is argued here that Waldstein's interpretation may be unwarranted without addressing the possibility of neurologic deficits (e.g., dysarthria) as confounding (or even primary) causes of the abnormal speech in her subjects.
Research on Speech Perception. Progress Report No. 13.
ERIC Educational Resources Information Center
Pisoni, David B.; And Others
Summarizing research activities in 1987, this is the thirteenth annual report of research on speech perception, analysis, synthesis, and recognition conducted in the Speech Research Laboratory of the Department of Psychology at Indiana University. The report includes extended manuscripts, short reports, progress reports, and information on…
Sowden, Hannah; Clegg, Judy; Perkins, Michael
2013-12-01
Co-speech gestures have a close semantic relationship to speech in adult conversation. In typically developing children co-speech gestures which give additional information to speech facilitate the emergence of multi-word speech. A difficulty with integrating audio-visual information is known to exist for individuals with Autism Spectrum Disorder (ASD), which may affect development of the speech-gesture system. A longitudinal observational study was conducted with four children with ASD, aged 2;4 to 3;5 years. Participants were video-recorded for 20 min every 2 weeks during their attendance on an intervention programme. Recording continued for up to 8 months, thus affording a rich analysis of gestural practices from pre-verbal to multi-word speech across the group. All participants combined gesture with either speech or vocalisations. Co-speech gestures providing additional information to speech were observed to be either absent or rare. Findings suggest that children with ASD do not make use of the facilitating communicative effects of gesture in the same way as typically developing children.
Corrêa, Ana Grasielle Dionísio; de Assis, Gilda Aparecida; do Nascimento, Marilena; de Deus Lopes, Roseli
2017-04-01
Augmented Reality musical software (GenVirtual) is a technology that allows users to develop music activities for rehabilitation. This study aimed to analyse the perceptions of health care professionals regarding the clinical utility of GenVirtual. A second objective was to identify improvements to GenVirtual software and similar technologies. Music therapists, occupational therapists, physiotherapists, and speech and language therapists who assist people with physical and cognitive disabilities were enrolled in three focus groups. The quantitative and qualitative data were analysed through inductive thematic analysis. Three main themes were identified: the use of GenVirtual in health care areas; opportunities for realistic application of GenVirtual; and limitations in the use of GenVirtual. The coding units identified were: motor stimulation, cognitive stimulation, verbal learning, recreation activity, musicality, accessibility, motivation, sonic accuracy, interference of lighting, poor sound, children, and adults. This research suggested that GenVirtual is a complementary tool to conventional clinical practice and has great potential for the motor and cognitive rehabilitation of children and adults. Implications for Rehabilitation: Gaining health professionals' perceptions of the Augmented Reality musical game (GenVirtual) gives valuable information as to the clinical utility of the software. GenVirtual was perceived as a tool that could enhance the motor and cognitive rehabilitation process. GenVirtual was viewed as a tool that could enhance clinical practice and communication among various agencies, but it was suggested that it should be used with caution to avoid confusion and replacement of important services.
Methods for eliciting, annotating, and analyzing databases for child speech development.
Beckman, Mary E; Plummer, Andrew R; Munson, Benjamin; Reidy, Patrick F
2017-09-01
Methods from automatic speech recognition (ASR), such as segmentation and forced alignment, have facilitated the rapid annotation and analysis of very large adult speech databases and databases of caregiver-infant interaction, enabling advances in speech science that were unimaginable just a few decades ago. This paper centers on two main problems that must be addressed in order to have analogous resources for developing and exploiting databases of young children's speech. The first problem is to understand and appreciate the differences between adult and child speech that cause ASR models developed for adult speech to fail when applied to child speech. These differences include the fact that children's vocal tracts are smaller than those of adult males and also changing rapidly in size and shape over the course of development, leading to between-talker variability across age groups that dwarfs the between-talker differences between adult men and women. Moreover, children do not achieve fully adult-like speech motor control until they are young adults, and their vocabularies and phonological proficiency are developing as well, leading to considerably more within-talker variability as well as more between-talker variability. The second problem then is to determine what annotation schemas and analysis techniques can most usefully capture relevant aspects of this variability. Indeed, standard acoustic characterizations applied to child speech reveal that adult-centered annotation schemas fail to capture phenomena such as the emergence of covert contrasts in children's developing phonological systems, while also revealing children's nonuniform progression toward community speech norms as they acquire the phonological systems of their native languages. Both problems point to the need for more basic research into the growth and development of the articulatory system (as well as of the lexicon and phonological system) that is oriented explicitly toward the construction of age-appropriate computational models.
Geetha, Chinnaraj; Tanniru, Kishore; Rajan, R Raja
2017-04-01
This study aimed to evaluate the effect of directionality in hearing aids with wireless synchronization on localization and speech intelligibility in noise. The study included 25 individuals with bilateral mild to moderate flat sensorineural hearing loss. For the localization experiment, eight loudspeakers (Genelec 8020B) arranged in a circle covering 0-360° and Cubase 6 software were used to present the stimulus. A 260-ms car horn was presented from these loudspeakers, one at a time, in random order, and the listener was instructed to point to the direction of the source. The degree of localization error was obtained with and without the directionality and wireless synchronization options. For the speech-perception-in-noise experiment, the signal-to-noise ratio for 50% correct performance (SNR-50) was obtained using sentences played through a speaker at a fixed angle of 0°. A calibrated eight-talker speech babble was used as noise and was routed through 0°, 90°, or 270° (one speaker at a time) or through both the 90° and 270° speakers. The results revealed that the conditions in which both wireless synchronization and directionality were activated yielded significantly better performance in both the localization and the speech-perception-in-noise tasks. It can be concluded that the directional microphones in wirelessly synchronized hearing aids coordinate binaurally for better preservation of binaural cues, thus reducing localization errors and improving speech perception in noise. The results of this study could be used to counsel patients and justify the selection of directional wireless synchronization hearing aids.
Speech identification in noise: Contribution of temporal, spectral, and visual speech cues.
Kim, Jeesun; Davis, Chris; Groot, Christopher
2009-12-01
This study investigated the degree to which two types of reduced auditory signals (cochlear implant simulations) and visual speech cues combined for speech identification. The auditory speech stimuli were filtered to have only amplitude envelope cues or both amplitude envelope and spectral cues and were presented with/without visual speech. In Experiment 1, IEEE sentences were presented in quiet and noise. For in-quiet presentation, speech identification was enhanced by the addition of both spectral and visual speech cues. Due to a ceiling effect, the degree to which these effects combined could not be determined. In noise, these facilitation effects were more marked and were additive. Experiment 2 examined consonant and vowel identification in the context of CVC or VCV syllables presented in noise. For consonants, both spectral and visual speech cues facilitated identification and these effects were additive. For vowels, the effect of combined cues was underadditive, with the effect of spectral cues reduced when presented with visual speech cues. Analysis indicated that without visual speech, spectral cues facilitated the transmission of place information and vowel height, whereas with visual speech, they facilitated lip rounding, with little impact on the transmission of place information.
Hyperarticulation in Lombard speech: Global coordination of the jaw, lips and the tongue.
Šimko, Juraj; Beňuš, Štefan; Vainio, Martti
2016-01-01
Over the last century, researchers have collected a considerable amount of data reflecting the properties of Lombard speech, i.e., speech in a noisy environment. The documented phenomena predominately report effects on the speech signal produced in ambient noise. In comparison, relatively little is known about the underlying articulatory patterns of Lombard speech, in particular for lingual articulation. Here the authors present an analysis of articulatory recordings of speech material in babble noise of different intensity levels and in hypoarticulated speech and report quantitative differences in relative expansion of movement of different articulatory subsystems (the jaw, the lips and the tongue) as well as in relative expansion of utterance duration. The trajectory modifications for one articulator can be relatively reliably predicted by those for another one, but subsystems differ in a degree of continuity in trajectory expansion elicited across different noise levels. Regression analysis of articulatory modifications against durational expansion shows further qualitative differences between the subsystems, namely, the jaw and the tongue. The findings are discussed in terms of possible influences of a combination of prosodic, segmental, and physiological factors. In addition, the Lombard effect is put forward as a viable methodology for eliciting global articulatory variation in a controlled manner.
Predicting Intelligibility Gains in Dysarthria through Automated Speech Feature Analysis
ERIC Educational Resources Information Center
Fletcher, Annalise R.; Wisler, Alan A.; McAuliffe, Megan J.; Lansford, Kaitlin L.; Liss, Julie M.
2017-01-01
Purpose: Behavioral speech modifications have variable effects on the intelligibility of speakers with dysarthria. In the companion article, a significant relationship was found between measures of speakers' baseline speech and their intelligibility gains following cues to speak louder and reduce rate (Fletcher, McAuliffe, Lansford, Sinex, &…
Statistical Methods of Latent Structure Discovery in Child-Directed Speech
ERIC Educational Resources Information Center
Panteleyeva, Natalya B.
2010-01-01
This dissertation investigates how distributional information in the speech stream can assist infants in the initial stages of acquisition of their native language phonology. An exploratory statistical analysis derives this information from the adult speech data in the corpus of conversations between adults and young children in Russian. Because…
Rhetorical Analysis as Introductory Speech: Jumpstarting Student Engagement
ERIC Educational Resources Information Center
Malone, Marc P.
2012-01-01
When students enter the basic public speaking classroom, they are asked to develop an introductory speech. This assignment typically focuses on a speech of self-introduction for which there are several pedagogical underpinnings: it provides an immediate and relatively stress-free speaking…
Characterizing Articulation in Apraxic Speech Using Real-Time Magnetic Resonance Imaging
ERIC Educational Resources Information Center
Hagedorn, Christina; Proctor, Michael; Goldstein, Louis; Wilson, Stephen M.; Miller, Bruce; Gorno-Tempini, Maria Luisa; Narayanan, Shrikanth S.
2017-01-01
Purpose: Real-time magnetic resonance imaging (MRI) and accompanying analytical methods are shown to capture and quantify salient aspects of apraxic speech, substantiating and expanding upon evidence provided by clinical observation and acoustic and kinematic data. Analysis of apraxic speech errors within a dynamic systems framework is provided…
Speech Training for Inmate Rehabilitation.
ERIC Educational Resources Information Center
Parkinson, Michael G.; Dobkins, David H.
1982-01-01
Using a computerized content analysis, the authors demonstrate changes in speech behaviors of prison inmates. They conclude that two to four hours of public speaking training can have only limited effect on students who live in a culture in which "prison speech" is the expected and rewarded form of behavior. (PD)
Department of Cybernetic Acoustics
NASA Astrophysics Data System (ADS)
The development of the theory, instrumentation, and applications of methods and systems for the measurement, analysis, processing, and synthesis of acoustic signals within the audio frequency range is discussed, particularly of the speech signal and the vibro-acoustic signals emitted by technical and industrial equipment treated as noise and vibration sources. The research work, both theoretical and experimental, aims at applications in various branches of science and medicine, such as: acoustical diagnostics and phoniatric rehabilitation of pathological and postoperative states of the speech organ; bilateral "man-machine" speech communication based on the analysis, recognition, and synthesis of the speech signal; and vibro-acoustical diagnostics and continuous monitoring of the state of machines, technical equipment, and technological processes.
Zhang, Y; Li, D D; Chen, X W
2017-06-20
Objective: To compare, in a case-control design, the speech discrimination of patients with unilateral microtia and external auditory canal atresia with that of normal-hearing subjects in quiet and noisy environments, in order to understand the speech recognition of patients with unilateral external auditory canal atresia and provide a scientific basis for early clinical intervention. Method: Twenty patients with unilateral congenital microtia and external auditory canal atresia and 20 age-matched normal-hearing subjects (control group) were tested. All subjects were assessed with Mandarin speech audiometry material to obtain speech discrimination scores (SDS) in quiet and in noise in the sound field. Result: There was no significant difference in speech discrimination scores between the two groups in quiet. There was a statistically significant difference when the speech signal was presented to the affected side and noise to the normal side (monosyllables, disyllables, and sentences; S/N=0 and S/N=-10) (P<0.05). There was no significant difference in speech discrimination scores when the speech signal was presented to the normal side and noise to the affected side. With signal and noise on the same side, there was a statistically significant difference for monosyllabic word recognition (S/N=0 and S/N=-5) (P<0.05), whereas disyllabic words and sentences showed no statistically significant difference (P>0.05). Conclusion: The speech discrimination scores of patients with unilateral congenital microtia and external auditory canal atresia in noise are lower than those of normal-hearing subjects. Copyright© by the Editorial Department of Journal of Clinical Otorhinolaryngology Head and Neck Surgery.
Speech and Language Consequences of Unilateral Hearing Loss: A Systematic Review.
Anne, Samantha; Lieu, Judith E C; Cohen, Michael S
2017-10-01
Objective Unilateral hearing loss has been shown to have negative consequences for speech and language development in children. The objective of this study was to systematically review the current literature to quantify the impact of unilateral hearing loss on children, with the use of objective measures of speech and language. Data Sources PubMed, EMBASE, Medline, CINAHL, and Cochrane Library were searched from inception to March 2015. Manual searches of references were also completed. Review Methods All studies that described speech and language outcomes for children with unilateral hearing loss were included. Outcome measures included results from any test of speech and language that evaluated or had age-standardized norms. Due to heterogeneity of the data, quantitative analysis could not be completed. Qualitative analysis was performed on the included studies. Two independent evaluators reviewed each abstract and article. Results A total of 429 studies were identified; 13 met inclusion criteria and were reviewed. Overall, 7 studies showed poorer scores on various speech and language tests, with effects more pronounced for children with severe to profound hearing loss. Four studies did not demonstrate any difference in testing results between patients with unilateral hearing loss and those with normal hearing. Two studies that evaluated effects on speech and language longitudinally showed initial speech problems, with improvement in scores over time. Conclusions There are inconsistent data regarding effects of unilateral hearing loss on speech and language outcomes for children. The majority of recent studies suggest poorer speech and language testing results, especially for patients with severe to profound unilateral hearing loss.
ERIC Educational Resources Information Center
Armstrong, Linda; Stansfield, Jois; Bloch, Steven
2017-01-01
Background: Following content analyses of the first 30 years of the UK speech and language therapy professional body's journal, this study was conducted to survey the published work of the speech (and language) therapy profession over the last 50 years and trace key changes and themes. Aim: To understand better the development of the UK speech and…
Use of Language Sample Analysis by School-Based SLPs: Results of a Nationwide Survey
ERIC Educational Resources Information Center
Pavelko, Stacey L.; Owens, Robert E., Jr.; Ireland, Marie; Hahs-Vaughn, Debbie L.
2016-01-01
Purpose: This article examines use of language sample analysis (LSA) by school-based speech-language pathologists (SLPs), including characteristics of language samples, methods of transcription and analysis, barriers to LSA use, and factors affecting LSA use, such as American Speech-Language-Hearing Association certification, number of years'…
Wie, Ona Bø; Falkenberg, Eva-Signe; Tvete, Ole; Tomblin, Bruce
2007-05-01
The objectives of the study were to describe the characteristics of the first 79 prelingually deaf cochlear implant users in Norway and to investigate to what degree the variation in speech recognition, speech-recognition growth rate, and speech production could be explained by the characteristics of the child, the cochlear implant, the family, and the educational setting. Data gathered longitudinally were analysed using descriptive statistics, multiple regression, and growth-curve analysis. The results show that more than 50% of the variation could be explained by these characteristics. Daily user time, non-verbal intelligence, mode of communication, length of CI experience, and educational placement had the largest effects on the outcome. The results also indicate that children educated with a bilingual approach had better speech perception and a faster speech-perception growth rate with increased focus on spoken language.
Speech in spinocerebellar ataxia.
Schalling, Ellika; Hartelius, Lena
2013-12-01
Spinocerebellar ataxias (SCAs) are a heterogeneous group of autosomal dominant cerebellar ataxias clinically characterized by progressive ataxia, dysarthria, and a range of other concomitant neurological symptoms. Only a few studies include detailed characterization of speech symptoms in SCA. Speech symptoms in SCA resemble ataxic dysarthria, but symptoms related to phonation may be more prominent. One study to date has shown an association between genotype and differences in speech and voice symptoms. More studies of speech and voice phenotypes are warranted, to possibly aid in clinical diagnosis. In addition, instrumental speech analysis has been demonstrated to be a reliable measure that may be used to monitor disease progression or therapy outcomes in possible future pharmacological treatments. Intervention by speech and language pathologists should go beyond assessment. Clinical guidelines for management of speech, communication and swallowing need to be developed for individuals with progressive cerebellar ataxia. Copyright © 2013 Elsevier Inc. All rights reserved.
Effects of low harmonics on tone identification in natural and vocoded speech.
Liu, Chang; Azimi, Behnam; Tahmina, Qudsia; Hu, Yi
2012-11-01
This study investigated the contribution of low-frequency harmonics to identifying Mandarin tones in natural and vocoded speech in quiet and noisy conditions. Results showed that low-frequency harmonics of natural speech led to highly accurate tone identification; however, for vocoded speech, low-frequency harmonics yielded lower tone identification than stimuli with full harmonics, except for tone 4. Analysis of the correlation between tone accuracy and the amplitude-F0 correlation index suggested that "more" speech contents (i.e., more harmonics) did not necessarily yield better tone recognition for vocoded speech, especially when the amplitude contour of the signals did not co-vary with the F0 contour.
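The amplitude-F0 correlation index referenced above can be illustrated as a Pearson correlation between an amplitude envelope and an F0 contour. A toy sketch with invented contours:

```python
# Sketch of the amplitude-F0 correlation index idea: Pearson correlation
# between the amplitude envelope and the F0 contour of a tone syllable.
# Contour extraction is assumed done elsewhere; the arrays are stand-ins.
import numpy as np

amp_envelope = np.abs(np.sin(np.linspace(0, 3, 200))) + 0.1   # stand-in envelope
f0_contour = np.linspace(220, 180, 200)                       # stand-in falling contour

index = np.corrcoef(amp_envelope, f0_contour)[0, 1]
print(f"amplitude-F0 correlation index: {index:.2f}")
```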
Cortical Integration of Audio-Visual Information
Vander Wyk, Brent C.; Ramsay, Gordon J.; Hudac, Caitlin M.; Jones, Warren; Lin, David; Klin, Ami; Lee, Su Mei; Pelphrey, Kevin A.
2013-01-01
We investigated the neural basis of audio-visual processing in speech and non-speech stimuli. Physically identical auditory stimuli (speech and sinusoidal tones) and visual stimuli (animated circles and ellipses) were used in this fMRI experiment. Relative to unimodal stimuli, each of the multimodal conjunctions showed increased activation in largely non-overlapping areas. The conjunction of Ellipse and Speech, which most resembles naturalistic audiovisual speech, showed higher activation in the right inferior frontal gyrus, fusiform gyri, left posterior superior temporal sulcus, and lateral occipital cortex. The conjunction of Circle and Tone, an arbitrary audio-visual pairing with no speech association, activated middle temporal gyri and lateral occipital cortex. The conjunction of Circle and Speech showed activation in lateral occipital cortex, and the conjunction of Ellipse and Tone did not show increased activation relative to unimodal stimuli. Further analysis revealed that middle temporal regions, although identified as multimodal only in the Circle-Tone condition, were more strongly active to Ellipse-Speech or Circle-Speech, but regions that were identified as multimodal for Ellipse-Speech were always strongest for Ellipse-Speech. Our results suggest that combinations of auditory and visual stimuli may together be processed by different cortical networks, depending on the extent to which speech or non-speech percepts are evoked. PMID:20709442
Statistical Analysis of Spectral Properties and Prosodic Parameters of Emotional Speech
NASA Astrophysics Data System (ADS)
Přibil, J.; Přibilová, A.
2009-01-01
The paper addresses the reflection of microintonation and spectral properties in male and female acted emotional speech. The microintonation component of speech melody is analyzed with regard to its spectral and statistical parameters. According to psychological research on emotional speech, different emotions are accompanied by different amounts of spectral noise. We control this amount by spectral flatness, according to which high-frequency noise is mixed into voiced frames during cepstral speech synthesis. Our experiments are aimed at statistical analysis of cepstral coefficient values and ranges of spectral flatness in three emotions (joy, sadness, anger) and a neutral state for comparison. Calculated histograms of spectral flatness distribution are visually compared and modelled by a Gamma probability distribution. Histograms of cepstral coefficient distribution are evaluated and compared using skewness and kurtosis. The statistical results show good correlation between male and female voices for all emotional states, portrayed by several Czech and Slovak professional actors.
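The two measurements central to this analysis, spectral flatness of voiced frames and skewness/kurtosis of the resulting distributions, can be sketched as follows; the frame data are synthetic stand-ins, not the Czech/Slovak corpus.

```python
# Sketch of spectral flatness plus distribution-shape statistics.
import numpy as np
from scipy.stats import skew, kurtosis

def spectral_flatness(frame):
    """Geometric mean over arithmetic mean of the power spectrum."""
    p = np.abs(np.fft.rfft(frame)) ** 2 + 1e-12
    return np.exp(np.mean(np.log(p))) / np.mean(p)

frames = np.random.randn(100, 512)            # stand-in voiced frames
flatness = np.array([spectral_flatness(f) for f in frames])

# Distribution shape measures used to compare emotions:
print(skew(flatness), kurtosis(flatness))
# A Gamma distribution can be fitted to the flatness histogram with
# scipy.stats.gamma.fit(flatness).
```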
Nawaz, Tabassam; Mehmood, Zahid; Rashid, Muhammad; Habib, Hafiz Adnan
2018-01-01
Recent research on speech segregation and music fingerprinting has led to improvements in speech segregation and music identification algorithms. Speech and music segregation generally involves the identification of music followed by speech segregation; however, music segregation becomes a challenging task in the presence of noise. This paper proposes a novel method of speech segregation for unlabelled stationary noisy audio signals using a deep belief network (DBN) model. The proposed method successfully segregates a music signal from noisy audio streams. A recurrent neural network (RNN)-based hidden-layer segregation model is applied to remove stationary noise, and dictionary-based Fisher algorithms are employed for speech classification. The proposed method is tested on three datasets (TIMIT, MIR-1K, and MusicBrainz), and the results indicate its robustness for speech segregation. The qualitative and quantitative analyses carried out on the three datasets demonstrate the efficiency of the proposed method compared to state-of-the-art speech segregation and classification-based methods. PMID:29558485
Comprehension of Co-Speech Gestures in Aphasic Patients: An Eye Movement Study.
Eggenberger, Noëmi; Preisig, Basil C; Schumacher, Rahel; Hopfner, Simone; Vanbellingen, Tim; Nyffeler, Thomas; Gutbrod, Klemens; Annoni, Jean-Marie; Bohlhalter, Stephan; Cazzoli, Dario; Müri, René M
2016-01-01
Co-speech gestures are omnipresent and a crucial element of human interaction by facilitating language comprehension. However, it is unclear whether gestures also support language comprehension in aphasic patients. Using visual exploration behavior analysis, the present study aimed to investigate the influence of congruence between speech and co-speech gestures on comprehension in terms of accuracy in a decision task. Twenty aphasic patients and 30 healthy controls watched videos in which speech was either combined with meaningless (baseline condition), congruent, or incongruent gestures. Comprehension was assessed with a decision task, while remote eye-tracking allowed analysis of visual exploration. In aphasic patients, the incongruent condition resulted in a significant decrease of accuracy, while the congruent condition led to a significant increase in accuracy compared to baseline accuracy. In the control group, the incongruent condition resulted in a decrease in accuracy, while the congruent condition did not significantly increase the accuracy. Visual exploration analysis showed that patients fixated significantly less on the face and tended to fixate more on the gesturing hands compared to controls. Co-speech gestures play an important role for aphasic patients as they modulate comprehension. Incongruent gestures evoke significant interference and deteriorate patients' comprehension. In contrast, congruent gestures enhance comprehension in aphasic patients, which might be valuable for clinical and therapeutic purposes.
Flippin, Michelle; Reszka, Stephanie; Watson, Linda R
2010-05-01
The Picture Exchange Communication System (PECS) is a popular communication-training program for young children with autism spectrum disorders (ASD). This meta-analysis reviews the current empirical evidence for PECS in affecting communication and speech outcomes for children with ASD. A systematic review of the literature on PECS written between 1994 and June 2009 was conducted. Quality of scientific rigor was assessed and used as an inclusion criterion in computation of effect sizes. Effect sizes were aggregated separately for single-subject and group studies for communication and speech outcomes. Eight single-subject experiments (18 participants) and 3 group studies (95 PECS participants, 65 in other intervention/control) were included. Results indicated that PECS is a promising but not yet established evidence-based intervention for facilitating communication in children with ASD ages 1-11 years. Small to moderate gains in communication were demonstrated following training. Gains in speech were small to negative. This meta-analysis synthesizes gains in communication and relative lack of gains made in speech across the PECS literature for children with ASD. Concerns about maintenance and generalization are identified. Emerging evidence of potential preintervention child characteristics is discussed. Phase IV was identified as a possibly influential program characteristic for speech outcomes.
Stuttering as a trait or state - an ALE meta-analysis of neuroimaging studies.
Belyk, Michel; Kraft, Shelly Jo; Brown, Steven
2015-01-01
Stuttering is a speech disorder characterised by repetitions, prolongations and blocks that disrupt the forward movement of speech. An earlier meta-analysis of brain imaging studies of stuttering (Brown et al., 2005) revealed a general trend towards rightward lateralization of brain activations and hyperactivity in the larynx motor cortex bilaterally. The present study sought not only to update that meta-analysis with recent work but to introduce an important distinction not present in the first study, namely the difference between 'trait' and 'state' stuttering. The analysis of trait stuttering compares people who stutter (PWS) with people who do not stutter when behaviour is controlled for, i.e., when speech is fluent in both groups. In contrast, the analysis of state stuttering examines PWS during episodes of stuttered speech compared with episodes of fluent speech. Seventeen studies were analysed using activation likelihood estimation. Trait stuttering was characterised by the well-known rightward shift in lateralization for language and speech areas. State stuttering revealed a more diverse pattern. Abnormal activation of larynx and lip motor cortex was common to the two analyses. State stuttering was associated with overactivation in the right hemisphere larynx and lip motor cortex. Trait stuttering was associated with overactivation of lip motor cortex in the right hemisphere but underactivation of larynx motor cortex in the left hemisphere. These results support a large literature highlighting laryngeal and lip involvement in the symptomatology of stuttering, and disambiguate two possible sources of activation in neuroimaging studies of persistent developmental stuttering. © 2014 Federation of European Neuroscience Societies and John Wiley & Sons Ltd.
Advancements in robust algorithm formulation for speaker identification of whispered speech
NASA Astrophysics Data System (ADS)
Fan, Xing
Whispered speech is an alternative speech production mode to neutral speech, used intentionally by talkers in natural conversational scenarios to protect privacy and to avoid certain content being overheard or made public. Due to the profound differences between whispered and neutral speech in production mechanism and the absence of whispered adaptation data, the performance of speaker identification systems trained with neutral speech degrades significantly. This dissertation therefore focuses on developing a robust closed-set speaker recognition system for whispered speech that uses no or limited whispered adaptation data from non-target speakers. The dissertation proposes the concept of "high"/"low" performance whispered data for the purpose of speaker identification, and identifies a variety of acoustic properties that contribute to the quality of whispered data. An acoustic analysis is also conducted to compare the phoneme/speaker dependency of the differences between whispered and neutral data in the feature domain. The observations from these acoustic analyses are new in this area and also serve as guidance for developing robust speaker identification systems for whispered speech. The dissertation further proposes two systems for speaker identification of whispered speech. One system focuses on front-end processing: a two-dimensional feature space is proposed to search for "low"-quality whispered utterances, and separate feature mapping functions are applied to vowels and consonants respectively in order to retain the speaker information shared between whispered and neutral speech. The other system focuses on speech-mode-independent model training: the proposed method generates pseudo-whispered features from neutral features by using the statistical information contained in a whispered universal background model (UBM) trained on extra whispered data collected from non-target speakers. Four modeling methods are proposed for the transformation estimation used to generate the pseudo-whispered features. Both systems demonstrate a significant improvement over the baseline system on the evaluation data. This dissertation has therefore contributed a scientific understanding of the differences between whispered and neutral speech as well as improved front-end processing and modeling methods for speaker identification of whispered speech. Such advancements will ultimately improve the robustness of speech processing systems.
Multilingual Speech and Language Processing
2003-04-01
client software handles the user end of the transaction. Historically, four clients were provided: e-mail, web, FrameMaker, and command line. By...command-line client and an API. The API allows integration of CyberTrans into a number of processes including word processing packages (FrameMaker...preservation and logging, and others. The available clients remain e-mail, Web and FrameMaker. Platforms include both Unix and PC for clients, with
Software Requirements Engineering Methodology
1976-09-01
common speech, so that the specification can be read by managers, systems engineers, and others who are not specially trained in the language. To...of the system and its DPS. They are usually implicit in the wording of the originating specifications, although the new SREM user must train ...to the name of the ENTITY_CLASS, the operation is applicable only to a single instance. This concentration of the requirements for creation and
Jeon, Myounghoon; Walker, Bruce N; Gable, Thomas M
2015-09-01
Research has suggested that interaction with an in-vehicle software agent can improve a driver's psychological state and increase road safety. The present study explored the possibility of using an in-vehicle software agent to mitigate the effects of driver anger on driving behavior. After either anger or neutral mood induction, 60 undergraduates drove in a simulator with two types of agent intervention. Results showed that both speech-based agents not only enhanced driver situation awareness and driving performance but also reduced anger level and perceived workload. Regression models show that a driver's anger influences driving performance measures, mediated by situation awareness. The practical implications include guidelines for the design of social interaction with in-vehicle software agents. Copyright © 2015 Elsevier Ltd and The Ergonomics Society. All rights reserved.
Speech Analysis of Bengali Speaking Children with Repaired Cleft Lip & Palate
ERIC Educational Resources Information Center
Chakrabarty, Madhushree; Kumar, Suman; Chatterjee, Indranil; Maheshwari, Neha
2012-01-01
The present study aims at analyzing speech samples of four Bengali speaking children with repaired cleft palates with a view to differentiate between the misarticulations arising out of a deficit in linguistic skills and structural or motoric limitations. Spontaneous speech samples were collected and subjected to a number of linguistic analyses…
Comparing Measures of Voice Quality from Sustained Phonation and Continuous Speech
ERIC Educational Resources Information Center
Gerratt, Bruce R.; Kreiman, Jody; Garellek, Marc
2016-01-01
Purpose: The question of what type of utterance--a sustained vowel or continuous speech--is best for voice quality analysis has been extensively studied but with equivocal results. This study examines whether previously reported differences derive from the articulatory and prosodic factors occurring in continuous speech versus sustained phonation.…
ERIC Educational Resources Information Center
Kim, Yunjung; Kent, Raymond D.; Weismer, Gary
2011-01-01
Purpose: This study examined acoustic predictors of speech intelligibility in speakers with several types of dysarthria secondary to different diseases and conducted classification analysis solely by acoustic measures according to 3 variables (disease, speech severity, and dysarthria type). Method: Speech recordings from 107 speakers with…
Research on Speech Perception. Progress Report No. 8, January 1982-December 1982.
ERIC Educational Resources Information Center
Pisoni, David B.; And Others
Summarizing research activities from January 1982 to December 1982, this is the eighth annual report of research on speech perception, analysis and synthesis conducted in the Speech Research Laboratory of the Department of Psychology at Indiana University. The report includes extended manuscripts, short reports, progress reports, and information…
Replacing Maladaptive Speech with Verbal Labeling Responses: An Analysis of Generalized Responding.
ERIC Educational Resources Information Center
Foxx, R. M.; And Others
1988-01-01
Three mentally handicapped students (aged 13, 36, and 40) with maladaptive speech received training to answer questions with verbal labels. The results of their cues-pause-point training showed that the students replaced their maladaptive speech with correct labels (answers) to questions in the training setting and three generalization settings.…
Research on Speech Perception. Progress Report No. 9, January 1983-December 1983.
ERIC Educational Resources Information Center
Pisoni, David B.; And Others
Summarizing research activities from January 1983 to December 1983, this is the ninth annual report of research on speech perception, analysis and synthesis conducted in the Speech Research Laboratory of the Department of Psychology at Indiana University. The report includes extended manuscripts, short reports, and progress reports. The report…
ERIC Educational Resources Information Center
Howard, Sara
2004-01-01
A combination of perceptual and electropalatographic (EPG) analysis is used to investigate speech production in three adolescent speakers with a history of cleft palate. All the subjects still sound markedly atypical. Their speech output is analysed in three conditions: diadochokinetic tasks; single word production; connected speech. Comparison of…
Speech after Mao: Literature and Belonging
ERIC Educational Resources Information Center
Hsieh, Victoria Linda
2012-01-01
This dissertation aims to understand the apparent failure of speech in post-Mao literature to fulfill its conventional functions of representation and communication. In order to understand this pattern, I begin by looking back on the utility of speech for nation-building in modern China. In addition to literary analysis of key authors and works,…
Speech Development of Autistic Children by Interactive Computer Games
ERIC Educational Resources Information Center
Rahman, Mustafizur; Ferdous, S. M.; Ahmed, Syed Ishtiaque; Anwar, Anika
2011-01-01
Purpose: Speech disorder is one of the most common problems found with autistic children. The purpose of this paper is to investigate the introduction of computer-based interactive games along with the traditional therapies in order to help improve the speech of autistic children. Design/methodology/approach: From analysis of the works of Ivar…
Applications of Text Analysis Tools for Spoken Response Grading
ERIC Educational Resources Information Center
Crossley, Scott; McNamara, Danielle
2013-01-01
This study explores the potential for automated indices related to speech delivery, language use, and topic development to model human judgments of TOEFL speaking proficiency in second language (L2) speech samples. For this study, 244 transcribed TOEFL speech samples taken from 244 L2 learners were analyzed using automated indices taken from…
Speech and gesture in spatial language and cognition among the Yucatec Mayas.
Le Guen, Olivier
2011-07-01
In previous analyses of the influence of language on cognition, speech has been the main channel examined. In studies conducted among Yucatec Mayas, efforts to determine the preferred frame of reference in use in this community have failed to reach an agreement (Bohnemeyer & Stolz, 2006; Levinson, 2003 vs. Le Guen, 2006, 2009). This paper argues for a multimodal analysis of language that encompasses gesture as well as speech, and shows that the preferred frame of reference in Yucatec Maya is only detectable through the analysis of co-speech gesture and not through speech alone. A series of experiments compares knowledge of the semantics of spatial terms, performance on nonlinguistic tasks and gestures produced by men and women. The results show a striking gender difference in the knowledge of the semantics of spatial terms, but an equal preference for a geocentric frame of reference in nonverbal tasks. In a localization task, participants used a variety of strategies in their speech, but they all exhibited a systematic preference for a geocentric frame of reference in their gestures. Copyright © 2011 Cognitive Science Society, Inc.
Engaged listeners: shared neural processing of powerful political speeches
Häcker, Frank E. K.; Honey, Christopher J.; Hasson, Uri
2015-01-01
Powerful speeches can captivate audiences, whereas weaker speeches fail to engage their listeners. What is happening in the brains of a captivated audience? Here, we assess audience-wide functional brain dynamics during listening to speeches of varying rhetorical quality. The speeches were given by German politicians and evaluated as rhetorically powerful or weak. Listening to each of the speeches induced similar neural response time courses, as measured by inter-subject correlation analysis, in widespread brain regions involved in spoken language processing. Crucially, alignment of the time course across listeners was stronger for rhetorically powerful speeches, especially for bilateral regions of the superior temporal gyri and medial prefrontal cortex. Thus, during powerful speeches, listeners as a group are more coupled to each other, suggesting that powerful speeches are more potent in taking control of the listeners’ brain responses. Weaker speeches were processed more heterogeneously, although they still prompted substantially correlated responses. These patterns of coupled neural responses bear resemblance to metaphors of resonance, which are often invoked in discussions of speech impact, and contribute to the literature on auditory attention under natural circumstances. Overall, this approach opens up possibilities for research on the neural mechanisms mediating the reception of entertaining or persuasive messages. PMID:25653012
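Inter-subject correlation analysis of the kind used here can be illustrated with a leave-one-out average: each listener's regional time course is correlated with the mean of all other listeners. A toy sketch on synthetic data:

```python
# Sketch of inter-subject correlation (ISC) for one brain region;
# the data are synthetic stand-ins for listeners' response time courses.
import numpy as np

rng = np.random.default_rng(1)
shared = rng.standard_normal(300)                          # common stimulus-driven signal
listeners = shared + 0.8 * rng.standard_normal((20, 300))  # 20 listeners, 300 time points

def isc(timecourses):
    """Leave-one-out ISC: correlate each listener with the mean of the rest."""
    scores = []
    for i in range(len(timecourses)):
        others = np.delete(timecourses, i, axis=0).mean(axis=0)
        scores.append(np.corrcoef(timecourses[i], others)[0, 1])
    return np.mean(scores)

print(f"mean ISC: {isc(listeners):.2f}")   # higher for rhetorically powerful speeches
```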
NASA Astrophysics Data System (ADS)
Nakagawa, Seiji; Fujiyuki, Chika; Kagomiya, Takayuki
2013-07-01
Bone-conducted ultrasound (BCU) is perceived even by the profoundly sensorineural deaf. A novel hearing aid using the perception of amplitude-modulated BCU (BCU hearing aid: BCUHA) has been developed; however, there is room for improvement, particularly in terms of sound quality. BCU speech is accompanied by a strong high-pitched tone and contains some distortion. In this study, the sound quality of BCU speech with several types of amplitude modulation [double-sideband with transmitted carrier (DSB-TC), double-sideband with suppressed carrier (DSB-SC), and transposed modulation] and air-conducted (AC) speech was quantitatively evaluated using semantic differential and factor analysis. The results showed that all types of BCU speech had higher metallic and lower esthetic factor scores than AC speech. On the other hand, transposed speech was generally closer than the other types of BCU speech to AC speech; it showed a higher powerfulness factor score than the other types of BCU speech and a higher esthetic factor score than DSB-SC speech. These results provide useful information for further development of the BCUHA.
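The DSB-TC and DSB-SC schemes compared in the study differ only in whether the ultrasonic carrier is transmitted. A minimal sketch follows; the 30 kHz carrier frequency and modulation depth are illustrative assumptions, not the device's actual settings.

```python
# Sketch of the two double-sideband amplitude modulations:
# DSB-SC suppresses the carrier, DSB-TC transmits it.
import numpy as np

fs = 192000                                # sample rate high enough for ultrasound
t = np.arange(fs) / fs
speech = np.sin(2 * np.pi * 200 * t)       # stand-in for a speech signal
carrier = np.sin(2 * np.pi * 30000 * t)    # assumed 30 kHz ultrasonic carrier

dsb_sc = speech * carrier                  # suppressed carrier
dsb_tc = (1.0 + 0.8 * speech) * carrier    # transmitted carrier, depth 0.8 (assumed)
```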
Speech Entrainment Compensates for Broca's Area Damage
Fridriksson, Julius; Basilakos, Alexandra; Hickok, Gregory; Bonilha, Leonardo; Rorden, Chris
2015-01-01
Speech entrainment (SE), the online mimicking of an audiovisual speech model, has been shown to increase speech fluency in patients with Broca's aphasia. However, not all individuals with aphasia benefit from SE. The purpose of this study was to identify patterns of cortical damage that predict a positive response to SE's fluency-inducing effects. Forty-four chronic patients with left hemisphere stroke (15 female) were included in this study. Participants completed two tasks: 1) spontaneous speech production, and 2) audiovisual SE. The number of different words per minute was calculated as a speech output measure for each task, with the difference between the SE and spontaneous speech conditions yielding a measure of fluency improvement. Voxel-wise lesion-symptom mapping (VLSM) was used to relate the number of different words per minute for spontaneous speech, SE, and SE-related improvement to patterns of brain damage in order to predict lesion locations associated with the fluency-inducing response to speech entrainment. Individuals with Broca's aphasia demonstrated a significant increase in different words per minute during speech entrainment versus spontaneous speech; a similar pattern of improvement was not seen in patients with other types of aphasia. VLSM analysis revealed that damage to the inferior frontal gyrus predicted this response. The results suggest that SE exerts its fluency-inducing effects by providing a surrogate target for speech production via internal monitoring processes. Clinically, these results add further support for the use of speech entrainment to improve speech production and may help select patients for speech entrainment treatment. PMID:25989443
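Conceptually, VLSM runs a mass-univariate comparison at every voxel between patients whose lesions cover it and those whose lesions spare it. A toy sketch on synthetic lesion masks and fluency scores (multiple-comparison correction omitted):

```python
# Conceptual sketch of voxel-wise lesion-symptom mapping (VLSM);
# lesion masks and behavioral scores are synthetic stand-ins.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(2)
n_patients, n_voxels = 44, 1000
lesions = rng.random((n_patients, n_voxels)) < 0.3    # binary lesion masks
fluency_gain = rng.standard_normal(n_patients)        # SE minus spontaneous speech

t_map = np.full(n_voxels, np.nan)
for v in range(n_voxels):
    hit, spared = fluency_gain[lesions[:, v]], fluency_gain[~lesions[:, v]]
    if hit.size > 1 and spared.size > 1:
        t_map[v] = ttest_ind(hit, spared, equal_var=False).statistic
# Voxels with strongly negative t indicate damage associated with a
# smaller fluency gain.
```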
"The caterpillar": a novel reading passage for assessment of motor speech disorders.
Patel, Rupal; Connaghan, Kathryn; Franco, Diana; Edsall, Erika; Forgit, Dory; Olsen, Laura; Ramage, Lianna; Tyler, Emily; Russell, Scott
2013-02-01
A review of the salient characteristics of motor speech disorders and common assessment protocols revealed the need for a novel reading passage tailored specifically to differentiate between and among the dysarthrias (DYSs) and apraxia of speech (AOS). "The Caterpillar" passage was designed to provide a contemporary, easily read, contextual speech sample with specific tasks (e.g., prosodic contrasts, words of increasing length and complexity) targeted to inform the assessment of motor speech disorders. Twenty-two adults, 15 with DYS or AOS and 7 healthy controls (HC), were recorded reading "The Caterpillar" passage to demonstrate its utility in examining motor speech performance. Analysis of performance across a subset of segmental and prosodic variables illustrated that "The Caterpillar" passage showed promise for extracting individual profiles of impairment that could augment current assessment protocols and inform treatment planning in motor speech disorders.
Fluid-acoustic interactions and their impact on pathological voiced speech
NASA Astrophysics Data System (ADS)
Erath, Byron D.; Zanartu, Matias; Peterson, Sean D.; Plesniak, Michael W.
2011-11-01
Voiced speech is produced by vibration of the vocal fold structures. Vocal fold dynamics arise from aerodynamic pressure loadings, tissue properties, and acoustic modulation of the driving pressures. Recent speech science advancements have produced a physiologically-realistic fluid flow solver (BLEAP) capable of prescribing asymmetric intraglottal flow attachment that can be easily assimilated into reduced order models of speech. The BLEAP flow solver is extended to incorporate acoustic loading and sound propagation in the vocal tract by implementing a wave reflection analog approach for sound propagation based on the governing BLEAP equations. This enhanced physiological description of the physics of voiced speech is implemented into a two-mass model of speech. The impact of fluid-acoustic interactions on vocal fold dynamics is elucidated for both normal and pathological speech through linear and nonlinear analysis techniques. Supported by NSF Grant CBET-1036280.
ERIC Educational Resources Information Center
Leroy-Malherbe, V.; Chevrie-Muller, C.; Rigoard, M. T.; Arabia, C.
1998-01-01
This case report describes a 52-year-old man with bilateral central lingual paralysis following a myocardial infarction. Speech recordings made 15 days and 18 months after the attack were acoustically analyzed. The case demonstrates the usefulness of acoustic analysis to detect slight acoustic differences. (DB)
ERIC Educational Resources Information Center
Esch, Barbara E.; Forbes, Heather J.
2017-01-01
The open-source "Journal of Speech and Language Pathology-Applied Behavior Analysis" ("JSLP-ABA") was published online from 2006 to 2010. We present an annotated bibliography of 80 articles published in the now-defunct journal with the aim of representing its scholarly content to readers of "The Analysis of Verbal…
Research trends in the field of speech and hearing.
Kricos, P B; Ptacek, P H; Hyman, M; Black, J W
1979-04-01
Topics of research in the field of speech and hearing were identified and compared over a 21-yr period (1954-1974). These topics were identified by a key-word analysis of approximately 8200 titles consisting of articles in national and international journals, and of theses and dissertations presented in the state of Ohio. Results of this analysis have pinpointed certain research trends in the field of speech and hearing. Attention to certain topics has either declined, increased, or reached a peak during the 21-yr period, while interest in some topics has been consistently maintained throughout the years. The information reported provides a perspective from which to view contributions made by researchers in the field of speech and hearing during the last two decades.
Stoppelman, Nadav; Harpaz, Tamar; Ben-Shachar, Michal
2013-05-01
Speech processing engages multiple cortical regions in the temporal, parietal, and frontal lobes. Isolating speech-sensitive cortex in individual participants is of major clinical and scientific importance. This task is complicated by the fact that responses to sensory and linguistic aspects of speech are tightly packed within the posterior superior temporal cortex. In functional magnetic resonance imaging (fMRI), various baseline conditions are typically used in order to isolate speech-specific from basic auditory responses. Using a short, continuous sampling paradigm, we show that reversed ("backward") speech, a commonly used auditory baseline for speech processing, removes much of the speech responses in frontal and temporal language regions of adult individuals. On the other hand, signal correlated noise (SCN) serves as an effective baseline for removing primary auditory responses while maintaining strong signals in the same language regions. We show that the response to reversed speech in left inferior frontal gyrus decays significantly faster than the response to speech, thus suggesting that this response reflects bottom-up activation of speech analysis followed by top-down attenuation once the signal is classified as nonspeech. The results overall favor SCN as an auditory baseline for speech processing.
Characterizing Articulation in Apraxic Speech Using Real-Time Magnetic Resonance Imaging.
Hagedorn, Christina; Proctor, Michael; Goldstein, Louis; Wilson, Stephen M; Miller, Bruce; Gorno-Tempini, Maria Luisa; Narayanan, Shrikanth S
2017-04-14
Real-time magnetic resonance imaging (MRI) and accompanying analytical methods are shown to capture and quantify salient aspects of apraxic speech, substantiating and expanding upon evidence provided by clinical observation and acoustic and kinematic data. Analysis of apraxic speech errors within a dynamic systems framework is provided and the nature of pathomechanisms of apraxic speech discussed. One adult male speaker with apraxia of speech was imaged using real-time MRI while producing spontaneous speech, repeated naming tasks, and self-paced repetition of word pairs designed to elicit speech errors. Articulatory data were analyzed, and speech errors were detected using time series reflecting articulatory activity in regions of interest. Real-time MRI captured two types of apraxic gestural intrusion errors in a word pair repetition task. Gestural intrusion errors in nonrepetitive speech, multiple silent initiation gestures at the onset of speech, and covert (unphonated) articulation of entire monosyllabic words were also captured. Real-time MRI and accompanying analytical methods capture and quantify many features of apraxic speech that have been previously observed using other modalities while offering high spatial resolution. This patient's apraxia of speech affected the ability to select only the appropriate vocal tract gestures for a target utterance, suppressing others, and to coordinate them in time.
Calvarial periosteal graft for second-stage cleft palate surgery: a preliminary report.
Neiva, Cecilia; Dakpe, Stephanie; Gbaguidi, Cica; Testelin, Sylvie; Devauchelle, Bernard
2014-07-01
The objectives of cleft palate surgery are to achieve optimal outcomes regarding speech development, hearing, maxillary arch development and facial skull growth. Early two-stage cleft palate repair has been the most recent protocol of choice to achieve good maxillary arch growth without compromising speech development. Hard palate closure occurs within one year of soft palate surgery. However, in some cases the residual hard palate cleft width is larger than 15 mm at the age of two. As previously reported, integrated speech development starts around that age, and this is a challenge since we know that early mobilization of the mucoperiosteum interferes with normal facial growth in the long term. In children with large residual hard palate clefts at the age of 2, we report the use of calvarial periosteal grafts to close the cleft. In a retrospective 6-year study (2006-2012), we first analyzed the outcomes regarding impermeability of hard palate closure on 45 patients who at the age of two presented a residual cleft of the hard palate larger than 15 mm and benefited from a periosteal graft. We then studied the maxillary growth in these children. In order to compare long-term results, we included 14 patients (age range: 8-20) treated between 1994 and 2006. Two analyses were conducted, the first one on dental casts from birth to the age of 6 and the other one based on lateral cephalograms following Delaire's principles and TRIDIM software. After the systematic cephalometric analysis of 14 patients, we found no evidence of retrognathia or Class 3 dental malocclusion. In the population of 45 children who benefited from calvarial periosteal grafts the rate of palate fistula was 17% vs. 10% in the overall series. Despite major advances in understanding cleft defects, the issues of timing and choice of the surgical procedure remain widely debated. In second-stage surgery for hard palate closure, using a calvarial periosteal graft could be the solution for large residual clefts without compromising adequate speech development by encouraging proper maxillary arch growth.
Recurrence Quantification Analysis of Sentence-Level Speech Kinematics
ERIC Educational Resources Information Center
Jackson, Eric S.; Tiede, Mark; Riley, Michael A.; Whalen, D. H.
2016-01-01
Purpose: Current approaches to assessing sentence-level speech variability rely on measures that quantify variability across utterances and use normalization procedures that alter raw trajectory data. The current work tests the feasibility of a less restrictive nonlinear approach--recurrence quantification analysis (RQA)--via a procedural example…
Discovering Communicative Competencies in a Nonspeaking Child with Autism
ERIC Educational Resources Information Center
Stiegler, Lillian N.
2007-01-01
Purpose: This article is intended to demonstrate that adapted conversation analysis (CA) and speech act analysis (SAA) may be applied by speech-language pathologists (SLPs) to (a) identify communicative competencies in nonspeaking children with autism spectrum disorder (ASD), especially during particularly successful interactions, and (b) identify…
Severity-Based Adaptation with Limited Data for ASR to Aid Dysarthric Speakers
Mustafa, Mumtaz Begum; Salim, Siti Salwah; Mohamed, Noraini; Al-Qatab, Bassam; Siong, Chng Eng
2014-01-01
Automatic speech recognition (ASR) is currently used in many assistive technologies, such as helping individuals with speech impairment in their communication ability. One challenge in ASR for speech-impaired individuals is the difficulty in obtaining a good speech database of impaired speakers for building an effective speech acoustic model. Because there are very few existing databases of impaired speech, which are also limited in size, the obvious solution to build a speech acoustic model of impaired speech is by employing adaptation techniques. However, issues that have not been addressed in existing studies in the area of adaptation for speech impairment are as follows: (1) identifying the most effective adaptation technique for impaired speech; and (2) the use of suitable source models to build an effective impaired-speech acoustic model. This research investigates the above-mentioned two issues on dysarthria, a type of speech impairment affecting millions of people. We applied both unimpaired and impaired speech as the source model with well-known adaptation techniques such as maximum likelihood linear regression (MLLR) and constrained MLLR (C-MLLR). The recognition accuracy of each impaired speech acoustic model is measured in terms of word error rate (WER), with further assessments, including phoneme insertion, substitution and deletion rates. Unimpaired speech when combined with limited high-quality speech-impaired data improves performance of ASR systems in recognising severely impaired dysarthric speech. The C-MLLR adaptation technique was also found to be better than MLLR in recognising mildly and moderately impaired speech based on the statistical analysis of the WER. It was found that phoneme substitution was the biggest contributing factor in WER in dysarthric speech for all levels of severity. The results show that the speech acoustic models derived from suitable adaptation techniques improve the performance of ASR systems in recognising impaired speech with limited adaptation data. PMID:24466004
A comparative intelligibility study of single-microphone noise reduction algorithms.
Hu, Yi; Loizou, Philipos C
2007-09-01
The evaluation of intelligibility of noise reduction algorithms is reported. IEEE sentences and consonants were corrupted by four types of noise including babble, car, street and train at two signal-to-noise ratio levels (0 and 5 dB), and then processed by eight speech enhancement methods encompassing four classes of algorithms: spectral subtractive, subspace, statistical-model-based and Wiener-type algorithms. The enhanced speech was presented to normal-hearing listeners for identification. With the exception of a single noise condition, no algorithm produced significant improvements in speech intelligibility. Information transmission analysis of the consonant confusion matrices indicated that no algorithm significantly improved the place feature score, which is critically important for speech recognition. The algorithms found in previous studies to perform the best in terms of overall quality were not the same algorithms that performed the best in terms of speech intelligibility. The subspace algorithm, for instance, was previously found to perform the worst in terms of overall quality, but performed well in the present study in terms of preserving speech intelligibility. Overall, the analysis of consonant confusion matrices suggests that in order for noise reduction algorithms to improve speech intelligibility, they need to improve the place and manner feature scores.
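Of the four algorithm classes evaluated, spectral subtraction is the simplest to illustrate. A minimal magnitude-subtraction sketch using SciPy (a textbook variant with an over-subtraction factor and spectral floor; not one of the eight specific implementations tested):

    import numpy as np
    from scipy.signal import stft, istft

    def spectral_subtraction(noisy, fs, noise_frames=10, alpha=2.0, beta=0.02):
        f, t, X = stft(noisy, fs, nperseg=512)
        mag, phase = np.abs(X), np.angle(X)
        # Estimate the noise spectrum from leading frames assumed noise-only.
        noise_est = mag[:, :noise_frames].mean(axis=1, keepdims=True)
        # Over-subtract, then floor to limit musical-noise artifacts.
        clean_mag = np.maximum(mag - alpha * noise_est, beta * mag)
        _, x_hat = istft(clean_mag * np.exp(1j * phase), fs, nperseg=512)
        return x_hat

    fs = 16000
    rng = np.random.default_rng(0)
    speech = np.sin(2 * np.pi * 200 * np.arange(fs) / fs)   # stand-in for speech
    lead = 0.3 * rng.standard_normal(fs // 4)               # noise-only lead-in
    noisy = np.concatenate([lead, speech + 0.3 * rng.standard_normal(fs)])
    enhanced = spectral_subtraction(noisy, fs)

The study's finding is worth keeping in mind when using such methods: artifacts introduced by the processing can offset any gain in signal-to-noise ratio, so improved quality does not imply improved intelligibility.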
New Perspectives on Assessing Amplification Effects
Souza, Pamela E.; Tremblay, Kelly L.
2006-01-01
Clinicians have long been aware of the range of performance variability with hearing aids. Despite improvements in technology, there remain many instances of well-selected and appropriately fitted hearing aids whereby the user reports minimal improvement in speech understanding. This review presents a multistage framework for understanding how a hearing aid affects performance. Six stages are considered: (1) acoustic content of the signal, (2) modification of the signal by the hearing aid, (3) interaction between sound at the output of the hearing aid and the listener's ear, (4) integrity of the auditory system, (5) coding of available acoustic cues by the listener's auditory system, and (6) correct identification of the speech sound. Within this framework, this review describes methodology and research on 2 new assessment techniques: acoustic analysis of speech measured at the output of the hearing aid and auditory evoked potentials recorded while the listener wears hearing aids. Acoustic analysis topics include the relationship between conventional probe microphone tests and probe microphone measurements using speech, appropriate procedures for such tests, and assessment of signal-processing effects on speech acoustics and recognition. Auditory evoked potential topics include an overview of physiologic measures of speech processing and the effect of hearing loss and hearing aids on cortical auditory evoked potential measurements in response to speech. Finally, the clinical utility of these procedures is discussed. PMID:16959734
Independent component analysis algorithm FPGA design to perform real-time blind source separation
NASA Astrophysics Data System (ADS)
Meyer-Baese, Uwe; Odom, Crispin; Botella, Guillermo; Meyer-Baese, Anke
2015-05-01
The conditions that arise in the Cocktail Party Problem prevail across many fields, creating a need for Blind Source Separation (BSS). These fields include array processing, communications, medical signal processing, speech processing, wireless communication, audio and acoustics, and biomedical engineering. The concept of the cocktail party problem and BSS led to the development of Independent Component Analysis (ICA) algorithms. ICA proves useful for applications needing real-time signal processing. The goal of this research was to perform an extensive study of the ability and efficiency of Independent Component Analysis algorithms to perform blind source separation on mixed signals in software, and of their implementation in hardware with a Field Programmable Gate Array (FPGA). The Algebraic ICA (A-ICA), Fast ICA, and Equivariant Adaptive Separation via Independence (EASI) ICA were examined and compared. The best algorithm required the least complexity and fewest resources while effectively separating mixed sources; this was the EASI algorithm. The EASI ICA was implemented on hardware with Field Programmable Gate Arrays (FPGA) to analyze its performance in real time.
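The EASI update selected by the study is compact enough to sketch per sample, which is also what makes it FPGA-friendly: each step needs only a few outer products and a matrix multiply. A minimal two-source Python sketch of the standard EASI rule (Cardoso and Laheld); step size, nonlinearity, and signals are illustrative:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 20000
    s = np.vstack([np.sign(np.sin(2 * np.pi * np.arange(n) / 100)),  # square-wave source
                   rng.uniform(-1, 1, n)])                           # uniform-noise source
    A = np.array([[1.0, 0.6], [0.5, 1.0]])                           # unknown mixing matrix
    x = A @ s                                                        # observed mixtures

    W = np.eye(2)
    mu = 0.001
    for t in range(n):
        y = W @ x[:, t]
        gy = np.tanh(y)                                # odd nonlinearity
        # EASI relative-gradient update: second-order (whitening) term
        # plus the higher-order term that drives independence.
        G = np.outer(y, y) - np.eye(2) + np.outer(gy, y) - np.outer(y, gy)
        W -= mu * G @ W

    print(np.round(W @ A, 2))   # near a scaled permutation when separation succeeds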
Bloch, Steven
2011-01-01
The study described here investigates the practice of anticipatory completion of augmentative and alternative communication (AAC) utterances in progress. The aims were to identify and analyse features of this practice as they occur in natural conversation between a person using an AAC system and a family member. The methods and principles of Conversation Analysis (CA) were used to video record conversations between people with progressive neurological diseases and a progressive speech disorder (dysarthria) and their family members. Key features of interaction were identified and extracts transcribed. Four extracts of talk between a man with motor neurone disease/amyotrophic lateral sclerosis and his mother are presented here. Anticipatory completion of AAC utterances is intimately related to the sequential context in which such utterances occur. Difficulties can arise from topic shifts, understanding the intended action of an AAC word in progress and in recognising the possible end point of an utterance. The analysis highlights the importance of understanding how AAC talk works in everyday interaction. The role of co-participants is particularly important here. These results may have implications for both AAC software design and clinical intervention.
Alomar, Soha; King, Nicolas K K; Tam, Joseph; Bari, Ausaf A; Hamani, Clement; Lozano, Andres M
2017-01-01
The thalamus has been a surgical target for the treatment of various movement disorders. Commonly used therapeutic modalities include ablative and nonablative procedures. A major clinical side effect of thalamic surgery is the appearance of speech problems. This review summarizes the data on the development of speech problems after thalamic surgery. A systematic review and meta-analysis was performed using nine databases, including Medline, Web of Science, and Cochrane Library. We also checked for articles by searching citing and cited articles. We retrieved studies between 1960 and September 2014. Of a total of 2,320 patients, 19.8% (confidence interval: 14.8-25.9) had speech difficulty after thalamotomy. Speech difficulty occurred in 15% (confidence interval: 9.8-22.2) of those treated unilaterally and 40.6% (confidence interval: 29.5-52.8) of those treated bilaterally. Speech impairment was noticed 2- to 3-fold more commonly after left-sided procedures (40.7% vs. 15.2%). Of the 572 patients that underwent DBS, 19.4% (confidence interval: 13.1-27.8) experienced speech difficulty. Subgroup analysis revealed that this complication occurs in 10.2% (confidence interval: 7.4-13.9) of patients treated unilaterally and 34.6% (confidence interval: 21.6-50.4) treated bilaterally. After thalamotomy, the risk was higher in Parkinson's patients compared to patients with essential tremor: 19.8% versus 4.5% in the unilateral group and 42.5% versus 13.9% in the bilateral group. After DBS, this rate was higher in essential tremor patients. Both lesioning and stimulation thalamic surgery produce adverse effects on speech. Left-sided and bilateral procedures are approximately 3-fold more likely to cause speech difficulty. This effect was higher after thalamotomy compared to DBS. In the thalamotomy group, the risk was higher in Parkinson's patients, whereas in the DBS group it was higher in patients with essential tremor. Understanding the pathophysiology of speech disturbance after thalamic procedures is a priority.
Techniques for the Enhancement of Linear Predictive Speech Coding in Adverse Conditions
NASA Astrophysics Data System (ADS)
Wrench, Alan A.
Available from UMI in association with The British Library. Requires signed TDF. The Linear Prediction model was first applied to speech two and a half decades ago. Since then it has been the subject of intense research and continues to be one of the principal tools in the analysis of speech. Its mathematical tractability makes it a suitable subject for study and its proven success in practical applications makes the study worthwhile. The model is known to be unsuited to speech corrupted by background noise. This has led many researchers to investigate ways of enhancing the speech signal prior to Linear Predictive analysis. In this thesis this body of work is extended. The chosen application is low bit-rate (2.4 kbits/sec) speech coding. For this task the performance of the Linear Prediction algorithm is crucial because there is insufficient bandwidth to encode the error between the modelled speech and the original input. A review of the fundamentals of Linear Prediction and an independent assessment of the relative performance of methods of Linear Prediction modelling are presented. A new method is proposed which is fast and facilitates stability checking, however, its stability is shown to be unacceptably poorer than existing methods. A novel supposition governing the positioning of the analysis frame relative to a voiced speech signal is proposed and supported by observation. The problem of coding noisy speech is examined. Four frequency domain speech processing techniques are developed and tested. These are: (i) Combined Order Linear Prediction Spectral Estimation; (ii) Frequency Scaling According to an Aural Model; (iii) Amplitude Weighting Based on Perceived Loudness; (iv) Power Spectrum Squaring. These methods are compared with the Recursive Linearised Maximum a Posteriori method. Following on from work done in the frequency domain, a time domain implementation of spectrum squaring is developed. In addition, a new method of power spectrum estimation is developed based on the Minimum Variance approach. This new algorithm is shown to be closely related to Linear Prediction but produces slightly broader spectral peaks. Spectrum squaring is applied to both the new algorithm and standard Linear Prediction and their relative performance is assessed. (Abstract shortened by UMI.).
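The baseline the thesis builds on, the autocorrelation method of Linear Prediction, can be stated compactly via the Levinson-Durbin recursion. A textbook Python sketch (not the thesis's proposed method, whose stability the author found lacking):

    import numpy as np

    def lp_coefficients(frame, order):
        """Autocorrelation-method LP coefficients via Levinson-Durbin."""
        r = np.correlate(frame, frame, mode="full")[len(frame) - 1:][:order + 1]
        a = np.zeros(order + 1)
        a[0] = 1.0
        err = r[0]
        for i in range(1, order + 1):
            acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
            k = -acc / err                        # reflection coefficient
            a[1:i + 1] += k * a[i - 1::-1][:i]
            err *= 1.0 - k * k                    # residual prediction error
        return a, err

    fs = 8000
    rng = np.random.default_rng(0)
    t = np.arange(240) / fs                       # one 30 ms frame
    frame = (np.sin(2 * np.pi * 700 * t) + 0.01 * rng.standard_normal(240)) * np.hamming(240)
    a, err = lp_coefficients(frame, order=10)

At 2.4 kbit/s the quantized coefficients must carry the whole spectral envelope, which is why robustness of the analysis matters: as the abstract notes, there is no bandwidth left to encode the modelling error.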
Dias, Roberta Freitas; Melo, Roberta Michelon; Mezzomo, Carolina Lisbôa; Mota, Helena Bolli
2013-01-01
To analyze the possible relationships between awareness of one's own speech disorder and aspects of the phonological system, such as the number and type of altered distinctive features, as well as the interaction between disorder severity and the non-specification of distinctive features. The analyzed group comprised 23 children diagnosed with speech disorder, aged 5:0 to 7:7. The speech data were analyzed through Distinctive Features Analysis and classified by the Percentage of Correct Consonants. The Awareness of One's Own Speech Disorder test was also applied. The children were separated into two groups: with awareness of their own speech disorder established (more than 50% correct identification) and without it established (less than 50% correct identification). Finally, the variables of this research were submitted to descriptive and inferential statistical analysis. The types of altered distinctive features did not differ between the groups, nor did the total number of altered features or the disorder severity. However, a correlation between disorder severity and the non-specification of distinctive features was verified: more severe disorders showed more changes in these linguistic variables. Awareness of one's own speech disorder does not seem to be directly influenced by the type or number of altered distinctive features, nor by speech disorder severity. Moreover, the greater the phonological disorder severity, the greater the number of altered distinctive features.
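The severity index used here, the Percentage of Correct Consonants (PCC), is a simple ratio once the transcription has been scored. A minimal sketch with illustrative counts (the conventional severity bands noted in the comment are approximate and not part of this study's data):

    def pcc(correct_consonants: int, total_consonants: int) -> float:
        """Percentage of Correct Consonants: correct / attempted * 100."""
        return 100.0 * correct_consonants / total_consonants

    # Commonly cited bands (approximate): >85 mild, 65-85 mild-moderate,
    # 50-65 moderate-severe, <50 severe.
    print(f"PCC = {pcc(132, 180):.1f}%")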
Computational analysis of swallowing mechanics underlying impaired epiglottic inversion
Pearson, William G.; Taylor, Brandon K; Blair, Julie; Martin-Harris, Bonnie
2015-01-01
Objective Determine swallowing mechanics associated with the first and second epiglottic movements, that is, movement to horizontal and full inversion respectively, in order to provide a clinical interpretation of impaired epiglottic function. Study Design Retrospective cohort study. Methods A heterogeneous cohort of patients with swallowing difficulties was identified (n=92). Two speech-language pathologists reviewed 5ml thin and 5ml pudding videofluoroscopic swallow studies per subject, and assigned epiglottic component scores of 0=complete inversion, 1=partial inversion, and 2=no inversion forming three groups of videos for comparison. Coordinates mapping minimum and maximum excursion of the hyoid, pharynx, larynx, and tongue base during pharyngeal swallowing were recorded using ImageJ software. A canonical variate analysis with post-hoc discriminant function analysis of coordinates was performed using MorphoJ software to evaluate mechanical differences between groups. Eigenvectors characterizing swallowing mechanics underlying impaired epiglottic movements were visualized. Results Nineteen of 184 video-swallows were rejected for poor quality (n=165). A Goodman-Kruskal index of predictive association showed no correlation between epiglottic component scores and etiologies of dysphagia (λ=.04). A two-way analysis of variance by epiglottic component scores showed no significant interaction effects between sex and age (f=1.4, p=.25). Discriminant function analysis demonstrated statistically significant mechanical differences between epiglottic component scores: 1&2, representing the first epiglottic movement (Mahalanobis distance=1.13, p=.0007); and, 0&1, representing the second epiglottic movement (Mahalanobis distance=0.83, p=.003). Eigenvectors indicate that laryngeal elevation and tongue base retraction underlie both epiglottic movements. Conclusion Results suggest that reduced tongue base retraction and laryngeal elevation underlie impaired first and second epiglottic movements. The styloglossus, hyoglossus and long pharyngeal muscles are implicated as targets for rehabilitation in dysphagic patients with impaired epiglottic inversion. PMID:27426940
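The group-separation statistic reported above, the Mahalanobis distance between mean coordinate configurations, can be sketched directly from two groups of swallow-coordinate vectors. Illustrative random data stand in for the tracked coordinates; this is not the MorphoJ computation itself:

    import numpy as np

    rng = np.random.default_rng(4)
    group_a = rng.standard_normal((40, 6)) + 0.4   # e.g., partial-inversion swallows
    group_b = rng.standard_normal((40, 6))         # e.g., no-inversion swallows

    d = group_a.mean(axis=0) - group_b.mean(axis=0)
    pooled = (np.cov(group_a.T) * 39 + np.cov(group_b.T) * 39) / 78   # pooled covariance
    m_dist = float(np.sqrt(d @ np.linalg.solve(pooled, d)))
    print(f"Mahalanobis distance: {m_dist:.2f}")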
Thiel, Lindsey; Sage, Karen; Conroy, Paul
2017-01-01
Improving email writing in people with aphasia could enhance their ability to communicate, promote interaction and reduce isolation. Spelling therapies have been effective in improving single-word writing. However, there has been limited evidence on how to achieve changes to everyday writing tasks such as email writing in people with aphasia. One potential area that has been largely unexplored in the literature is the potential use of assistive writing technologies, despite some initial evidence that assistive writing software use can lead to qualitative and quantitative improvements to spontaneous writing. This within-participants case series design study aimed to investigate the effects of using assistive writing software to improve email writing in participants with dysgraphia related to aphasia. Eight participants worked through a hierarchy of writing tasks of increasing complexity within broad topic areas that incorporate the spheres of writing need of the participants: writing for domestic needs, writing for social needs and writing for business/administrative needs. Through completing these tasks, participants had the opportunity to use the various functions of the software, such as predictive writing, word banks and text-to-speech. Therapy also included training and practice in basic computer and email skills to encourage increased independence. Outcome measures included email skills, keyboard skills, email writing and written picture description tasks, and a perception of disability assessment. Four of the eight participants showed statistically significant improvements to spelling accuracy within emails when using the software. At a group level there was a significant increase in word length with the software, while four participants showed noteworthy changes to the range of word classes used. Enhanced independence in email use and improvements in participants' perceptions of their writing skills were also noted. This study provided some initial evidence that assistive writing technologies can support people with aphasia in email writing across a range of important performance parameters. However, more research is needed to measure the effects of these technologies on the writing of people with aphasia, and to determine the optimal compensatory mechanisms for specific people given the linguistic-strategic resources they bring to the task of email writing.
Effect of Drama Instruction Method on Students' Turkish Verbal Skills and Speech Anxiety
ERIC Educational Resources Information Center
Kardas, Mehmet Nuri; Koç, Rasit
2017-01-01
The objective of the present study is to determine the effect of the "drama" method on students' Turkish verbal skills and speech anxiety. Pretest-posttest experimental model with control group was utilized in the study. In the analysis of data obtained by Turkish Rhetorical Skills Scale (TRSS) and Speech Anxiety Scale (SAS), t-test…
Error Consistency in Acquired Apraxia of Speech with Aphasia: Effects of the Analysis Unit
ERIC Educational Resources Information Center
Haley, Katarina L.; Cunningham, Kevin T.; Eaton, Catherine Torrington; Jacks, Adam
2018-01-01
Purpose: Diagnostic recommendations for acquired apraxia of speech (AOS) have been contradictory concerning whether speech sound errors are consistent or variable. Studies have reported divergent findings that, on face value, could argue either for or against error consistency as a diagnostic criterion. The purpose of this study was to explain…
ERIC Educational Resources Information Center
LeBlanc, Judith M.
To gain some insight into the problem of deviant speech development in low income populations, this study investigated the environmental factors that encourage the development of normal speech. Two specific questions were examined in this study: (1) If specific vocalized environmental sounds are presented contiguously with reinforcement, will…
A Pragma-Stylistic Analysis of President Goodluck Ebele Jonathan Inaugural Speech
ERIC Educational Resources Information Center
Abuya, Eromosele John
2012-01-01
The study examined, through a pragma-stylistic approach to meaning, the linguistic acts that manifest in the Inaugural Speech of Goodluck Ebele Jonathan as the democratically elected president in the May 2011 General Elections in Nigeria. Hence, the study focused on the speech act types of locution, illocution, and perlocution in the…
Zeng, Yin-Ting; Hwu, Wuh-Liang; Torng, Pao-Chuan; Lee, Ni-Chung; Shieh, Jeng-Yi; Lu, Lu; Chien, Yin-Hsiu
2017-05-01
Patients with infantile-onset Pompe disease (IOPD) can be treated by recombinant human acid alpha glucosidase (rhGAA) replacement beginning at birth with excellent survival rates, but they still commonly present with speech disorders. This study investigated the progress of speech disorders in these early-treated patients and ascertained the relationship with treatments. Speech disorders, including hypernasal resonance, articulation disorders, and speech intelligibility, were scored by speech-language pathologists using auditory perception in seven early-treated patients over a period of 6 years. Statistical analysis of the first and last evaluations of the patients was performed with the Wilcoxon signed-rank test. A total of 29 speech samples were analyzed. All the patients suffered from hypernasality, articulation disorder, and impairment in speech intelligibility at the age of 3 years. The conditions were stable, and 2 patients developed normal or near normal speech during follow-up. Speech therapy and a high dose of rhGAA appeared to improve articulation in 6 of the 7 patients (86%, p = 0.028) by decreasing the omission of consonants, which consequently increased speech intelligibility (p = 0.041). Severity of hypernasality greatly reduced only in 2 patients (29%, p = 0.131). Speech disorders were common even in early and successfully treated patients with IOPD; however, aggressive speech therapy and high-dose rhGAA could improve their speech disorders.
Calandruccio, Lauren; Bradlow, Ann R.; Dhar, Sumitrajit
2013-01-01
Background Masking release for an English sentence-recognition task in the presence of foreign-accented English speech compared to native-accented English speech was reported in Calandruccio, Dhar and Bradlow (2010). The masking release appeared to increase as the masker intelligibility decreased. However, it could not be ruled out that spectral differences between the speech maskers were influencing the significant differences observed. Purpose The purpose of the current experiment was to minimize spectral differences between speech maskers to determine how various amounts of linguistic information within competing speech affect masking release. Research Design A mixed model design with within- (four two-talker speech maskers) and between-subject (listener group) factors was conducted. Speech maskers included native-accented English speech, and high-intelligibility, moderate-intelligibility and low-intelligibility Mandarin-accented English. Normalizing the long-term average speech spectra of the maskers to each other minimized spectral differences between the masker conditions. Study Sample Three listener groups were tested including monolingual English speakers with normal hearing, non-native speakers of English with normal hearing, and monolingual speakers of English with hearing loss. The non-native speakers of English were from various native-language backgrounds, not including Mandarin (or any other Chinese dialect). Listeners with hearing loss had symmetrical, mild sloping to moderate sensorineural hearing loss. Data Collection and Analysis Listeners were asked to repeat back sentences that were presented in the presence of four different two-talker speech maskers. Responses were scored based on the keywords within the sentences (100 keywords/masker condition). A mixed-model regression analysis was used to analyze the difference in performance scores between the masker conditions and the listener groups. Results Monolingual speakers of English with normal hearing benefited when the competing speech signal was foreign-accented compared to native-accented allowing for improved speech recognition. Various levels of intelligibility across the foreign-accented speech maskers did not influence results. Neither the non-native English listeners with normal hearing, nor the monolingual English speakers with hearing loss benefited from masking release when the masker was changed from native-accented to foreign-accented English. Conclusions Slight modifications between the target and the masker speech allowed monolingual speakers of English with normal hearing to improve their recognition of native-accented English even when the competing speech was highly intelligible. Further research is needed to determine which modifications within the competing speech signal caused the Mandarin-accented English to be less effective with respect to masking. Determining the influences within the competing speech that make it less effective as a masker, or determining why monolingual normal-hearing listeners can take advantage of these differences could help improve speech recognition for those with hearing loss in the future. PMID:25126683
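The spectral-normalization step described in the design can be approximated with standard DSP tools: estimate each masker's long-term average speech spectrum (LTASS) and filter one masker toward the other's spectrum. A hedged sketch using SciPy (FFT-magnitude equalization with a linear-phase FIR; the study's exact procedure may differ):

    import numpy as np
    from scipy.signal import welch, firwin2

    def match_ltass(x, target, fs, ntaps=513, nfft=1024):
        f, px = welch(x, fs, nperseg=nfft)
        _, pt = welch(target, fs, nperseg=nfft)
        gain = np.sqrt(pt / np.maximum(px, 1e-12))    # magnitude correction
        h = firwin2(ntaps, f / (fs / 2), gain)        # linear-phase matching filter
        return np.convolve(x, h, mode="same")

    fs = 16000
    rng = np.random.default_rng(5)
    masker_a = rng.standard_normal(fs * 5)
    masker_b = np.convolve(rng.standard_normal(fs * 5), np.ones(8) / 8, "same")  # duller spectrum
    masker_a_matched = match_ltass(masker_a, masker_b, fs)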
Optimizing Vowel Formant Measurements in Four Acoustic Analysis Systems for Diverse Speaker Groups
Derdemezis, Ekaterini; Kent, Ray D.; Fourakis, Marios; Reinicke, Emily L.; Bolt, Daniel M.
2016-01-01
Purpose This study systematically assessed the effects of select linear predictive coding (LPC) analysis parameter manipulations on vowel formant measurements for diverse speaker groups using 4 trademarked Speech Acoustic Analysis Software Packages (SAASPs): CSL, Praat, TF32, and WaveSurfer. Method Productions of 4 words containing the corner vowels were recorded from 4 speaker groups with typical development (male and female adults and male and female children) and 4 speaker groups with Down syndrome (male and female adults and male and female children). Formant frequencies were determined from manual measurements using a consensus analysis procedure to establish formant reference values, and from the 4 SAASPs (using both the default analysis parameters and with adjustments or manipulations to select parameters). Smaller differences between values obtained from the SAASPs and the consensus analysis implied more optimal analysis parameter settings. Results Manipulations of default analysis parameters in CSL, Praat, and TF32 yielded more accurate formant measurements, though the benefit was not uniform across speaker groups and formants. In WaveSurfer, manipulations did not improve formant measurements. Conclusions The effects of analysis parameter manipulations on accuracy of formant-frequency measurements varied by SAASP, speaker group, and formant. The information from this study helps to guide clinical and research applications of SAASPs. PMID:26501214
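The analysis parameter most often manipulated in such packages is the LPC model order (number of poles). A sketch of the underlying computation, estimating formants from the roots of the LP polynomial, makes clear why the setting matters: too few poles merge formants, too many invent them. Textbook approach, not any package's internal code:

    import numpy as np

    def formants_lpc(frame, fs, order=12):
        """Estimate formant frequencies (Hz) from LPC polynomial roots."""
        w = frame * np.hamming(len(frame))
        r = np.correlate(w, w, "full")[len(w) - 1:len(w) + order]
        # Solve the autocorrelation normal equations R a = -r.
        R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
        a = np.concatenate([[1.0], np.linalg.solve(R, -r[1:order + 1])])
        roots = np.roots(a)
        roots = roots[np.imag(roots) > 0]          # one of each conjugate pair
        freqs = np.angle(roots) * fs / (2 * np.pi)
        return np.sort(freqs[(freqs > 90) & (freqs < fs / 2 - 90)])

    fs = 10000
    rng = np.random.default_rng(6)
    t = np.arange(1024) / fs
    vowel = sum(np.sin(2 * np.pi * f * t) for f in (500, 1500, 2500))  # crude vowel stand-in
    vowel += 0.01 * rng.standard_normal(1024)
    print(formants_lpc(vowel, fs))   # order = fs/1000 + 2 is a common rule of thumb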
Gilbert, Kathryn E
2013-02-01
Recent attempts to regulate Crisis Pregnancy Centers, pseudoclinics that surreptitiously aim to dissuade pregnant women from choosing abortion, have confronted the thorny problem of how to define commercial speech. The Supreme Court has offered three potential answers to this definitional quandary. This Note uses the Crisis Pregnancy Center cases to demonstrate that courts should use one of these solutions, the factor-based approach of Bolger v. Youngs Drugs Products Corp., to define commercial speech in the Crisis Pregnancy Center cases and elsewhere. In principle and in application, the Bolger factor-based approach succeeds in structuring commercial speech analysis at the margins of the doctrine.
Acquisition of speech rhythm in first language.
Polyanskaya, Leona; Ordin, Mikhail
2015-09-01
Analysis of English rhythm in speech produced by children and adults revealed that speech rhythm becomes increasingly more stress-timed as language acquisition progresses. Children reach the adult-like target by 11 to 12 years. The employed speech elicitation paradigm ensured that the sentences produced by adults and children at different ages were comparable in terms of lexical content, segmental composition, and phonotactic complexity. Detected differences between child and adult rhythm and between rhythm in child speech at various ages cannot be attributed to acquisition of phonotactic language features or vocabulary, and indicate the development of language-specific phonetic timing in the course of acquisition.
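The stress-timing gradient reported here is typically quantified with interval-based rhythm metrics; one standard choice is the normalized Pairwise Variability Index (nPVI) over vocalic interval durations. A minimal sketch with hypothetical durations (the paper's exact metrics may differ):

    def npvi(durations):
        """nPVI: mean normalized difference of successive interval durations."""
        pairs = zip(durations[:-1], durations[1:])
        terms = [abs(a - b) / ((a + b) / 2) for a, b in pairs]
        return 100 * sum(terms) / len(terms)

    # Higher values mean more durational contrast between neighboring vowels,
    # the hallmark of adult stress-timed speech.
    child_like = [0.12, 0.11, 0.13, 0.12, 0.11]   # hypothetical, near-isochronous
    adult_like = [0.06, 0.15, 0.05, 0.18, 0.07]   # hypothetical, alternating
    print(npvi(child_like), npvi(adult_like))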
On the Perception of Speech Sounds as Biologically Significant Signals
Pisoni, David B.
2012-01-01
This paper reviews some of the major evidence and arguments currently available to support the view that human speech perception may require the use of specialized neural mechanisms for perceptual analysis. Experiments using synthetically produced speech signals with adults are briefly summarized and extensions of these results to infants and other organisms are reviewed with an emphasis towards detailing those aspects of speech perception that may require some need for specialized species-specific processors. Finally, some comments on the role of early experience in perceptual development are provided as an attempt to identify promising areas of new research in speech perception. PMID:399200
Speech transformation system (spectrum and/or excitation) without pitch extraction
NASA Astrophysics Data System (ADS)
Seneff, S.
1980-07-01
A speech analysis-synthesis system was developed which is capable of independent manipulation of the fundamental frequency and spectral envelope of a speech waveform. The system deconvolved the original speech with the spectral envelope estimate to obtain a model for the excitation; explicit pitch extraction was not required, and as a consequence the transformed speech was more natural sounding than would be the case if the excitation were modeled as a sequence of pulses. It is shown that the system has applications in the areas of voice modification, baseband-excited vocoders, time scale modification, and frequency compression as an aid to the partially deaf.
Temporal modulations in speech and music.
Ding, Nai; Patel, Aniruddh D; Chen, Lin; Butler, Henry; Luo, Cheng; Poeppel, David
2017-10-01
Speech and music have structured rhythms. Here we discuss a major acoustic correlate of spoken and musical rhythms, the slow (0.25-32 Hz) temporal modulations in sound intensity, and compare the modulation properties of speech and music. We analyze these modulations using over 25 h of speech and over 39 h of recordings of Western music. We show that the speech modulation spectrum is highly consistent across 9 languages (including languages with typologically different rhythmic characteristics). A different, but similarly consistent, modulation spectrum is observed for music, including classical music played by single instruments of different types, symphonic, jazz, and rock. The temporal modulations of speech and music show broad but well-separated peaks around 5 and 2 Hz, respectively. These acoustically dominant time scales may be intrinsic features of speech and music, a possibility which should be investigated using more culturally diverse samples in each domain. Distinct modulation timescales for speech and music could facilitate their perceptual analysis and its neural processing.
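The measurement itself, a modulation spectrum of the intensity envelope, is easy to sketch: extract an envelope, then take its spectrum in the 0.25-32 Hz band. A minimal Hilbert-envelope version (the paper's pipeline, e.g., its filterbank and averaging choices, may differ):

    import numpy as np
    from scipy.signal import hilbert

    def modulation_spectrum(x, fs):
        env = np.abs(hilbert(x))                  # intensity envelope
        env = env - env.mean()
        spec = np.abs(np.fft.rfft(env))
        freqs = np.fft.rfftfreq(len(env), 1 / fs)
        keep = (freqs >= 0.25) & (freqs <= 32)    # modulation band of interest
        return freqs[keep], spec[keep]

    fs = 16000
    t = np.arange(4 * fs) / fs
    # Toy signal: a 1 kHz carrier modulated at 5 Hz, near the speech-dominant rate.
    x = (1 + np.sin(2 * np.pi * 5 * t)) * np.sin(2 * np.pi * 1000 * t)
    freqs, spec = modulation_spectrum(x, fs)
    print(freqs[np.argmax(spec)])                 # ~5 Hz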
Hearing impaired speech in noisy classrooms
NASA Astrophysics Data System (ADS)
Shahin, Kimary; McKellin, William H.; Jamieson, Janet; Hodgson, Murray; Pichora-Fuller, M. Kathleen
2005-04-01
Noisy classrooms have been shown to induce among students patterns of interaction similar to those used by hearing impaired people [W. H. McKellin et al., GURT (2003)]. In this research, the speech of children in a noisy classroom setting was investigated to determine if noisy classrooms have an effect on students' speech. Audio recordings were made of the speech of students during group work in their regular classrooms (grades 1-7), and of the speech of the same students in a sound booth. Noise level readings in the classrooms were also recorded. Each student's noisy and quiet environment speech samples were acoustically analyzed for prosodic and segmental properties (f0, pitch range, pitch variation, phoneme duration, vowel formants), and compared. The analysis showed that the students' speech in the noisy classrooms had characteristics of the speech of hearing-impaired persons [e.g., R. O'Halpin, Clin. Ling. and Phon. 15, 529-550 (2001)]. Some educational implications of our findings were identified. [Work supported by the Peter Wall Institute for Advanced Studies, University of British Columbia.]
Emotion to emotion speech conversion in phoneme level
NASA Astrophysics Data System (ADS)
Bulut, Murtaza; Yildirim, Serdar; Busso, Carlos; Lee, Chul Min; Kazemzadeh, Ebrahim; Lee, Sungbok; Narayanan, Shrikanth
2004-10-01
Having an ability to synthesize emotional speech can make human-machine interaction more natural in spoken dialogue management. This study investigates the effectiveness of prosodic and spectral modification in phoneme level on emotion-to-emotion speech conversion. The prosody modification is performed with the TD-PSOLA algorithm (Moulines and Charpentier, 1990). We also transform the spectral envelopes of source phonemes to match those of target phonemes using LPC-based spectral transformation approach (Kain, 2001). Prosodic speech parameters (F0, duration, and energy) for target phonemes are estimated from the statistics obtained from the analysis of an emotional speech database of happy, angry, sad, and neutral utterances collected from actors. Listening experiments conducted with native American English speakers indicate that the modification of prosody only or spectrum only is not sufficient to elicit targeted emotions. The simultaneous modification of both prosody and spectrum results in higher acceptance rates of target emotions, suggesting that not only modeling speech prosody but also modeling spectral patterns that reflect underlying speech articulations are equally important to synthesize emotional speech with good quality. We are investigating suprasegmental level modifications for further improvement in speech quality and expressiveness.
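The prosody-targeting step described here, estimating target statistics from an emotional database and imposing them on source phonemes, can be sketched independently of the TD-PSOLA resynthesis. A minimal illustration that rescales an F0 contour to match a target emotion's mean and spread (values are hypothetical; the actual waveform modification is not shown):

    import numpy as np

    def retarget_f0(f0, target_mean, target_std):
        z = (f0 - f0.mean()) / f0.std()       # normalize the source contour
        return target_mean + target_std * z   # impose target statistics

    neutral_f0 = np.array([110.0, 115.0, 120.0, 118.0, 112.0])   # Hz, hypothetical
    angry_f0 = retarget_f0(neutral_f0, target_mean=160.0, target_std=25.0)
    print(np.round(angry_f0, 1))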
Tilsen, Sam; Arvaniti, Amalia
2013-07-01
This study presents a method for analyzing speech rhythm using empirical mode decomposition of the speech amplitude envelope, which allows for extraction and quantification of syllabic- and supra-syllabic time-scale components of the envelope. The method of empirical mode decomposition of a vocalic energy amplitude envelope is illustrated in detail, and several types of rhythm metrics derived from this method are presented. Spontaneous speech extracted from the Buckeye Corpus is used to assess the effect of utterance length on metrics, and it is shown how metrics representing variability in the supra-syllabic time-scale components of the envelope can be used to identify stretches of speech with targeted rhythmic characteristics. Furthermore, the envelope-based metrics are used to characterize cross-linguistic differences in speech rhythm in the UC San Diego Speech Lab corpus of English, German, Greek, Italian, Korean, and Spanish speech elicited in read sentences, read passages, and spontaneous speech. The envelope-based metrics exhibit significant effects of language and elicitation method that argue for a nuanced view of cross-linguistic rhythm patterns.
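A hedged sketch of the decomposition step, assuming the third-party PyEMD package (pip install EMD-signal) as a stand-in for the authors' implementation. The toy envelope has two time scales, a fast syllabic-rate component riding on a slower supra-syllabic one, which EMD should separate into distinct intrinsic mode functions:

    import numpy as np
    from PyEMD import EMD   # third-party package, assumed here

    fs = 100                                     # envelope sampling rate (Hz)
    t = np.arange(5 * fs) / fs
    envelope = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 0.8 * t)

    imfs = EMD().emd(envelope)                   # intrinsic mode functions, fast to slow
    print(imfs.shape)
    # Rhythm metrics of the kind described above would then be computed on
    # the slow (supra-syllabic) modes.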
Attentional Gain Control of Ongoing Cortical Speech Representations in a “Cocktail Party”
Kerlin, Jess R.; Shahin, Antoine J.; Miller, Lee M.
2010-01-01
Normal listeners possess the remarkable perceptual ability to select a single speech stream among many competing talkers. However, few studies of selective attention have addressed the unique nature of speech as a temporally extended and complex auditory object. We hypothesized that sustained selective attention to speech in a multi-talker environment would act as gain control on the early auditory cortical representations of speech. Using high-density electroencephalography and a template-matching analysis method, we found selective gain to the continuous speech content of an attended talker, greatest at a frequency of 4–8 Hz, in auditory cortex. In addition, the difference in alpha power (8–12 Hz) at parietal sites across hemispheres indicated the direction of auditory attention to speech, as has been previously found in visual tasks. The strength of this hemispheric alpha lateralization, in turn, predicted an individual’s attentional gain of the cortical speech signal. These results support a model of spatial speech stream segregation, mediated by a supramodal attention mechanism, enabling selection of the attended representation in auditory cortex. PMID:20071526
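The alpha lateralization measure is simple to compute from two parietal channels: band-limit to 8-12 Hz, take power, and form a normalized left-right index. A sketch with synthetic data (channel selection and preprocessing in the study were more involved):

    import numpy as np
    from scipy.signal import butter, filtfilt

    def alpha_power(x, fs):
        b, a = butter(4, [8, 12], btype="bandpass", fs=fs)
        return np.mean(filtfilt(b, a, x) ** 2)

    fs = 256
    rng = np.random.default_rng(2)
    left_parietal = rng.standard_normal(10 * fs)
    right_parietal = rng.standard_normal(10 * fs) \
        + 0.5 * np.sin(2 * np.pi * 10 * np.arange(10 * fs) / fs)   # extra alpha

    pr, pl = alpha_power(right_parietal, fs), alpha_power(left_parietal, fs)
    print(f"alpha lateralization index: {(pr - pl) / (pr + pl):+.2f}")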
Cochlear blood flow and speech perception ability in cochlear implant users.
Nakashima, Tsutomu; Hattori, Taku; Sone, Michihiko; Asahi, Kiyomitsu; Matsuda, Naoko; Teranishi, Masaaki; Yoshida, Tadao; Kato, Ken; Sato, Eisuke
2012-02-01
The effect of cochlear blood flow (CBF) on speech perception ability in cochlear implant (CI) users has not been reported. We investigated various factors influencing speech perception including CBF in CI users. Eighty-two patients who received CI surgery at an academic hospital. CBF was measured during CI surgery using laser Doppler flowmetry. The speech perception level was measured after a sufficient interval after CI surgery. Multivariate analysis was used to evaluate the influences of age, duration of deafness, sex, cause of deafness, and CBF on the speech perception level. CBF decreased significantly with age but was not related to the speech perception level. In patients with congenital hearing loss, the speech perception level was significantly worse in children who received a CI at 3 years of age than in those who received a CI at 2 years of age or younger. Duration of deafness before CI surgery had deteriorative effects on the speech perception level. CBF may be associated with progression of hearing loss. However, measuring CBF during CI surgery is not useful for predicting postoperative speech perception.
The predictive roles of neural oscillations in speech motor adaptability.
Sengupta, Ranit; Nasir, Sazzad M
2016-06-01
The human speech system exhibits a remarkable flexibility by adapting to alterations in speaking environments. While it is believed that speech motor adaptation under altered sensory feedback involves rapid reorganization of speech motor networks, the mechanisms by which different brain regions communicate and coordinate their activity to mediate adaptation remain unknown, and explanations of outcome differences in adaptation remain largely elusive. In this study, under the paradigm of altered auditory feedback with continuous EEG recordings, the differential roles of oscillatory neural processes in motor speech adaptability were investigated. The predictive capacities of different EEG frequency bands were assessed, and it was found that theta-, beta-, and gamma-band activities during speech planning and production contained significant and reliable information about motor speech adaptability. It was further observed that these bands do not work independently but interact with each other, suggesting an underlying brain network operating across hierarchically organized frequency bands to support motor speech adaptation. These results provide novel insights into both learning and disorders of speech using time frequency analysis of neural oscillations.
Processing changes when listening to foreign-accented speech
Romero-Rivas, Carlos; Martin, Clara D.; Costa, Albert
2015-01-01
This study investigates the mechanisms responsible for fast changes in processing foreign-accented speech. Event Related brain Potentials (ERPs) were obtained while native speakers of Spanish listened to native and foreign-accented speakers of Spanish. We observed a less positive P200 component for foreign-accented speech relative to native speech comprehension. This suggests that the extraction of spectral information and other important acoustic features was hampered during foreign-accented speech comprehension. However, the amplitude of the N400 component for foreign-accented speech comprehension decreased across the experiment, suggesting the use of a higher level, lexical mechanism. Furthermore, during native speech comprehension, semantic violations in the critical words elicited an N400 effect followed by a late positivity. During foreign-accented speech comprehension, semantic violations only elicited an N400 effect. Overall, our results suggest that, despite a lack of improvement in phonetic discrimination, native listeners experience changes at lexical-semantic levels of processing after brief exposure to foreign-accented speech. Moreover, these results suggest that lexical access, semantic integration and linguistic re-analysis processes are permeable to external factors, such as the accent of the speaker. PMID:25859209
Discriminative analysis of lip motion features for speaker identification and speech-reading.
Cetingül, H Ertan; Yemez, Yücel; Erzin, Engin; Tekalp, A Murat
2006-10-01
There have been several studies that jointly use audio, lip intensity, and lip geometry information for speaker identification and speech-reading applications. This paper proposes using explicit lip motion information, instead of or in addition to lip intensity and/or geometry information, for speaker identification and speech-reading within a unified feature selection and discrimination analysis framework, and addresses two important issues: 1) Is using explicit lip motion information useful, and, 2) if so, what are the best lip motion features for these two applications? The best lip motion features for speaker identification are considered to be those that result in the highest discrimination of individual speakers in a population, whereas for speech-reading, the best features are those providing the highest phoneme/word/phrase recognition rate. Several lip motion feature candidates have been considered, including dense motion features within a bounding box about the lip, lip contour motion features, and combinations of these with lip shape features. Furthermore, a novel two-stage, spatial, and temporal discrimination analysis is introduced to select the best lip motion features for speaker identification and speech-reading applications. Experimental results using a hidden-Markov-model-based recognition system indicate that using explicit lip motion information provides additional performance gains in both applications, and lip motion features prove more valuable in the case of the speech-reading application.
Comprehension of Co-Speech Gestures in Aphasic Patients: An Eye Movement Study
Eggenberger, Noëmi; Preisig, Basil C.; Schumacher, Rahel; Hopfner, Simone; Vanbellingen, Tim; Nyffeler, Thomas; Gutbrod, Klemens; Annoni, Jean-Marie; Bohlhalter, Stephan; Cazzoli, Dario; Müri, René M.
2016-01-01
Background Co-speech gestures are omnipresent and a crucial element of human interaction by facilitating language comprehension. However, it is unclear whether gestures also support language comprehension in aphasic patients. Using visual exploration behavior analysis, the present study aimed to investigate the influence of congruence between speech and co-speech gestures on comprehension in terms of accuracy in a decision task. Method Twenty aphasic patients and 30 healthy controls watched videos in which speech was either combined with meaningless (baseline condition), congruent, or incongruent gestures. Comprehension was assessed with a decision task, while remote eye-tracking allowed analysis of visual exploration. Results In aphasic patients, the incongruent condition resulted in a significant decrease of accuracy, while the congruent condition led to a significant increase in accuracy compared to baseline accuracy. In the control group, the incongruent condition resulted in a decrease in accuracy, while the congruent condition did not significantly increase the accuracy. Visual exploration analysis showed that patients fixated significantly less on the face and tended to fixate more on the gesturing hands compared to controls. Conclusion Co-speech gestures play an important role for aphasic patients as they modulate comprehension. Incongruent gestures evoke significant interference and deteriorate patients’ comprehension. In contrast, congruent gestures enhance comprehension in aphasic patients, which might be valuable for clinical and therapeutic purposes. PMID:26735917
ERIC Educational Resources Information Center
Lancia, Leonardo; Fuchs, Susanne; Tiede, Mark
2014-01-01
Purpose: The aim of this article was to introduce an important tool, cross-recurrence analysis, to speech production applications by showing how it can be adapted to evaluate the similarity of multivariate patterns of articulatory motion. The method differs from classical applications of cross-recurrence analysis because no phase space…
NASA Astrophysics Data System (ADS)
Sommer, David; Erath, Byron D.; Zanartu, Matias; Peterson, Sean D.
2011-11-01
Voiced speech is produced by dynamic fluid-structure interactions in the larynx. Traditionally, reduced order models of speech have relied upon simplified inviscid flow solvers to prescribe the fluid loadings that drive vocal fold motion, neglecting viscous flow effects that occur naturally in voiced speech. Viscous phenomena, such as skewing of the intraglottal jet, have the most pronounced effect on voiced speech in cases of vocal fold paralysis where one vocal fold loses some, or all, muscular control. The impact of asymmetric intraglottal flow in pathological speech is captured in a reduced order two-mass model of speech by coupling a boundary-layer estimation of the asymmetric pressures with asymmetric tissue parameters that are representative of recurrent laryngeal nerve paralysis. Nonlinear analysis identifies the emergence of irregular and chaotic vocal fold dynamics at values representative of pathological speech conditions.
Brain activity related to phonation in young patients with adductor spasmodic dysphonia.
Kiyuna, Asanori; Maeda, Hiroyuki; Higa, Asano; Shingaki, Kouta; Uehara, Takayuki; Suzuki, Mikio
2014-06-01
This study investigated the brain activities during phonation of young patients with adductor spasmodic dysphonia (ADSD) of relatively short disease duration (<10 years). Six subjects with ADSD of short duration (mean age: 24.3 years; mean disease duration: 41 months) and six healthy controls (mean age: 30.8 years) underwent functional magnetic resonance imaging (fMRI) using a sparse sampling method to identify brain activity during vowel phonation (/i:/). Intragroup and intergroup analyses were performed using statistical parametric mapping software. Areas of activation in the ADSD and control groups were similar to those reported previously for vowel phonation. All of the activated areas were observed bilaterally and symmetrically. Intergroup analysis revealed higher brain activities in the ADSD group in the auditory-related areas (Brodmann's areas [BA] 40, 41), motor speech areas (BA44, 45), bilateral insula (BA13), bilateral cerebellum, and middle frontal gyrus (BA46). Areas with lower activation were in the left primary sensory area (BA1-3) and bilateral subcortical nuclei (putamen and globus pallidus). The auditory cortical responses observed may reflect that young ADSD patients control their voice by use of the motor speech area, insula, inferior parietal cortex, and cerebellum. Neural activity in the primary sensory area and basal ganglia may affect the voice symptoms of young ADSD patients with short disease duration.
NASA Astrophysics Data System (ADS)
Dobre, Robert A.; Negrescu, Cristian; Stanomir, Dumitru
2016-12-01
In many situations audio recordings can decide the fate of a trial when accepted as evidence. Before they can be taken into account, however, they must first be authenticated, and the quality of the targeted content (speech in most cases) must be good enough to remove any doubt. Two main directions of multimedia forensics come into play here: content authentication and noise reduction. This paper presents an application belonging to the latter. If someone wanted to conceal a conversation, the easiest way to do it would be to turn up the nearest audio system. In this situation, if a microphone were placed close by, the recorded signal would be apparently useless because the speech signal would be masked by the loud music signal. The paper proposes a solution based on adaptive filters to remove the musical content from the signal mixture described above in order to recover the masked vocal signal. Two adaptive filtering algorithms were tested in the proposed solution: Normalised Least Mean Squares (NLMS) and Recursive Least Squares (RLS). Their performances in the described situation were evaluated using Simulink, compared, and included in the paper.
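As an illustration of this approach, the following is a minimal NLMS sketch in Python (not the authors' Simulink implementation); the filter order and step size are arbitrary assumptions, and the clean music signal feeding the loudspeaker is assumed to be available as the adaptive filter's reference input.

```python
import numpy as np

def nlms_recover_speech(reference, mixture, order=64, mu=0.5, eps=1e-8):
    """Adaptive cancellation with NLMS.

    reference : the music signal driving the loudspeaker (assumed known)
    mixture   : microphone signal = acoustically filtered music + speech
    Returns the error signal, which converges toward the masked speech,
    since the adaptive filter can only model the music path."""
    w = np.zeros(order)                       # adaptive filter taps
    out = np.zeros(len(mixture))
    for n in range(order, len(mixture)):
        x = reference[n - order:n][::-1]      # most recent samples first
        y = w @ x                             # current music estimate
        e = mixture[n] - y                    # error = speech estimate
        w += mu * e * x / (x @ x + eps)       # normalised LMS update
        out[n] = e
    return out
```

Because the speech is uncorrelated with the music reference, the filter converges on the loudspeaker-to-microphone path and the residual error carries the recovered voice.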
The development of the Nucleus Freedom Cochlear implant system.
Patrick, James F; Busby, Peter A; Gibson, Peter J
2006-12-01
Cochlear Limited (Cochlear) released the fourth-generation cochlear implant system, Nucleus Freedom, in 2005. Freedom is based on 25 years of experience in cochlear implant research and development and incorporates advances in medicine, implantable materials, electronic technology, and sound coding. This article presents the development of Cochlear's implant systems, with an overview of the first 3 generations, and details of the Freedom system: the CI24RE receiver-stimulator, the Contour Advance electrode, the modular Freedom processor, the available speech coding strategies, the input processing options of Smart Sound to improve the signal before coding as electrical signals, and the programming software. Preliminary results from multicenter studies with the Freedom system are reported, demonstrating better levels of performance compared with the previous systems. The final section presents the most recent implant reliability data, with the early findings at 18 months showing improved reliability of the Freedom implant compared with the earlier Nucleus 3 System. Also reported are some of the findings of Cochlear's collaborative research programs to improve recipient outcomes. Included are studies showing the benefits from bilateral implants, electroacoustic stimulation using an ipsilateral and/or contralateral hearing aid, advanced speech coding, and streamlined speech processor programming.
2000-01-01
[Proceedings table-of-contents fragment; recoverable entries include tools for flight test data (data filtering, data calibration, modeling, system identification, and simulation) and the papers "Grammatical Model and Parser for Air Traffic Controller's Commands," "A Speech-Controlled Interactive Virtual Environment for Ship Familiarization," "Modeling and Simulation in the 21st Century," and "New COTS Hardware and Software Reduce the Cost and Effort in Replacing Aging Flight Simulators Subsystems."]
NASA Technical Reports Server (NTRS)
2004-01-01
I/NET, Inc., is making the dream of natural human-computer conversation a practical reality. Through a combination of advanced artificial intelligence research and practical software design, I/NET has taken the complexity out of developing advanced, natural language interfaces. Conversational capabilities like pronoun resolution, anaphora and ellipsis processing, and dialog management that were once available only in the laboratory can now be brought to any application with any speech recognition system using I/NET's conversational engine middleware.
Kesav, Praveen; Vrinda, S L; Sukumaran, Sajith; Sarma, P S; Sylaja, P N
2017-09-15
This study aimed to assess the feasibility of professional-based conventional speech language therapy (SLT) either alone (Group A/less intensive) or assisted by novel computer-based local-language software (Group B/more intensive) for rehabilitation in early post-stroke aphasia. The setting was the Comprehensive Stroke Care Center of a tertiary health care institute in South India; the study design was a prospective open randomised controlled trial with blinded endpoint evaluation. The study recruited 24 right-handed, first-ever acute ischemic stroke patients above 15 years of age, with infarcts in the middle cerebral artery territory, within 90 days of stroke onset and with a baseline Western Aphasia Battery (WAB) Aphasia Quotient (AQ) score of <93.8, between September 2013 and January 2016. The recruited subjects were block randomised into either the Group A/less intensive or the Group B/more intensive therapy arm. Both groups received 12 sessions of conventional professional-based SLT of 1 h each, with an additional 12 h of computer-based language therapy in Group B, over 4 weeks on a thrice-weekly basis; follow-up WAB assessments were performed at four and twelve weeks after baseline. The trial was registered with the Clinical Trials Registry India [2016/08/0120121]. All statistical analysis was carried out with IBM SPSS Statistics for Windows version 21. Twenty subjects [14 (70%) males; mean age: 52.8 years ± SD 12.04] completed the study (9 in the less intensive and 11 in the more intensive arm). The mean four-week follow-up AQ showed a significant improvement from baseline in the total group (p value: 0.01). The rate of rise of AQ from baseline to the four-week follow-up (ΔAQ %) was greater for the less intensive treatment group than for the more intensive treatment group [155% (SD: 150; 95% CI: 34-275) versus 52% (SD: 42%; 95% CI: 24-80), respectively; p value: 0.053]. Even though the more intensive treatment arm incorporating combined professional-based SLT and computer-software-based training fared poorer than the less intensive therapy group, this study nevertheless reinforces the feasibility of SLT in augmenting recovery of early post-stroke aphasia. Copyright © 2017 Elsevier B.V. All rights reserved.
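For clarity, the outcome measure used above reduces to a one-line computation; this hypothetical helper simply restates the ΔAQ% definition implied by the abstract.

```python
def delta_aq_percent(aq_baseline, aq_followup):
    """Rate of rise of the WAB Aphasia Quotient: percentage change
    from baseline to the four-week follow-up."""
    return (aq_followup - aq_baseline) / aq_baseline * 100.0

# e.g. delta_aq_percent(40.0, 61.0) -> 52.5 (% improvement over baseline)
```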
Speech-driven environmental control systems--a qualitative analysis of users' perceptions.
Judge, Simon; Robertson, Zoë; Hawley, Mark; Enderby, Pam
2009-05-01
To explore users' experiences and perceptions of speech-driven environmental control systems (SPECS) as part of a larger project aiming to develop a new SPECS. The motivation for this part of the project was to add to the evidence base for the use of SPECS and to determine the key design specifications for a new speech-driven system from a user's perspective. Semi-structured interviews were conducted with 12 users of SPECS from around the United Kingdom. These interviews were transcribed and analysed using a qualitative method based on framework analysis. Reliability is the main influence on the use of SPECS. All the participants gave examples of occasions when their speech-driven system was unreliable; in some instances, this unreliability was reported as not being a problem (e.g., for changing television channels); however, it was perceived as a problem for more safety-critical functions (e.g., opening a door). Reliability was cited by participants as the reason for using a switch-operated system as back up. Benefits of speech-driven systems focused on speech operation enabling access when other methods were not possible, quicker operation, and better aesthetics. Overall, there was a perception of increased independence from the use of speech-driven environmental control. In general, speech was considered a useful method of operating environmental controls by the participants interviewed; however, their perceptions regarding reliability often influenced their decision to have backup or alternative systems for certain functions.
Speech perception in autism spectrum disorder: An activation likelihood estimation meta-analysis.
Tryfon, Ana; Foster, Nicholas E V; Sharda, Megha; Hyde, Krista L
2018-02-15
Autism spectrum disorder (ASD) is often characterized by atypical language profiles and auditory and speech processing. These can contribute to aberrant language and social communication skills in ASD. The study of the neural basis of speech perception in ASD can serve as a potential neurobiological marker of ASD early on, but mixed results across studies render it difficult to find a reliable neural characterization of speech processing in ASD. To this aim, the present study examined the functional neural basis of speech perception in ASD versus typical development (TD) using an activation likelihood estimation (ALE) meta-analysis of 18 qualifying studies. The present study included separate analyses for TD and ASD, which allowed us to examine patterns of within-group brain activation as well as both common and distinct patterns of brain activation across the ASD and TD groups. Overall, ASD and TD showed mostly common brain activation of speech processing in bilateral superior temporal gyrus (STG) and left inferior frontal gyrus (IFG). However, the results revealed trends for some distinct activation in the TD group showing additional activation in higher-order brain areas including left superior frontal gyrus (SFG), left medial frontal gyrus (MFG), and right IFG. These results provide a more reliable neural characterization of speech processing in ASD relative to previous single neuroimaging studies and motivate future work to investigate how these brain signatures relate to behavioral measures of speech processing in ASD. Copyright © 2017 Elsevier B.V. All rights reserved.
Speech outcome in unilateral complete cleft lip and palate patients: a descriptive study.
Rullo, R; Di Maggio, D; Addabbo, F; Rullo, F; Festa, V M; Perillo, L
2014-09-01
In this study, resonance and articulation disorders were examined in a group of patients surgically treated for cleft lip and palate, considering family social background and the children's ability to self-monitor their speech output while speaking. Fifty children (32 males and 18 females), mean age 6.5 ± 1.6 years, affected by non-syndromic complete unilateral cleft of the lip and palate underwent the same surgical protocol. The speech level was evaluated using Accordi's speech assessment protocol, which focuses on intelligibility, nasality, nasal air escape, pharyngeal friction, and glottal stop. Pearson product-moment correlation analysis was used to detect significant associations between the analysed parameters. A total of 16% (8 children) of the sample had a severe to moderate degree of nasality and nasal air escape, with presence of pharyngeal friction and glottal stop, which obviously compromise speech intelligibility. Ten children (20%) showed a barely acceptable phonological outcome: nasality and nasal air escape were mild to moderate, but intelligibility remained poor. Thirty-two children (64%) had normal speech. Statistical analysis revealed a significant correlation between the severity of nasal resonance and nasal air escape (p ≤ 0.05). No statistically significant correlation was found between final intelligibility and the patients' social background, nor between final intelligibility and the age of the patients. The differences in speech outcome could be explained by a specific, subjective, and inborn ability, different for each child, to self-monitor their speech output.
The Pathways for Intelligible Speech: Multivariate and Univariate Perspectives
Evans, S.; Kyong, J.S.; Rosen, S.; Golestani, N.; Warren, J.E.; McGettigan, C.; Mourão-Miranda, J.; Wise, R.J.S.; Scott, S.K.
2014-01-01
An anterior pathway, concerned with extracting meaning from sound, has been identified in nonhuman primates. An analogous pathway has been suggested in humans, but controversy exists concerning the degree of lateralization and the precise location where responses to intelligible speech emerge. We have demonstrated that the left anterior superior temporal sulcus (STS) responds preferentially to intelligible speech (Scott SK, Blank CC, Rosen S, Wise RJS. 2000. Identification of a pathway for intelligible speech in the left temporal lobe. Brain. 123:2400–2406.). A functional magnetic resonance imaging study in Cerebral Cortex used equivalent stimuli and univariate and multivariate analyses to argue for the greater importance of bilateral posterior when compared with the left anterior STS in responding to intelligible speech (Okada K, Rong F, Venezia J, Matchin W, Hsieh IH, Saberi K, Serences JT, Hickok G. 2010. Hierarchical organization of human auditory cortex: evidence from acoustic invariance in the response to intelligible speech. Cereb Cortex. 20:2486–2495.). Here, we also replicate our original study, demonstrating that the left anterior STS exhibits the strongest univariate response and, in decoding using the bilateral temporal cortex, contains the most informative voxels showing an increased response to intelligible speech. In contrast, in classifications using local "searchlights" and a whole brain analysis, we find greater classification accuracy in posterior rather than anterior temporal regions. Thus, we show that the precise nature of the multivariate analysis used will emphasize different response profiles associated with complex sound to speech processing. PMID:23585519
Armstrong, Linda; Stansfield, Jois; Bloch, Steven
2017-11-01
Following content analyses of the first 30 years of the UK speech and language therapy professional body's journal, this study was conducted to survey the published work of the speech (and language) therapy profession over the last 50 years and trace key changes and themes. To understand better the development of the UK speech and language therapy profession over the last 50 years. All volumes of the professional journal of the Royal College of Speech and Language Therapists published between 1966 and 2015 (British Journal of Communication Disorders, European Journal of Communication Disorders and International Journal of Language and Communication Disorders) were examined using content analysis. The content was compared with that of the same journal as it appeared from 1935 to 1965. The journal has shown a trend towards more multi-authored and international papers, and a formalization of research methodologies. The volume of papers has increased considerably. Topic areas have expanded, but retain many of the areas of study found in earlier issues of the journal. The journal and its articles reflect the growing complexity of conditions being researched by speech and language therapists and their professional colleagues and give an indication of the developing evidence base for intervention and the diverse routes which speech and language therapy practice has taken over the last 50 years. © 2017 Royal College of Speech and Language Therapists.
[Restoration of speech function in oncological patients with maxillary defects].
Matiakin, E G; Chuchkov, V M; Akhundov, A A; Azizian, R I; Romanov, I S; Chuchkov, M V; Agapov, V V
2009-01-01
Speech quality was evaluated in 188 patients with acquired maxillary defects. Prosthetic treatment of 29 patients was preceded by pharmacopsychotherapy. Sixty-three patients had lessons with a logopedist and 66 practiced self-tuition based on a specially developed test. Thirty patients were examined for quality of speech without preliminary preparation. Speech quality was assessed by auditory and spectral analysis. The main forms of impaired speech quality in the patients with maxillary defects were marked rhinophonia and impaired articulation. The proposed analytical tests were based on a combination of "difficult" vowels and consonants. The use of a removable prosthesis with an obturator failed to correct the affected speech function but created prerequisites for the formation of the correct speech stereotype. Results of the study suggest a relationship between the quality of speech in subjects with maxillary defects and their intellectual faculties as well as their desire to overcome this drawback. The proposed tests are designed to activate the neuromuscular apparatus responsible for the generation of speech. Lessons with a speech therapist give a powerful emotional incentive to the patients and promote their efforts toward restoration of speaking ability. Pharmacopsychotherapy and self-control are other efficacious tools for the improvement of speech quality in patients with maxillary defects.
Engaged listeners: shared neural processing of powerful political speeches.
Schmälzle, Ralf; Häcker, Frank E K; Honey, Christopher J; Hasson, Uri
2015-08-01
Powerful speeches can captivate audiences, whereas weaker speeches fail to engage their listeners. What is happening in the brains of a captivated audience? Here, we assess audience-wide functional brain dynamics during listening to speeches of varying rhetorical quality. The speeches were given by German politicians and evaluated as rhetorically powerful or weak. Listening to each of the speeches induced similar neural response time courses, as measured by inter-subject correlation analysis, in widespread brain regions involved in spoken language processing. Crucially, alignment of the time course across listeners was stronger for rhetorically powerful speeches, especially for bilateral regions of the superior temporal gyri and medial prefrontal cortex. Thus, during powerful speeches, listeners as a group are more coupled to each other, suggesting that powerful speeches are more potent in taking control of the listeners' brain responses. Weaker speeches were processed more heterogeneously, although they still prompted substantially correlated responses. These patterns of coupled neural responses bear resemblance to metaphors of resonance, which are often invoked in discussions of speech impact, and contribute to the literature on auditory attention under natural circumstances. Overall, this approach opens up possibilities for research on the neural mechanisms mediating the reception of entertaining or persuasive messages. © The Author (2015). Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.
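Inter-subject correlation analysis of the kind used here can be sketched generically; this is a leave-one-out formulation in Python, not the authors' pipeline, and the input array layout is an assumption.

```python
import numpy as np

def intersubject_correlation(ts):
    """Leave-one-out inter-subject correlation: correlate each listener's
    response time course with the mean of all other listeners.
    ts is a (subjects, timepoints) array for one brain region."""
    isc = []
    for s in range(ts.shape[0]):
        others = np.delete(ts, s, axis=0).mean(axis=0)
        isc.append(np.corrcoef(ts[s], others)[0, 1])
    return np.array(isc)   # higher mean ISC = a more tightly coupled audience
```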
Comparing speech and nonspeech context effects across timescales in coarticulatory contexts.
Viswanathan, Navin; Kelty-Stephen, Damian G
2018-02-01
Context effects are ubiquitous in speech perception and reflect the ability of human listeners to successfully perceive highly variable speech signals. In the study of how listeners compensate for coarticulatory variability, past studies have used similar effects for speech and for tone analogues of speech as strong support for speech-neutral, general auditory mechanisms of compensation for coarticulation. In this manuscript, we revisit compensation for coarticulation by replacing standard button-press responses with mouse-tracking responses and examining both standard geometric measures of uncertainty and newer information-theoretic measures that separate fast from slow mouse movements. We found that when our analyses were restricted to end-state responses, tone and speech contexts appeared to produce similar effects. However, a more detailed time-course analysis revealed systematic differences between speech and tone contexts, such that listeners' responses to speech contexts, but not to tone contexts, changed across the experimental session. Analyses of the time course of effects within trials using mouse tracking indicated that speech contexts elicited fewer x-position flips but more area under the curve (AUC) and maximum deviation (MD), and that they did so in the slower portions of mouse-tracking movements. Our results indicate critical differences between the time course of speech and nonspeech context effects and suggest that general auditory explanations, motivated by the apparent similarity of the two, should be reexamined.
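The geometric mouse-tracking measures named above (MD, AUC, x-flips) can be computed from a single trial's trajectory roughly as follows; this sketch assumes raw cursor coordinates and omits the time normalization and fast/slow decomposition the study adds.

```python
import numpy as np

def mouse_measures(x, y):
    """Geometric mouse-tracking measures for one trial.

    x, y : cursor coordinates from trial start to response click.
    Returns maximum deviation (MD), area under the curve (AUC), and the
    number of x-flips (direction reversals along the x axis)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    start = np.array([x[0], y[0]])
    line = np.array([x[-1], y[-1]]) - start        # direct start-to-end path
    pts = np.stack([x, y], axis=1) - start
    # signed perpendicular distance of each sample from the direct path
    dev = (line[0] * pts[:, 1] - line[1] * pts[:, 0]) / np.linalg.norm(line)
    md = dev[np.argmax(np.abs(dev))]               # maximum deviation (signed)
    auc = float(np.sum(dev))                       # discrete area under the curve
    flips = int(np.sum(np.diff(np.sign(np.diff(x))) != 0))
    return md, auc, flips
```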
Jørgensen, Søren; Dau, Torsten
2011-09-01
A model for predicting the intelligibility of processed noisy speech is proposed. The speech-based envelope power spectrum model has a similar structure as the model of Ewert and Dau [(2000). J. Acoust. Soc. Am. 108, 1181-1196], developed to account for modulation detection and masking data. The model estimates the speech-to-noise envelope power ratio, SNR(env), at the output of a modulation filterbank and relates this metric to speech intelligibility using the concept of an ideal observer. Predictions were compared to data on the intelligibility of speech presented in stationary speech-shaped noise. The model was further tested in conditions with noisy speech subjected to reverberation and spectral subtraction. Good agreement between predictions and data was found in all cases. For spectral subtraction, an analysis of the model's internal representation of the stimuli revealed that the predicted decrease of intelligibility was caused by the estimated noise envelope power exceeding that of the speech. The classical concept of the speech transmission index fails in this condition. The results strongly suggest that the signal-to-noise ratio at the output of a modulation frequency selective process provides a key measure of speech intelligibility. © 2011 Acoustical Society of America
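A toy rendition of the SNRenv idea is sketched below, assuming octave-spaced second-order modulation filters and Hilbert envelopes; the published model's exact filterbank, ideal-observer back end, and internal noise are omitted.

```python
import numpy as np
from scipy.signal import hilbert, butter, sosfiltfilt

def snr_env_db(noisy_speech, noise, fs, centers=(1, 2, 4, 8, 16, 32)):
    """Toy SNRenv: envelope power of the processed mixture relative to the
    noise envelope power, summed across a modulation filterbank. In
    practice envelopes are usually low-pass filtered and downsampled first."""
    def env(sig):
        return np.abs(hilbert(sig))                 # temporal envelope
    ratios = []
    for fc in centers:
        sos = butter(2, [fc / np.sqrt(2), fc * np.sqrt(2)],
                     btype='band', fs=fs, output='sos')
        p_mix = np.mean(sosfiltfilt(sos, env(noisy_speech)) ** 2)
        p_noise = np.mean(sosfiltfilt(sos, env(noise)) ** 2)
        # speech envelope power approximated as mixture minus noise
        ratios.append(max(p_mix - p_noise, 1e-10) / p_noise)
    return 10 * np.log10(sum(ratios))
```

This captures the key claim of the abstract: when processing (e.g., spectral subtraction) inflates the noise envelope power beyond that of the speech, the predicted SNRenv, and hence intelligibility, drops.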
Speech and oromotor outcome in adolescents born preterm: relationship to motor tract integrity.
Northam, Gemma B; Liégeois, Frédérique; Chong, Wui K; Baker, Kate; Tournier, Jacques-Donald; Wyatt, John S; Baldeweg, Torsten; Morgan, Angela
2012-03-01
To assess speech abilities in adolescents born preterm and investigate whether there is an association between specific speech deficits and brain abnormalities. Fifty adolescents born prematurely (<33 weeks' gestation) with a spectrum of brain injuries were recruited (mean age, 16 years). Speech examination included tests of speech-sound processing and production and speech and oromotor control. Conventional magnetic resonance imaging and diffusion-weighted imaging was acquired in all adolescents born preterm and 30 term-born control subjects. Radiological ratings of brain injury were recorded and the integrity of the primary motor projections was measured (corticospinal tract and speech-motor corticobulbar tract [CST/CBT]). There were no clinical diagnoses of developmental dysarthria, dyspraxia, or a speech-sound disorder, but difficulties in speech and oromotor control were common. A regression analysis revealed that presence of a neurologic impairment, and diffusion-weighted imaging abnormalities in the left CST/CBT were significant independent predictors of poor speech and oromotor outcome. These left-lateralized abnormalities were most evident at the level of the posterior limb of the internal capsule. Difficulties in speech and oromotor control are common in adolescents born preterm, and adolescents with injury to the CST/CBT pathways in the left-hemisphere may be most at risk. Copyright © 2012 Mosby, Inc. All rights reserved.
Characterizing resonant component in speech: A different view of tracking fundamental frequency
NASA Astrophysics Data System (ADS)
Dong, Bin
2017-05-01
Inspired by the nonlinearity, nonstationarity, and modulations of speech, the Hilbert-Huang Transform and cyclostationarity analysis are employed in sequence to investigate speech resonance in vowels. Cyclostationarity analysis is not applied directly to the target vowel but to its intrinsic mode functions one by one. Because the fundamental frequency in speech is equivalent to the cyclic frequency in cyclostationarity analysis, the modulation intensity distributions of the intrinsic mode functions provide rich information for estimating the fundamental frequency. To highlight the relationship between frequency and time, the pseudo-Hilbert spectrum is proposed here in place of the Hilbert spectrum. Contrasting the pseudo-Hilbert spectra with the modulation intensity distributions of the intrinsic mode functions shows that there is usually one intrinsic mode function that acts as the fundamental component of the vowel. Furthermore, the fundamental frequency of the vowel can be determined by tracing the pseudo-Hilbert spectrum of its fundamental component along the time axis. The latter method is more robust for estimating the fundamental frequency in the presence of nonlinear components. Two vowels, [a] and [i], taken from the FAU Aibo Emotion Corpus speech database, are used to validate these findings.
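The f0-tracing idea can be sketched with an off-the-shelf empirical mode decomposition (the PyEMD package is an assumption, and the paper's cyclostationarity step is omitted): decompose the vowel, find the IMF whose instantaneous frequency sits in the expected f0 range, and trace it over time.

```python
import numpy as np
from scipy.signal import hilbert
from PyEMD import EMD   # EMD-signal package, assumed available

def f0_from_imfs(x, fs, f0_range=(60, 400)):
    """Decompose a vowel into intrinsic mode functions, pick the one that
    behaves as the fundamental component, and trace its instantaneous
    frequency over time (a pseudo-Hilbert-spectrum-style f0 track)."""
    imfs = EMD().emd(np.asarray(x, float))
    for imf in imfs[::-1]:                  # lowest-frequency IMFs come last
        phase = np.unwrap(np.angle(hilbert(imf)))
        inst_f = np.diff(phase) * fs / (2 * np.pi)
        if f0_range[0] <= np.median(inst_f) <= f0_range[1]:
            return inst_f                   # f0 estimate, sample by sample
    return None                             # no IMF in the expected range
```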
Iconic hand gestures and the predictability of words in context in spontaneous speech.
Beattie, G; Shovelton, H
2000-11-01
This study presents a series of empirical investigations to test a theory of speech production proposed by Butterworth and Hadar (1989; revised in Hadar & Butterworth, 1997) that iconic gestures have a functional role in lexical retrieval in spontaneous speech. Analysis 1 demonstrated that words which were totally unpredictable (as measured by the Shannon guessing technique) were more likely to occur after pauses than after fluent speech, in line with earlier findings. Analysis 2 demonstrated that iconic gestures were associated with words of lower transitional probability than words not associated with gesture, even when grammatical category was controlled. This therefore provided new supporting evidence for Butterworth and Hadar's claims that gestures' lexical affiliates are indeed unpredictable lexical items. However, Analysis 3 found that iconic gestures were not occasioned by lexical accessing difficulties because although gestures tended to occur with words of significantly lower transitional probability, these lower transitional probability words tended to be uttered quite fluently. Overall, therefore, this study provided little evidence for Butterworth and Hadar's theoretical claim that the main function of the iconic hand gestures that accompany spontaneous speech is to assist in the process of lexical access. Instead, such gestures are reconceptualized in terms of communicative function.
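Transitional probability in this sense is a conditional bigram probability; a minimal corpus-based estimator might look like the following (the Shannon guessing technique itself uses human guessers, so this is only a computational stand-in).

```python
from collections import Counter

def transitional_probability(tokens):
    """Estimate P(word | previous word) from a tokenized corpus: a
    computational stand-in for predictability as probed by human guessers."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    unigrams = Counter(tokens[:-1])
    def prob(prev, word):
        return bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0
    return prob

# p = transitional_probability(corpus_tokens)
# p("the", "cat")   # low values mark unpredictable (gesture-prone) words
```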
[Development and equivalence evaluation of spondee lists of mandarin speech test materials].
Zhang, Hua; Wang, Shuo; Wang, Liang; Chen, Jing; Chen, Ai-ting; Guo, Lian-sheng; Zhao, Xiao-yan; Ji, Chen
2006-06-01
To edit the spondee (disyllable) word lists as a part of mandarin speech test materials (MSTM). These will be basic speech materials for routine tests in clinics and laboratories. Two groups of professionals (audiologists, Chinese and Mandarin scientists, linguistician and statistician) were set up at first. The editing principles were established after 3 round table meetings. Ten spondee lists, each with 50 words, were edited and recorded into cassettes. All lists were phonemically balanced (3-dimensions: vowels, consonants and Chinese tones). Seventy-three normal hearing college students were tested. The speech was presented by earphone monaurally. Three statistic methods were used for equivalent analysis. Related analysis showed that all lists were much related, except List 5. Cluster analysis showed that all ten lists could be classified as two groups. But Kappa test showed that the lists' homogeneity were not well. Spondee lists are one of the most routine speech test materials. Their editing, recording and equivalent evaluation are affected by many factors. This also needs multi-discipline cooperation. All lists edited in present study need future modification in recording and testing in order to be used clinically and in research. The phonemic balance should be kept.
Analysis of human scream and its impact on text-independent speaker verification.
Hansen, John H L; Nandwana, Mahesh Kumar; Shokouhi, Navid
2017-04-01
Screams are defined as sustained, high-energy vocalizations that lack phonological structure. Lack of phonological structure is what distinguishes screams from other forms of loud vocalization, such as "yells." This study investigates the acoustic aspects of screams and addresses those that are known to prevent standard speaker identification systems from recognizing the identity of screaming speakers. It is well established that speaker variability due to changes in vocal effort and the Lombard effect contributes to degraded performance in automatic speech systems (i.e., speech recognition, speaker identification, diarization, etc.). However, previous research in the general area of speaker variability has concentrated on human speech production, whereas less is known about non-speech vocalizations. The UT-NonSpeech corpus is developed here to investigate speaker verification from scream samples. This study presents a detailed analysis in terms of fundamental frequency, spectral peak shift, frame energy distribution, and spectral tilt. It is shown that traditional speaker recognition based on the Gaussian mixture model-universal background model framework is unreliable when evaluated with screams.
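Of the four acoustic measures listed, spectral tilt is easy to make concrete; this sketch fits a line to the log-magnitude spectrum of one frame (the window choice and dB/kHz units are assumptions, not the paper's exact procedure).

```python
import numpy as np

def spectral_tilt(frame, fs):
    """Spectral tilt of one analysis frame: the slope of a straight line
    fitted to the log-magnitude spectrum, in dB per kHz."""
    frame = np.asarray(frame, float) * np.hamming(len(frame))
    spec = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), 1 / fs)
    keep = freqs > 50                        # skip the DC region
    slope, _ = np.polyfit(freqs[keep] / 1000,
                          20 * np.log10(spec[keep] + 1e-12), 1)
    return slope                             # screams tend toward a flatter tilt
```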
A comparative analysis of whispered and normally phonated speech using an LPC-10 vocoder
NASA Astrophysics Data System (ADS)
Wilson, J. B.; Mosko, J. D.
1985-12-01
This study focused on determining the performance of an LPC-10 vocoder in processing adult male and female whispered and normally phonated connected speech. The LPC-10 vocoder's analysis of whispered speech compared quite favorably with similar studies that used sound spectrographic processing techniques. Shifting from phonated to whispered speech caused a substantial increase in the phonemic formant frequencies and formant bandwidths for both male and female speakers. The data from this study showed no evidence that the LPC-10 vocoder's ability to process voices with pitch extremes and quality extremes was limited in any significant manner. A comparison of the unprocessed natural vowel waveforms and qualities with the synthesized vowel waveforms and qualities revealed almost imperceptible differences. An LPC-10 vocoder's ability to process linguistic and dialectal suprasegmental features such as intonation, rate, and stress at low bit rates should be a critical issue of concern for future research.
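The analysis stage of an LPC-10 vocoder rests on 10th-order linear prediction; a bare-bones autocorrelation-method implementation looks roughly like this (framing, pitch detection, and quantization, which a real LPC-10 coder also needs, are omitted).

```python
import numpy as np

def lpc(frame, order=10):
    """Autocorrelation-method linear prediction via the Levinson-Durbin
    recursion -- order 10, as in an LPC-10 vocoder. Returns coefficients
    a with a[0] = 1 for the prediction polynomial A(z)."""
    frame = np.asarray(frame, float) * np.hamming(len(frame))
    r = np.correlate(frame, frame, mode='full')[len(frame) - 1:][:order + 1]
    a = np.zeros(order + 1)
    a[0], err = 1.0, r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        a_prev = a.copy()
        for j in range(1, i):                # update inner coefficients
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k                             # new reflection coefficient
        err *= (1.0 - k * k)                 # shrinking prediction error
    return a
```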
Promoting lexical learning in the speech and language therapy of children with cochlear implants.
Ronkainen, Riitta; Laakso, Minna; Lonka, Eila; Tykkyläinen, Tuula
2017-01-01
This study examines lexical intervention sessions in speech and language therapy for children with cochlear implants (CIs). Particular focus is on the therapist's professional practices in doing the therapy. The participants in this study are three congenitally deaf children with CIs together with their speech and language therapist. The video recorded therapy sessions of these children are studied using conversation analysis. The analysis reveals the ways in which the speech and language therapist formulates her speaking turns to support the children's lexical learning in task interaction. The therapist's multimodal practices, for example linguistic and acoustic highlighting, focus both on the lexical meaning and the phonological form of the words. Using these means, the therapist expands the child's lexical networks, specifies and corrects the meaning of the target words, and models the correct phonological form of the words. The findings of this study are useful in providing information for clinicians and speech and language therapy students working with children who have CIs as well as for the children's parents.
Monson, Brian B; Lotto, Andrew J; Story, Brad H
2012-09-01
The human singing and speech spectrum includes energy above 5 kHz. To begin an in-depth exploration of this high-frequency energy (HFE), a database of anechoic high-fidelity recordings of singers and talkers was created and analyzed. Third-octave band analysis from the long-term average spectra showed that production level (soft vs normal vs loud), production mode (singing vs speech), and phoneme (for voiceless fricatives) all significantly affected HFE characteristics. Specifically, increased production level caused an increase in absolute HFE level, but a decrease in relative HFE level. Singing exhibited higher levels of HFE than speech in the soft and normal conditions, but not in the loud condition. Third-octave band levels distinguished phoneme class of voiceless fricatives. Female HFE levels were significantly greater than male levels only above 11 kHz. This information is pertinent to various areas of acoustics, including vocal tract modeling, voice synthesis, augmentative hearing technology (hearing aids and cochlear implants), and training/therapy for singing and speech.
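Third-octave band levels of the kind reported can be computed from a long-term spectrum as sketched below; the band-edge convention (fc / 2^(1/6) to fc · 2^(1/6)) and the 5-20 kHz span are assumptions matching the HFE focus of the study.

```python
import numpy as np

def third_octave_levels(x, fs, f_lo=5000, f_hi=20000):
    """Third-octave band levels (dB, arbitrary reference) from the power
    spectrum of a recording, restricted to the high-frequency region."""
    spec = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), 1 / fs)
    centers, levels = [], []
    fc = f_lo
    while fc < f_hi:
        lo, hi = fc / 2 ** (1 / 6), fc * 2 ** (1 / 6)   # band edges
        band = spec[(freqs >= lo) & (freqs < hi)]
        if band.size:
            centers.append(fc)
            levels.append(10 * np.log10(band.sum()))
        fc *= 2 ** (1 / 3)                               # next third-octave band
    return centers, levels
```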
Classification of speech dysfluencies using LPC based parameterization techniques.
Hariharan, M; Chee, Lim Sin; Ai, Ooi Chia; Yaacob, Sazali
2012-06-01
The goal of this paper is to discuss and compare three feature extraction methods: Linear Predictive Coefficients (LPC), Linear Prediction Cepstral Coefficients (LPCC), and Weighted Linear Prediction Cepstral Coefficients (WLPCC) for recognizing stuttered events. Speech samples from the University College London Archive of Stuttered Speech (UCLASS) were used for the analysis. The stuttered events were identified through manual segmentation and were used for feature extraction. Two simple classifiers, k-nearest neighbour (kNN) and Linear Discriminant Analysis (LDA), were employed for classifying speech dysfluencies. A conventional validation method was used to test the reliability of the classifier results. The effects of different frame lengths, percentages of overlap, values of the coefficient in a first-order pre-emphasizer, and different prediction orders p were discussed. The classification accuracy for speech dysfluencies was found to improve when statistical normalization was applied before feature extraction. The experimental investigation showed that LPC, LPCC, and WLPCC features can all be used for identifying stuttered events, with WLPCC features slightly outperforming LPCC and LPC features.
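Two of the processing steps discussed, first-order pre-emphasis and the LPC-to-LPCC conversion, can be sketched as follows; the coefficient value and cepstral order are assumptions, and sign conventions for the recursion vary across texts.

```python
import numpy as np

def pre_emphasize(x, a=0.97):
    """First-order pre-emphasis y[n] = x[n] - a*x[n-1]; the value of the
    coefficient a is one of the parameters varied in the study."""
    x = np.asarray(x, float)
    return np.append(x[0], x[1:] - a * x[:-1])

def lpc_to_cepstrum(a, n_ceps=12):
    """LPC-to-cepstrum recursion (assumes a[0] = 1); applying a lifter
    window to the result is one way to obtain weighted (WLPCC) features."""
    c = np.zeros(n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = -a[n] if n < len(a) else 0.0
        for k in range(1, n):
            if n - k < len(a):
                acc -= (k / n) * c[k] * a[n - k]
        c[n] = acc
    return c[1:]
```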
Ward, Roslyn; Leitão, Suze; Strauss, Geoff
2014-08-01
This study evaluates perceptual changes in speech production accuracy in six children (3-11 years) with moderate-to-severe speech impairment associated with cerebral palsy before, during, and after participation in a motor-speech intervention program (Prompts for Restructuring Oral Muscular Phonetic Targets, PROMPT). An A1BCA2 single-subject research design was implemented. Subsequent to the baseline phase (phase A1), phase B targeted each participant's first intervention priority on the PROMPT motor-speech hierarchy. Phase C then targeted one level higher. Weekly speech probes were administered, containing trained and untrained words at the two levels of intervention, plus an additional level that served as a control goal. The speech probes were analysed for motor-speech movement parameters and perceptual accuracy. Analysis of the speech probe data showed that all participants recorded a statistically significant change. Between phases A1-B and B-C, 6/6 and 4/6 participants, respectively, recorded a statistically significant increase in performance level on the motor-speech movement patterns targeted during that phase of the intervention. The preliminary data presented in this study contribute evidence supporting the use of a treatment approach aligned with dynamic systems theory to improve motor-speech movement patterns and speech production accuracy in children with cerebral palsy.
AIDS web sites face censorship under new rating schemes.
1997-08-22
The American Civil Liberties Union (ACLU) issued a position paper regarding the software industry's proposed rating standards that would block and rate information judged unsuitable for minors. Following the U.S. Supreme Court's overturning of the Communications Decency Act, a ruling that maintains a high level of free-speech protection over the Internet, the software industry began examining mechanisms to rate online content. Legislators are considering criminal penalties for those who misrate a web page. These moves are seen as damaging to HIV/AIDS prevention and safe-sex information web sites that use jargon, street language, and explicit diagrams to teach safe-sex practices to a wide audience. It is noted that comparable ratings and censorship do not apply to print material.
ERIC Educational Resources Information Center
Millar, Diane C.; Light, Janice C.; Schlosser, Ralf W.
2006-01-01
Purpose: This article presents the results of a meta-analysis to determine the effect of augmentative and alternative communication (AAC) on the speech production of individuals with developmental disabilities. Method: A comprehensive search of the literature published between 1975 and 2003, which included data on speech production before, during,…
"Sort of" in British Women's and Men's Speech
ERIC Educational Resources Information Center
Miettinen, Hanna; Watson, Greg
2013-01-01
This paper (Note 1) examines the form sort of in British men and women's speech, and investigates whether there is a gender difference in the use of this form. We do so through corpus analysis of the British National Corpus (BNC). We contend there is no quantitative difference in the use of sort of in men and women's speech. Contrary to general…
[Parent's perspective on child rearing and corporal punishment].
Donoso, Miguir Terezinha Vieccelli; Ricas, Janete
2009-02-01
To describe parents' current perceptions of corporal punishment associated with child rearing and its practices. The study included 31 family members whose children either had been placed in protective shelter due to child abuse complaints (12) or had not (19), at a health care unit and a local social service unit in the city of Belo Horizonte (Southeastern Brazil) in 2006. Data were collected through semi-structured interviews, and discourse analysis was performed, grouped by subjects and categories. ANALYSIS OF DISCOURSE: The respondents' discourse was constrained by its conditions of production. There was a diversity of conceptions of child rearing and its practices, and corporal punishment was reported by all parents, even those who expressed strong disapproval of the practice. The discourse was characterized by heterogeneity and polyphony, with emphasis on the discourse of tradition, religious discourse, and popular scientific discourse. Respondents expressed no notion of the legal interdiction of corporal punishment or of its excesses. The culture of corporal punishment of children is changing: the tradition approving it has weakened, and prohibition is slowly being adopted. Reinforcing legal actions against this practice can help speed the end of corporal punishment of children.
Voice Interactive Analysis System Study. Final Report, August 28, 1978 through March 23, 1979.
ERIC Educational Resources Information Center
Harry, D. P.; And Others
The Voice Interactive Analysis System study continued research and development of the LISTEN real-time, minicomputer based connected speech recognition system, within NAVTRAEQUIPCEN'S program of developing automatic speech technology in support of training. An attempt was made to identify the most effective features detected by the TTI-500 model…
Computational neuroanatomy of speech production.
Hickok, Gregory
2012-01-05
Speech production has been studied predominantly from within two traditions, psycholinguistics and motor control. These traditions have rarely interacted, and the resulting chasm between these approaches seems to reflect a level of analysis difference: whereas motor control is concerned with lower-level articulatory control, psycholinguistics focuses on higher-level linguistic processing. However, closer examination of both approaches reveals a substantial convergence of ideas. The goal of this article is to integrate psycholinguistic and motor control approaches to speech production. The result of this synthesis is a neuroanatomically grounded, hierarchical state feedback control model of speech production.
Strom, Mark A; Silverberg, Jonathan I
2016-01-01
To determine if eczema is associated with an increased risk of a speech disorder. We analyzed data on 354,416 children and adolescents from 19 US population-based cohorts: the 2003-2004 and 2007-2008 National Survey of Children's Health and 1997-2013 National Health Interview Survey, each prospective, questionnaire-based cohorts. In multivariate survey logistic regression models adjusting for sociodemographics and comorbid allergic disease, eczema was significantly associated with higher odds of speech disorder in 12 of 19 cohorts (P < .05). The pooled prevalence of speech disorder in children with eczema was 4.7% (95% CI 4.5%-5.0%) compared with 2.2% (95% CI 2.2%-2.3%) in children without eczema. In pooled multivariate analysis, eczema was associated with increased odds of speech disorder (aOR [95% CI] 1.81 [1.57-2.05], P < .001). In a single study assessing eczema severity, mild (1.36 [1.02-1.81], P = .03) and severe eczema (3.56 [1.70-7.48], P < .001) were associated with higher odds of speech disorder. History of eczema was associated with moderate (2.35 [1.34-4.10], P = .003) and severe (2.28 [1.11-4.72], P = .03) speech disorder. Finally, significant interactions were found, such that children with both eczema and attention deficit disorder with or without hyperactivity or sleep disturbance had vastly increased risk of speech disorders than either by itself. Pediatric eczema may be associated with increased risk of speech disorder. Further, prospective studies are needed to characterize the exact nature of this association. Copyright © 2016 Elsevier Inc. All rights reserved.
Simonyan, Kristina; Fuertinger, Stefan
2015-04-01
Speech production is one of the most complex human behaviors. Although brain activation during speaking has been well investigated, our understanding of the interactions between brain regions and neural networks remains scarce. We combined seed-based interregional correlation analysis with graph-theoretical analysis of functional MRI data during the resting state and during sentence production in healthy subjects to investigate the interface and topology of functional networks originating from the key brain regions controlling speech, i.e., the laryngeal/orofacial motor cortex, inferior frontal and superior temporal gyri, supplementary motor area, cingulate cortex, putamen, and thalamus. During both resting and speaking, the interactions between these networks were bilaterally distributed and centered on the sensorimotor brain regions. However, speech production preferentially recruited the inferior parietal lobule (IPL) and cerebellum into the large-scale network, suggesting the importance of these regions in facilitating the transition from the resting state to speaking. Furthermore, the cerebellum (lobule VI) was the most prominent region showing functional influences on speech-network integration and segregation. Although the networks were bilaterally distributed, interregional connectivity during speaking was stronger in the left vs. right hemisphere, which may underlie the more homogeneous overlap between the examined networks in the left hemisphere. Among these, the laryngeal motor cortex (LMC) established a core network that fully overlapped with all other speech-related networks, determining the extent of network interactions. Our data demonstrate complex interactions of large-scale brain networks controlling speech production and point to the critical role of the LMC, IPL, and cerebellum in the formation of the speech production network. Copyright © 2015 the American Physiological Society.
Application of artificial intelligence principles to the analysis of "crazy" speech.
Garfield, D A; Rapp, C
1994-04-01
Artificial intelligence computer simulation methods can be used to investigate psychotic or "crazy" speech. Here, symbolic reasoning algorithms establish semantic networks that schematize speech. These semantic networks consist of two main structures: case frames and object taxonomies. Node-based reasoning rules apply to object taxonomies and pathway-based reasoning rules apply to case frames. Normal listeners may recognize speech as "crazy talk" based on violations of node- and pathway-based reasoning rules. In this article, three separate segments of schizophrenic speech illustrate violations of these rules. This artificial intelligence approach is compared and contrasted with other neurolinguistic approaches and is discussed as a conceptual link between neurobiological and psychodynamic understandings of psychopathology.
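A toy version of node-based reasoning over an object taxonomy makes the idea concrete; the taxonomy and predicates below are invented for illustration and are not the system described in the article.

```python
class SemanticNet:
    """Toy object taxonomy with node-based reasoning: an utterance that
    assigns an object an action its taxonomy class disallows violates a
    node-based rule, which listeners may perceive as 'crazy talk'."""
    def __init__(self):
        self.is_a = {"canary": "bird", "bird": "animal", "stone": "object"}
        self.can = {"bird": {"fly", "sing"}, "animal": {"move"}, "object": set()}

    def allowed(self, noun, action):
        while noun is not None:              # walk up the taxonomy
            if action in self.can.get(noun, set()):
                return True
            noun = self.is_a.get(noun)
        return False

net = SemanticNet()
net.allowed("canary", "fly")    # True  -> coherent utterance
net.allowed("stone", "sing")    # False -> node-based rule violation
```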
Speech Emotion Feature Selection Method Based on Contribution Analysis Algorithm of Neural Network
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang Xiaojia; Mao Qirong; Zhan Yongzhao
There are many emotion features. If all of these features are employed to recognize emotions, redundant features may exist; furthermore, recognition results are unsatisfactory and the cost of feature extraction is high. In this paper, a method to select speech emotion features based on the contribution analysis algorithm of a neural network (NN) is presented. The emotion features are selected from the 95 extracted features using the contribution analysis algorithm of the NN. Cluster analysis is applied to analyze the effectiveness of the selected features, and the time of feature extraction is evaluated. Finally, the 24 selected emotion features are used to recognize six speech emotions. The experiments show that this method can improve the recognition rate and reduce the time of feature extraction.
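Contribution analysis comes in several variants; one simple perturbation-based reading, which is an assumption rather than the authors' exact algorithm, ranks features by how strongly nudging each input changes the trained network's output.

```python
import numpy as np

def contribution_ranking(predict, X, delta=0.05):
    """Rank input features by output sensitivity: perturb each feature in
    turn and measure the mean change in the network's prediction.

    predict : callable mapping an (n, d) feature matrix to (n,) outputs
    X       : (n, d) matrix of feature vectors
    Returns feature indices ordered from most to least contributing."""
    base = predict(X)
    contrib = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, j] += delta * X[:, j].std()        # nudge feature j
        contrib[j] = np.mean(np.abs(predict(Xp) - base))
    return np.argsort(contrib)[::-1]
```

Keeping only the top-ranked features (24 of 95 in the study) trims redundancy and cuts extraction cost while preserving the information the classifier actually uses.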
A computerized procedure for teaching the relationship between graphic symbols and their referents.
Isaacson, Mick; Lloyd, Lyle L
2013-01-01
Many individuals with little or no functional speech communicate through graphic symbols. Communication is enhanced when the relationship between symbols and their referents is learned to such a degree that retrieval is effortless, resulting in fluent communication. Developing fluency is a time-consuming endeavor for special educators and speech-language pathologists (SLPs). It would be beneficial for these professionals to have an automated procedure based on the most efficacious method for teaching the relationship between symbols and referents. Hence, this study investigated whether a procedure based on the generation effect would promote learning of the association between symbols and their referents. Results show that referent generation produces the best long-term retention of this relationship. These findings provide evidence that software based on referent generation would give special educators and SLPs an efficacious automated procedure, requiring minimal direct supervision, to facilitate symbol/referent learning and the development of communicative fluency.
Zekveld, Adriana A; Kramer, Sophia E; Kessens, Judith M; Vlaming, Marcel S M G; Houtgast, Tammo
2009-04-01
The aim of the current study was to examine whether partly incorrect subtitles that are automatically generated by an Automatic Speech Recognition (ASR) system, improve speech comprehension by listeners with hearing impairment. In an earlier study (Zekveld et al. 2008), we showed that speech comprehension in noise by young listeners with normal hearing improves when presenting partly incorrect, automatically generated subtitles. The current study focused on the effects of age, hearing loss, visual working memory capacity, and linguistic skills on the benefit obtained from automatically generated subtitles during listening to speech in noise. In order to investigate the effects of age and hearing loss, three groups of participants were included: 22 young persons with normal hearing (YNH, mean age = 21 years), 22 middle-aged adults with normal hearing (MA-NH, mean age = 55 years) and 30 middle-aged adults with hearing impairment (MA-HI, mean age = 57 years). The benefit from automatic subtitling was measured by Speech Reception Threshold (SRT) tests (Plomp & Mimpen, 1979). Both unimodal auditory and bimodal audiovisual SRT tests were performed. In the audiovisual tests, the subtitles were presented simultaneously with the speech, whereas in the auditory test, only speech was presented. The difference between the auditory and audiovisual SRT was defined as the audiovisual benefit. Participants additionally rated the listening effort. We examined the influences of ASR accuracy level and text delay on the audiovisual benefit and the listening effort using a repeated measures General Linear Model analysis. In a correlation analysis, we evaluated the relationships between age, auditory SRT, visual working memory capacity and the audiovisual benefit and listening effort. The automatically generated subtitles improved speech comprehension in noise for all ASR accuracies and delays covered by the current study. Higher ASR accuracy levels resulted in more benefit obtained from the subtitles. Speech comprehension improved even for relatively low ASR accuracy levels; for example, participants obtained about 2 dB SNR audiovisual benefit for ASR accuracies around 74%. Delaying the presentation of the text reduced the benefit and increased the listening effort. Participants with relatively low unimodal speech comprehension obtained greater benefit from the subtitles than participants with better unimodal speech comprehension. We observed an age-related decline in the working-memory capacity of the listeners with normal hearing. A higher age and a lower working memory capacity were associated with increased effort required to use the subtitles to improve speech comprehension. Participants were able to use partly incorrect and delayed subtitles to increase their comprehension of speech in noise, regardless of age and hearing loss. This supports the further development and evaluation of an assistive listening system that displays automatically recognized speech to aid speech comprehension by listeners with hearing impairment.
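The central metric here is simple enough to state in code: the audiovisual benefit is the drop in speech reception threshold when subtitles accompany the audio (the hypothetical helper below merely restates that definition).

```python
def audiovisual_benefit(srt_auditory_db, srt_audiovisual_db):
    """Benefit from subtitles as defined in the study: the reduction in
    speech reception threshold (dB SNR) when text accompanies the audio.
    Positive values mean the subtitles helped."""
    return srt_auditory_db - srt_audiovisual_db

# e.g. audiovisual_benefit(-2.0, -4.0) -> 2.0 dB SNR benefit,
# comparable to the ~2 dB reported at ~74% ASR accuracy
```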
Calculation of selective filters of a device for primary analysis of speech signals
NASA Astrophysics Data System (ADS)
Chudnovskii, L. S.; Ageev, V. M.
2014-07-01
The amplitude-frequency responses of filters for primary analysis of speech signals, which have a low quality factor and a high rolloff factor in the high-frequency range, are calculated using the linear theory of speech production and psychoacoustic measurement data. The frequency resolution of the filter system for a sinusoidal signal is 40-200 Hz. The modulation-frequency resolution of amplitude- and frequency-modulated signals is 3-6 Hz. The aforementioned features of the calculated filters are close to the amplitude-frequency responses of biological auditory systems at the level of the eighth nerve.
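A low-quality-factor bandpass section of this kind can be approximated with a standard Butterworth design; the Butterworth shape, order, and example Q below are assumptions, since the paper derives its own amplitude-frequency responses.

```python
import numpy as np
from scipy.signal import butter

def low_q_bandpass(fc, q, fs, order=2):
    """Bandpass section with centre frequency fc and quality factor Q
    (bandwidth = fc / Q). A low Q gives broad, strongly overlapping
    channels, as described for the primary speech analyser."""
    bw = fc / q
    return butter(order, [fc - bw / 2, fc + bw / 2],
                  btype='band', fs=fs, output='sos')

# A hypothetical bank spanning the speech band:
bank = [low_q_bandpass(fc, q=2.0, fs=16000)
        for fc in np.arange(200, 4000, 150)]
```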
Central Presbycusis: A Review and Evaluation of the Evidence
Humes, Larry E.; Dubno, Judy R.; Gordon-Salant, Sandra; Lister, Jennifer J.; Cacace, Anthony T.; Cruickshanks, Karen J.; Gates, George A.; Wilson, Richard H.; Wingfield, Arthur
2018-01-01
Background The authors reviewed the evidence regarding the existence of age-related declines in central auditory processes and the consequences of any such declines for everyday communication. Purpose This report summarizes the review process and presents its findings. Data Collection and Analysis The authors reviewed 165 articles germane to central presbycusis. Of the 165 articles, 132 articles with a focus on human behavioral measures for either speech or nonspeech stimuli were selected for further analysis. Results For 76 smaller-scale studies of speech understanding in older adults reviewed, the following findings emerged: (1) the three most commonly studied behavioral measures were speech in competition, temporally distorted speech, and binaural speech perception (especially dichotic listening); (2) for speech in competition and temporally degraded speech, hearing loss proved to have a significant negative effect on performance in most of the laboratory studies; (3) significant negative effects of age, unconfounded by hearing loss, were observed in most of the studies of speech in competing speech, time-compressed speech, and binaural speech perception; and (4) the influence of cognitive processing on speech understanding has been examined much less frequently, but when included, significant positive associations with speech understanding were observed. For 36 smaller-scale studies of the perception of nonspeech stimuli by older adults reviewed, the following findings emerged: (1) the three most frequently studied behavioral measures were gap detection, temporal discrimination, and temporal-order discrimination or identification; (2) hearing loss was seldom a significant factor; and (3) negative effects of age were almost always observed. For 18 studies reviewed that made use of test batteries and medium-to-large sample sizes, the following findings emerged: (1) all studies included speech-based measures of auditory processing; (2) 4 of the 18 studies included nonspeech stimuli; (3) for the speech-based measures, monaural speech in a competing-speech background, dichotic speech, and monaural time-compressed speech were investigated most frequently; (4) the most frequently used tests were the Synthetic Sentence Identification (SSI) test with Ipsilateral Competing Message (ICM), the Dichotic Sentence Identification (DSI) test, and time-compressed speech; (5) many of these studies using speech-based measures reported significant effects of age, but most of these studies were confounded by declines in hearing, cognition, or both; (6) for nonspeech auditory-processing measures, the focus was on measures of temporal processing in all four studies; (7) effects of cognition on nonspeech measures of auditory processing have been studied less frequently, with mixed results, whereas the effects of hearing loss on performance were minimal due to judicious selection of stimuli; and (8) there is a paucity of observational studies using test batteries and longitudinal designs. Conclusions Based on this review of the scientific literature, there is insufficient evidence to confirm the existence of central presbycusis as an isolated entity. On the other hand, recent evidence has been accumulating in support of the existence of central presbycusis as a multifactorial condition that involves age- and/or disease-related changes in the auditory system and in the brain. Moreover, there is a clear need for additional research in this area. PMID:22967738
Acoustic analysis of trill sounds.
Dhananjaya, N; Yegnanarayana, B; Bhaskararao, Peri
2012-04-01
In this paper, the acoustic-phonetic characteristics of steady apical trills--trill sounds produced by the periodic vibration of the apex of the tongue--are studied. Signal processing methods, namely, zero-frequency filtering and zero-time liftering of speech signals, are used to analyze the excitation source and the resonance characteristics of the vocal tract system, respectively. Although it is natural to expect the effect of trilling on the resonances of the vocal tract system, it is interesting to note that trilling influences the glottal source of excitation as well. The excitation characteristics derived using zero-frequency filtering of speech signals are glottal epochs, strength of impulses at the glottal epochs, and instantaneous fundamental frequency of the glottal vibration. Analysis based on zero-time liftering of speech signals is used to study the dynamic resonance characteristics of vocal tract system during the production of trill sounds. Qualitative analysis of trill sounds in different vowel contexts, and the acoustic cues that may help spotting trills in continuous speech are discussed.
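Zero-frequency filtering can be sketched compactly: cascade ideal 0-Hz resonators (cumulative sums), repeatedly subtract the local mean, and read glottal epochs off the zero crossings. The window length and the number of trend-removal passes below are typical values, not necessarily the paper's exact settings.

```python
import numpy as np

def zff_epochs(x, fs, win_ms=10):
    """Zero-frequency filtering sketch: two ideal 0-Hz resonators (each
    equivalent to a double cumulative sum) followed by repeated local-mean
    removal; positive-going zero crossings of the residue mark glottal
    epochs. Best run on short segments to limit numerical growth."""
    d = np.diff(x, prepend=x[0]).astype(float)   # difference to suppress DC
    y = d
    for _ in range(2):                           # two 0-Hz resonators
        y = np.cumsum(np.cumsum(y))
    n = int(fs * win_ms / 1000) | 1              # odd trend-removal window
    kernel = np.ones(n) / n
    for _ in range(3):                           # repeated local-mean removal
        y = y - np.convolve(y, kernel, mode='same')
    return np.where((y[:-1] < 0) & (y[1:] >= 0))[0]   # epoch sample indices
```

The spacing of successive epochs then gives the instantaneous fundamental frequency, and the signal strength at each epoch gives the excitation strength used in the trill analysis.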
NASA Astrophysics Data System (ADS)
Trollinger, Valerie L.
This study investigated the relationship of acoustically measured singing accuracy to speech fundamental frequency, speech fundamental frequency range, age, and gender in preschool-aged children. Seventy subjects from Southeastern Pennsylvania; the San Francisco Bay Area, California; and Terre Haute, Indiana, participated in the study. Speech frequency was measured by having the subjects participate in spontaneous and guided speech activities with the researcher, with 18 diverse samples extracted from each subject's recording for acoustical analysis of fundamental frequency in Hz with the CSpeech computer program. The fundamental frequencies were averaged together to derive a mean speech frequency score for each subject. Speech range was calculated by subtracting the lowest fundamental frequency produced from the highest fundamental frequency produced, resulting in a speech range measured in increments of Hz. Singing accuracy was measured by having the subjects each echo-sing six randomized patterns using the pitches Middle C, D, E, F♯, G and A (440), using the solfege syllables of Do and Re, which were recorded by a 5-year-old female model. For each subject, 18 samples of singing were recorded. All samples were analyzed with CSpeech for fundamental frequency. For each subject, deviation scores in Hz were derived by calculating the difference between what the model sang in Hz and what the subject sang in response in Hz. Individual scores for each child consisted of an overall mean total deviation frequency, mean frequency deviations for each pattern, and mean frequency deviation for each pitch. Pearson correlations, MANOVA and ANOVA analyses, multiple regressions, and discriminant analysis revealed the following findings: (1) moderate but significant (p < .001) relationships emerged between mean speech frequency and the ability to sing the pitches E, F♯, G and A in the study; (2) mean speech frequency also emerged as the strongest predictor of subjects' ability to sing the notes E and F♯; (3) mean speech frequency correlated moderately and significantly (p < .001) with sharpness and flatness of singing response accuracy in Hz; (4) speech range was the strongest predictor of singing accuracy for the pitches G and A in the study (p < .001); (5) gender emerged as a significant, but not the strongest, predictor of ability to sing the pitches in the study above C and D; (6) gender did not correlate with mean speech frequency and speech range; (7) age in months emerged as a low but significant predictor of ability to sing the lower notes (C and D) in the study; (8) age correlated significantly but negatively low (r = -.23, p < .05, two-tailed) with mean speech frequency; and (9) age did not emerge as a significant predictor of overall singing accuracy. Ancillary findings indicated that there were significant differences in singing accuracy based on geographic location by gender, and that siblings and fraternal twins in the study generally performed similarly. In addition, reliability for using CSpeech for acoustical analysis revealed test/retest correlations of .99, with one exception at .94. Based on these results, suggestions were made for future research on the use of the voice in speech and how it may affect singing development, overall use of the voice in singing, and pitch-matching accuracy.
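The deviation-score computation lends itself to a short sketch. CSpeech is legacy software, so the snippet below substitutes librosa's pYIN pitch tracker; the file name, pitch-range bounds, and target pitch are illustrative assumptions only.

```python
import numpy as np
import librosa

def mean_f0_hz(path, fmin=150.0, fmax=600.0):
    """Mean F0 of the voiced frames in one recording (hypothetical helper)."""
    y, sr = librosa.load(path, sr=None)
    f0, voiced, _ = librosa.pyin(y, fmin=fmin, fmax=fmax, sr=sr)
    return float(np.nanmean(f0[voiced]))

# Deviation score as described: the child's response minus the model's pitch.
model_hz = 293.66                             # D above middle C (assumed target)
response_hz = mean_f0_hz("child_echo_D.wav")  # hypothetical file name
deviation_hz = response_hz - model_hz
```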
Saltuklaroglu, Tim; Kalinowski, Joseph; Robbins, Mary; Crawcour, Stephen; Bowers, Andrew
2009-01-01
Stuttering is prone to strike during speech initiation more so than at any other point in an utterance. The use of altered auditory feedback (AAF) has been found to produce robust decreases in stuttering frequency by creating an electronic rendition of choral speech (i.e., speaking in unison). However, AAF requires users to self-initiate speech before it can go into effect and, therefore, it might not be as helpful as true choral speech during speech initiation. To examine how AAF and choral speech differentially enhance fluency during speech initiation and in subsequent portions of utterances, ten participants who stuttered read passages without altered feedback (NAF), under four AAF conditions, and under a true choral speech condition. Each condition was blocked into ten 10 s trials separated by 5 s intervals so that each trial required 'cold' speech initiation. In the first analysis, comparisons of stuttering frequencies were made across conditions. A second, finer-grained analysis involved examining stuttering frequencies on the initial syllable, the subsequent four syllables produced, and the five syllables produced immediately after the midpoint of each trial. On average, AAF reduced stuttering by approximately 68% relative to the NAF condition. Stuttering frequencies on the initial syllables were considerably higher than on the other syllables analysed (0.45 and 0.34 for NAF and AAF conditions, respectively). After the first syllable was produced, stuttering frequencies dropped precipitously and remained stable. However, this drop in stuttering frequency was significantly greater (approximately 84%) in the AAF conditions than in the NAF condition (approximately 66%), with frequencies on the last nine syllables analysed averaging 0.15 and 0.05 for NAF and AAF conditions, respectively. In the true choral speech condition, stuttering was virtually (approximately 98%) eliminated across all utterances and all syllable positions. Altered auditory feedback effectively inhibits stuttering immediately after speech has been initiated. However, unlike a true choral signal, which is exogenously initiated and offers the most complete fluency enhancement, AAF requires speech to be initiated by the user and 'fed back' before it can directly inhibit stuttering. It is suggested that AAF can be a viable clinical option for those who stutter and should often be used in combination with therapeutic techniques, particularly those that aid speech initiation. The substantially higher rate of stuttering occurring on initiation supports a hypothesis that overt stuttering events help 'release' and 'inhibit' central stuttering blocks. This perspective is examined in the context of internal models and mirror neurons.
Bowers, Andrew; Saltuklaroglu, Tim; Harkrider, Ashley; Cuellar, Megan
2013-01-01
Background Constructivist theories propose that articulatory hypotheses about incoming phonetic targets may function to enhance perception by limiting the possibilities for sensory analysis. To provide evidence for this proposal, it is necessary to map ongoing, high-temporal-resolution changes in sensorimotor activity (i.e., the sensorimotor μ rhythm) to accurate speech and non-speech discrimination performance (i.e., correct trials). Methods Sixteen participants (15 female and 1 male) were asked to passively listen to or actively identify speech and tone-sweeps in a two-alternative forced-choice discrimination task while the electroencephalograph (EEG) was recorded from 32 channels. The stimuli were presented at signal-to-noise ratios (SNRs) in which discrimination accuracy was high (i.e., 80–100%) and at low SNRs producing discrimination performance at chance. EEG data were decomposed using independent component analysis and clustered across participants using principal component methods in EEGLAB. Results ICA revealed left and right sensorimotor μ components for 14/16 and 13/16 participants, respectively, that were identified on the basis of scalp topography, spectral peaks, and localization to the precentral and postcentral gyri. Time-frequency analysis of left and right lateralized μ component clusters revealed significant (pFDR<.05) suppression in the traditional beta frequency range (13–30 Hz) prior to, during, and following syllable discrimination trials. No significant differences from baseline were found for passive tasks. Tone conditions produced right μ beta suppression following stimulus onset only. For the left μ, significant differences in the magnitude of beta suppression were found for correct speech discrimination trials relative to chance trials following stimulus offset. Conclusions Findings are consistent with constructivist, internal model theories proposing that early forward motor models generate predictions about likely phonemic units that are then synthesized with incoming sensory cues during active as opposed to passive processing. Future directions and possible translational value for clinical populations in which sensorimotor integration may play a functional role are discussed. PMID:23991030
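For readers wanting to reproduce the flavor of the time-frequency result, the following is a deliberately simplified stand-in for the EEGLAB ICA pipeline used in the study: it estimates beta-band (13–30 Hz) power change against a prestimulus baseline for one component or channel. The filter order and the 500 ms baseline window are assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def beta_change_db(trials, fs, baseline_s=(0.0, 0.5)):
    """Beta (13-30 Hz) power change vs. baseline, in dB.

    `trials` is (n_trials, n_samples) for one IC or channel; negative
    values indicate mu/beta suppression (desynchronization).
    """
    b, a = butter(4, [13.0, 30.0], btype="bandpass", fs=fs)
    env = np.abs(hilbert(filtfilt(b, a, trials, axis=1), axis=1)) ** 2
    i0, i1 = (int(t * fs) for t in baseline_s)
    base = env[:, i0:i1].mean(axis=1, keepdims=True)
    return (10.0 * np.log10(env / base)).mean(axis=0)
```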
Reactivity to a Spouse's Interpersonal Suffering in Late Life Marriage: A Mixed-Methods Approach
Mitchell, Hannah-Rose; Levy, Becca R.; Keene, Danya E.; Monin, Joan K.
2015-01-01
Objective To determine how older adult spouses react to their partners' interpersonal suffering. Method Spouses of individuals with musculoskeletal pain were recorded describing their partners' suffering while their blood pressure (BP) was monitored. After the account, spouses described their distress. Speeches were transcribed and analyzed with Linguistic Inquiry and Word Count software and coded for interpersonal content. Multivariate regression analyses were conducted with interpersonal content variables predicting BP and distress. Exploratory qualitative analysis was conducted using ATLAS.ti to explore mechanisms behind quantitative results. Results Describing partners' suffering as interpersonal and using social (family) words were associated with higher systolic BP reactivity. Husbands were more likely to describe partners' suffering as interpersonal. Qualitative results suggested shared stressors and bereavement-related distress as potential mechanisms for heightened reactivity to interpersonal suffering. Discussion Spouses' interpersonal suffering may negatively affect both men's and women's cardiovascular health, and older husbands may be particularly affected. PMID:25659746
Hustad, Katherine C.; Gorton, Kristin; Lee, Jimin
2010-01-01
Purpose Little is known about the speech and language abilities of children with cerebral palsy (CP) and there is currently no system for classifying speech and language profiles. Such a system would have epidemiological value and would have the potential to advance the development of interventions that improve outcomes. In this study, we propose and test a preliminary speech and language classification system by quantifying how well speech and language data differentiate among children classified into different hypothesized profile groups. Method Speech and language assessment data were collected in a laboratory setting from 34 children with CP (18 males; 16 females) who were a mean age of 54 months (SD 1.8 months). Measures of interest were vowel area, speech rate, language comprehension scores, and speech intelligibility ratings. Results Canonical discriminant function analysis showed that three functions accounted for 100% of the variance among profile groups, with speech variables accounting for 93% of the variance. Classification agreement varied from 74% to 97% using four different classification paradigms. Conclusions Results provide preliminary support for the classification of speech and language abilities of children with CP into four initial profile groups. Further research is necessary to validate the full classification system. PMID:20643795
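The canonical discriminant function analysis reported here can be approximated with scikit-learn's linear discriminant analysis; the sketch below uses placeholder data in place of the study's four measures, so only the mechanics carry over.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X = rng.normal(size=(34, 4))      # stand-ins for vowel area, speech rate,
y = rng.integers(0, 4, size=34)   # comprehension, intelligibility; 4 groups

lda = LinearDiscriminantAnalysis().fit(X, y)
variance_per_function = lda.explained_variance_ratio_  # per discriminant axis
agreement = (lda.predict(X) == y).mean()               # resubstitution agreement
```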
McCormack, Jane; McLeod, Sharynne; McAllister, Lindy; Harrison, Linda J
2010-10-01
The purpose of this article was to understand the experience of speech impairment (speech sound disorders) in everyday life as described by children with speech impairment and their communication partners. Interviews were undertaken with 13 preschool children with speech impairment (mild to severe) and 21 significant others (family members and teachers). A phenomenological analysis of the interview transcripts revealed 2 global themes regarding the experience of living with speech impairment for these children and their families. The first theme encompassed the problems experienced by participants, namely (a) the child's inability to "speak properly," (b) the communication partner's failure to "listen properly," and (c) frustration caused by the speaking and listening problems. The second theme described the solutions participants used to overcome the problems. Solutions included (a) strategies to improve the child's speech accuracy (e.g., home practice, speech-language pathology) and (b) strategies to improve the listener's understanding (e.g., using gestures, repetition). Both short- and long-term solutions were identified. Successful communication is dependent on the skills of speakers and listeners. Intervention with children who experience speech impairment needs to reflect this reciprocity by supporting both the speaker and the listener and by addressing the frustration they experience.
Koelewijn, Thomas; Zekveld, Adriana A; Festen, Joost M; Kramer, Sophia E
2014-03-01
A recent pupillometry study on adults with normal hearing indicates that the pupil response during speech perception (cognitive processing load) is strongly affected by the type of speech masker. The current study extends these results by recording the pupil response in 32 participants with hearing impairment (mean age 59 yr) while they were listening to sentences masked by fluctuating noise or a single-talker. Efforts were made to improve audibility of all sounds by means of spectral shaping. Additionally, participants performed tests measuring verbal working memory capacity, inhibition of interfering information in working memory, and linguistic closure. The results showed worse speech reception thresholds for speech masked by single-talker speech compared to fluctuating noise. In line with previous results for participants with normal hearing, the pupil response was larger when listening to speech masked by a single-talker compared to fluctuating noise. Regression analysis revealed that larger working memory capacity and better inhibition of interfering information related to better speech reception thresholds, but these variables did not account for inter-individual differences in the pupil response. In conclusion, people with hearing impairment show more cognitive load during speech processing when there is interfering speech compared to fluctuating noise.
Alderson-Day, Ben; Fernyhough, Charles
2015-01-01
Inner speech is often reported to be a common and central part of inner experience, but its true prevalence is unclear. Many questionnaire-based measures appear to lack convergent validity and it has been claimed that they overestimate inner speech in comparison to experience sampling methods (which involve collecting data at random timepoints). The present study compared self-reporting of inner speech collected via a general questionnaire and experience sampling, using data from a custom-made smartphone app (Inner Life). Fifty-one university students completed a generalized self-report measure of inner speech (the Varieties of Inner Speech Questionnaire, VISQ) and responded to at least seven random alerts to report on incidences of inner speech over a 2-week period. Correlations and pairwise comparisons were used to compare generalized endorsements and randomly sampled scores for each VISQ subscale. Significant correlations were observed between general and randomly sampled measures for only two of the four VISQ subscales, and endorsements of inner speech with evaluative or motivational characteristics did not correlate at all across different measures. Endorsement of inner speech items was significantly lower for random sampling compared to generalized self-report, for all VISQ subscales. Exploratory analysis indicated that specific inner speech characteristics were also related to anxiety and future-oriented thinking. PMID:25964773
Ertmer, David J.; Jung, Jongmin
2012-01-01
Background Evidence of auditory-guided speech development can be heard as the prelinguistic vocalizations of young cochlear implant recipients become increasingly complex, phonetically diverse, and speech-like. In research settings, these changes are most often documented by collecting and analyzing speech samples. Sampling, however, may be too time-consuming and impractical for widespread use in clinical settings. The Conditioned Assessment of Speech Production (CASP; Ertmer & Stoel-Gammon, 2008) is an easily administered and time-efficient alternative to speech sample analysis. The current investigation examined the concurrent validity of the CASP and data obtained from speech samples recorded at the same intervals. Methods Nineteen deaf children who received CIs before their third birthdays participated in the study. Speech samples and CASP scores were gathered at 6, 12, 18, and 24 months post-activation. Correlation analyses were conducted to assess the concurrent validity of CASP scores and data from samples. Results CASP scores showed strong concurrent validity with scores from speech samples gathered across all recording sessions (6 – 24 months). Conclusions The CASP was found to be a valid, reliable, and time-efficient tool for assessing progress in vocal development during young CI recipient’s first 2 years of device experience. PMID:22628109
Schall, Sonja; von Kriegstein, Katharina
2014-01-01
It has been proposed that internal simulation of the talking face of visually-known speakers facilitates auditory speech recognition. One prediction of this view is that brain areas involved in auditory-only speech comprehension interact with visual face-movement sensitive areas, even under auditory-only listening conditions. Here, we test this hypothesis using connectivity analyses of functional magnetic resonance imaging (fMRI) data. Participants (17 normal participants, 17 developmental prosopagnosics) first learned six speakers via brief voice-face or voice-occupation training (<2 min/speaker). This was followed by an auditory-only speech recognition task and a control task (voice recognition) involving the learned speakers' voices in the MRI scanner. As hypothesized, we found that, during speech recognition, familiarity with the speaker's face increased the functional connectivity between the face-movement sensitive posterior superior temporal sulcus (STS) and an anterior STS region that supports auditory speech intelligibility. There was no difference between normal participants and prosopagnosics. This was expected because previous findings have shown that both groups use the face-movement sensitive STS to optimize auditory-only speech comprehension. Overall, the present findings indicate that learned visual information is integrated into the analysis of auditory-only speech and that this integration results from the interaction of task-relevant face-movement and auditory speech-sensitive areas.
Improved Speech Coding Based on Open-Loop Parameter Estimation
NASA Technical Reports Server (NTRS)
Juang, Jer-Nan; Chen, Ya-Chin; Longman, Richard W.
2000-01-01
A nonlinear optimization algorithm for linear predictive speech coding was developed earlier that not only optimizes the linear model coefficients for the open-loop predictor, but does the optimization including the effects of quantization of the transmitted residual. It also simultaneously optimizes the quantization levels used for each speech segment. In this paper, we present an improved method for initialization of this nonlinear algorithm and demonstrate substantial improvements in performance. In addition, the new procedure produces monotonically improving speech quality with increasing numbers of bits used in the transmitted error residual. Examples of speech encoding and decoding are given for 8 speech segments, and signal-to-noise ratios as high as 47 dB are produced. As in typical linear predictive coding, the optimization is done on the open-loop speech analysis model. Here we demonstrate that minimizing the error of the closed-loop speech reconstruction, instead of the simpler open-loop optimization, is likely to produce negligible improvement in speech quality. The examples suggest that the algorithm here is close to giving the best performance obtainable from a linear model, for the chosen order with the chosen number of bits for the codebook.
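For context, a baseline open-loop LPC coder (autocorrelation-method analysis plus uniform residual quantization) looks like the sketch below. The paper's actual contribution, jointly optimizing the model coefficients and the quantization levels, is not reproduced here; the frame-based helper names are illustrative.

```python
import numpy as np
from scipy.signal import lfilter

def lpc(frame, order=10):
    """Autocorrelation-method LPC via the Levinson-Durbin recursion."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a, err = np.zeros(order + 1), r[0]
    a[0] = 1.0
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        prev = a.copy()
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a

def code_frame(frame, order=10, bits=4):
    """Open-loop analysis, uniform residual quantization, resynthesis."""
    a = lpc(frame, order)
    resid = lfilter(a, [1.0], frame)             # prediction-error filter A(z)
    step = np.max(np.abs(resid)) / 2 ** (bits - 1)
    resid_q = np.round(resid / step) * step      # quantized residual
    recon = lfilter([1.0], a, resid_q)           # synthesis filter 1/A(z)
    snr = 10 * np.log10(np.sum(frame**2) / np.sum((frame - recon) ** 2))
    return recon, snr
```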
Obstructive sleep apnea severity estimation: Fusion of speech-based systems.
Ben Or, D; Dafna, E; Tarasiuk, A; Zigel, Y
2016-08-01
Obstructive sleep apnea (OSA) is a common sleep-related breathing disorder. Previous studies associated OSA with anatomical abnormalities of the upper respiratory tract that may be reflected in the acoustic characteristics of speech. We tested the hypothesis that the speech signal carries essential information that can assist in early assessment of OSA severity by estimating the apnea-hypopnea index (AHI). 198 men referred to routine polysomnography (PSG) were recorded shortly prior to sleep onset while reading a one-minute speech protocol. The different parts of the speech recordings, i.e., sustained vowels, short-time frames of fluent speech, and the speech recording as a whole, underwent separate analyses using sustained-vowel features, short-term features, and long-term features, respectively. Applying support vector regression and regression trees, these features were used to estimate AHI. The fusion of the outputs of the three subsystems resulted in a diagnostic agreement of 67.3% between the speech-estimated AHI and the PSG-determined AHI, and an absolute error of 10.8 events/hr. Speech signal analysis may assist in the estimation of AHI, thus allowing the development of a noninvasive tool for OSA screening.
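The fusion step can be sketched as follows. The feature dimensions, hyperparameters, and the simple output averaging are assumptions (the abstract names support vector regression and regression trees but not the exact fusion rule), and real use would require held-out evaluation rather than in-sample prediction.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
# Placeholder features for the three subsystems: sustained vowels,
# pooled short-term frames, and long-term whole-recording features.
X_vowel, X_short, X_long = (rng.normal(size=(198, d)) for d in (12, 20, 8))
ahi = rng.uniform(0, 60, size=198)          # stand-in for PSG-determined AHI

preds = np.vstack([
    SVR(kernel="rbf", C=10.0).fit(X_vowel, ahi).predict(X_vowel),
    SVR(kernel="rbf", C=10.0).fit(X_short, ahi).predict(X_short),
    DecisionTreeRegressor(max_depth=4).fit(X_long, ahi).predict(X_long),
])
ahi_estimate = preds.mean(axis=0)           # simple late fusion of the outputs
```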
Speech and language development in 2-year-old children with cerebral palsy.
Hustad, Katherine C; Allison, Kristen; McFadd, Emily; Riehle, Katherine
2014-06-01
We examined early speech and language development in children who had cerebral palsy. Questions addressed whether children could be classified into early profile groups on the basis of speech and language skills and whether there were differences on selected speech and language measures among groups. Speech and language assessments were completed on 27 children with CP who were between the ages of 24 and 30 months (mean age 27.1 months; SD 1.8). We examined several measures of expressive and receptive language, along with speech intelligibility. Two-step cluster analysis was used to identify homogeneous groups of children based on their performance on the seven dependent variables characterizing speech and language performance. Three groups of children identified were those not yet talking (44% of the sample); those whose talking abilities appeared to be emerging (41% of the sample); and those who were established talkers (15% of the sample). Group differences were evident on all variables except receptive language skills. 85% of 2-year-old children with CP in this study had clinical speech and/or language delays relative to age expectations. Findings suggest that children with CP should receive speech and language assessment and treatment at or before 2 years of age.
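The two-step cluster analysis named here is a specific statistical-package procedure; as a rough stand-in, standardized k-means over the seven measures captures the idea (the data below are placeholders).

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

X = np.random.default_rng(0).normal(size=(27, 7))  # 7 measures per child
X_std = StandardScaler().fit_transform(X)          # put measures on one scale
groups = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_std)
```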
Mefferd, Antje S.
2016-01-01
The degree of speech movement pattern consistency can provide information about speech motor control. Although tongue motor control is particularly important because of the tongue's primary contribution to the speech acoustic signal, capturing tongue movements during speech remains difficult and costly. This study sought to determine if formant movements could be used to estimate tongue movement pattern consistency indirectly. Two age groups (seven young adults and seven older adults) and six speech conditions (typical, slow, loud, clear, fast, bite block speech) were selected to elicit an age- and task-dependent performance range in tongue movement pattern consistency. Kinematic and acoustic spatiotemporal indexes (STI) were calculated based on sentence-length tongue movement and formant movement signals, respectively. Kinematic and acoustic STI values showed strong associations across talkers and moderate to strong associations for each talker across speech tasks; although, in cases where task-related tongue motor performance changes were relatively small, the acoustic STI values were poorly associated with kinematic STI values. These findings suggest that, depending on the sensitivity needs, formant movement pattern consistency could be used in lieu of direct kinematic analysis to indirectly examine speech motor control. PMID:27908069
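The spatiotemporal index (STI) used here has a standard construction: linearly time-normalize each repetition of the sentence, z-score its amplitude, then sum the across-trial standard deviations sampled at 2% intervals. A minimal numpy version, with the 1000-point resampling grid as an assumed convention:

```python
import numpy as np
from scipy.interpolate import interp1d

def spatiotemporal_index(trials, n_points=1000, n_bins=50):
    """STI for a list of 1-D movement (or formant) records of one sentence."""
    norm = []
    for x in trials:
        t = np.linspace(0.0, 1.0, len(x))
        xt = interp1d(t, x)(np.linspace(0.0, 1.0, n_points))  # time-normalize
        norm.append((xt - xt.mean()) / xt.std())              # amplitude z-score
    norm = np.asarray(norm)
    idx = np.linspace(0, n_points - 1, n_bins).astype(int)    # 2% intervals
    return float(norm[:, idx].std(axis=0).sum())
```

Lower STI values indicate more consistent movement patterns across repetitions, which is the quantity being compared between the kinematic and acoustic signals.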
Influence of speech sample on perceptual rating of hypernasality.
Medeiros, Maria Natália Leite de; Fukushiro, Ana Paula; Yamashita, Renata Paciello
2016-07-07
To investigate the influence of speech sample type (spontaneous conversation versus sentence repetition) on intra- and inter-rater reliability of hypernasality ratings. One hundred and twenty audio-recorded speech samples (60 containing spontaneous conversation and 60 containing repeated sentences) of individuals with repaired cleft palate ± lip, of both genders, aged between 6 and 52 years (mean = 21 ± 10), were selected and edited. Three experienced speech-language pathologists rated hypernasality according to their own criteria using a 4-point scale: 1 = absence of hypernasality, 2 = mild hypernasality, 3 = moderate hypernasality, and 4 = severe hypernasality, first in the spontaneous speech samples and, 30 days later, in the sentence repetition samples. Intra- and inter-rater agreements were calculated for both speech samples and compared statistically by the Z test at a significance level of 5%. Comparison of intra-rater agreements between the two speech samples showed an increase in the coefficients obtained for sentence repetition relative to those obtained for spontaneous conversation. Comparison of inter-rater agreement showed no significant difference among the three raters for the two speech samples. Sentence repetition improved intra-rater reliability of perceptual judgment of hypernasality. However, the speech sample had no influence on reliability among different raters.
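Agreement on an ordinal 4-point scale like this one is often summarized with weighted kappa; a small sketch follows (the ratings are made-up, and the paper's Z-test comparison of coefficients is not shown).

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

# One rater's two passes over the same samples on the 4-point scale
# (1 = no hypernasality ... 4 = severe). Values are illustrative only.
first_pass  = np.array([1, 2, 2, 3, 4, 1, 2, 3, 3, 4])
second_pass = np.array([1, 2, 3, 3, 4, 1, 2, 2, 3, 4])

# Quadratic weights penalize large disagreements more heavily, which
# suits an ordinal severity scale.
kappa = cohen_kappa_score(first_pass, second_pass, weights="quadratic")
```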
An Analysis of the Ethical Groundwork of Franklyn Haiman's "Speech and Law in a Free Society."
ERIC Educational Resources Information Center
Andersen, Kenneth E.
A quick reading of Franklyn Haiman's writings on ethics and free speech would suggest many disparities in his early conception of the ethical communicator and in his conception of free speech and the activity it allows in a democracy. In the material on ethics, Haiman addresses the ideal of how people ought to communicate with others in an ethical…
Speech Rate as a Sticky Switch: A Multiple Lesion Case Analysis of Mutism and Hyperlalia
ERIC Educational Resources Information Center
Braun, Claude M. J.; Dumont, Mathieu; Duval, Julie; Hamel-Hebert, Isabelle
2004-01-01
Though it has long been known on the basis of clinical associations and serendipitous observation that speech rate is related to mood and psychomotor baseline, it is less known that speech rate is also related to libido and to immune function. We make the case for a bipolar phenomenon of "psychic tonus," encompassing all these dimensions. The…
Brainstem Encoding of Aided Speech in Hearing Aid Users with Cochlear Dead Region(s).
Hassaan, Mohammad Ramadan; Ibraheem, Ola Abdallah; Galhom, Dalia Helal
2016-07-01
Neural encoding of speech begins in the cochlea, where the signal as a whole is broken down into its sinusoidal components, a representation that has to be conserved up to the higher auditory centers. Some of these components target dead regions of the cochlea and cause little or no excitation. Measuring the aided speech-evoked auditory brainstem response elicited by speech stimuli with different spectral maxima can give insight into the brainstem encoding of aided speech with spectral maxima at these dead regions. This research aims to study the impact of dead regions of the cochlea on speech processing at the brainstem level after a long period of hearing aid use. This study comprised 30 ears without dead regions and 46 ears with dead regions at low, mid, or high frequencies. For all ears, we measured the aided speech-evoked auditory brainstem response using speech stimuli of low, mid, and high spectral maxima. An aided speech-evoked auditory brainstem response was obtainable in all subjects. Responses evoked by stimuli with spectral maxima at dead regions had longer latencies and smaller amplitudes when compared with the control group or the responses to other stimuli. The presence of cochlear dead regions affects brainstem encoding of speech with spectral maxima at these regions. Brainstem neuroplasticity and the extrinsic redundancy of speech can minimize the impact of dead regions in chronic hearing aid users.
Identifying Residual Speech Sound Disorders in Bilingual Children: A Japanese-English Case Study
Preston, Jonathan L.; Seki, Ayumi
2012-01-01
Purpose The purposes are to (1) describe the assessment of residual speech sound disorders (SSD) in bilinguals by distinguishing speech patterns associated with second language acquisition from patterns associated with misarticulations, and (2) describe how assessment of domains such as speech motor control and phonological awareness can provide a more complete understanding of SSDs in bilinguals. Method A review of Japanese phonology is provided to offer a context for understanding the transfer of Japanese to English productions. A case study of an 11-year-old is presented, demonstrating parallel speech assessments in English and Japanese. Speech motor and phonological awareness tasks were conducted in both languages. Results Several patterns were observed in the participant’s English that could be plausibly explained by the influence of Japanese phonology. However, errors indicating a residual SSD were observed in both Japanese and English. A speech motor assessment suggested possible speech motor control problems, and phonological awareness was judged to be within the typical range of performance in both languages. Conclusion Understanding the phonological characteristics of L1 can help clinicians recognize speech patterns in L2 associated with transfer. Once these differences are understood, patterns associated with a residual SSD can be identified. Supplementing a relational speech analysis with measures of speech motor control and phonological awareness can provide a more comprehensive understanding of a client’s strengths and needs. PMID:21386046
Differentiating primary progressive aphasias in a brief sample of connected speech
Evans, Emily; O'Shea, Jessica; Powers, John; Boller, Ashley; Weinberg, Danielle; Haley, Jenna; McMillan, Corey; Irwin, David J.; Rascovsky, Katya; Grossman, Murray
2013-01-01
Objective: A brief speech expression protocol that can be administered and scored without special training would aid in the differential diagnosis of the 3 principal forms of primary progressive aphasia (PPA): nonfluent/agrammatic PPA, logopenic variant PPA, and semantic variant PPA. Methods: We used a picture-description task to elicit a short speech sample, and we evaluated impairments in speech-sound production, speech rate, lexical retrieval, and grammaticality. We compared the results with those obtained by a longer, previously validated protocol and further validated performance with multimodal imaging to assess the neuroanatomical basis of the deficits. Results: We found different patterns of impaired grammar in each PPA variant, and additional language production features were impaired in each: nonfluent/agrammatic PPA was characterized by speech-sound errors; logopenic variant PPA by dysfluencies (false starts and hesitations); and semantic variant PPA by poor retrieval of nouns. Strong correlations were found between this brief speech sample and a lengthier narrative speech sample. A composite measure of grammaticality and other measures of speech production were correlated with distinct regions of gray matter atrophy and reduced white matter fractional anisotropy in each PPA variant. Conclusions: These findings provide evidence that large-scale networks are required for fluent, grammatical expression; that these networks can be selectively disrupted in PPA syndromes; and that quantitative analysis of a brief speech sample can reveal the corresponding distinct speech characteristics. PMID:23794681
[The ideal body: media pedagogy].
Ribeiro, Rubia Guimarães; da Silva, Karen Schein; Kruse, Maria Henriqueta Luce
2009-03-01
We present enunciations circulating in the media regarding the body, discussing the ways in which discourses related to the maintenance of health and aesthetics invest in its improvement. To that end, we used Caderno Vida, a weekly insert of the newspaper Zero Hora, understanding it as the owner of a discourse with the power to subjectivate people. The analysis is situated within Cultural Studies and is based on the ideas of Michel Foucault. The methodological strategy used was discourse analysis of texts about body care. The newspaper addresses its readers with discourses that point to beauty, health, and success. The constructed categories were: what the ideal body is like, what to do to have such a body, and why we must have this body. Balanced feeding, regular physical activity, and plastic surgery are recommendations recurrently found in the weekly inserts.
Estimating psycho-physiological state of a human by speech analysis
NASA Astrophysics Data System (ADS)
Ronzhin, A. L.
2005-05-01
Adverse effects of intoxication, fatigue, and boredom can degrade the performance of highly trained operators of complex technical systems, with potentially catastrophic consequences. Existing physiological fitness-for-duty tests are time-consuming, costly, invasive, and highly unpopular. Known non-physiological tests constitute a secondary task and interfere with the busy workload of the tested operator. Various attempts to assess the current status of the operator by processing "normal operational data" often lead to excessive amounts of computation, poorly justified metrics, and ambiguous results. At the same time, speech analysis presents a natural, non-invasive approach based upon well-established, efficient data processing. In addition, it supports both behavioral and physiological biometrics. This paper presents an approach facilitating a robust speech analysis/understanding process in spite of natural speech variability and background noise. Automatic speech recognition is suggested as a technique for the detection of changes in the psycho-physiological state of a human that typically manifest themselves in changes of the characteristics of the vocal tract and the semantic-syntactic connectivity of conversation. Preliminary tests confirmed that a statistically significant correlation between the error rate of automatic speech recognition and the extent of alcohol intoxication does exist. In addition, the obtained data allowed exploring some interesting correlations and establishing some quantitative models. It is proposed to utilize this approach as part of a fitness-for-duty test and to compare its efficiency with analyses of the iris, face geometry, thermography, and other popular non-invasive biometric techniques.
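Since the proposed marker is the ASR error rate, the core measurement reduces to word error rate against a known reference script. A self-contained sketch (the example sentences are invented):

```python
def word_error_rate(reference, hypothesis):
    """Levenshtein word error rate between a reference transcript and an
    ASR hypothesis; a rising WER would flag a degraded speaker state."""
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[-1][-1] / max(len(ref), 1)

# word_error_rate("close the outer valve", "close the other valve") -> 0.25
# Tracked over a shift, a climb in WER could prompt a follow-up check.
```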
The effect of filtered speech feedback on the frequency of stuttering
NASA Astrophysics Data System (ADS)
Rami, Manish Krishnakant
2000-10-01
This study investigated the effects of filtered components of speech and whispered speech on the frequency of stuttering. It is known that choral speech, shadowing, and altered auditory feedback are the only conditions that induce fluency in people who stutter without any effort beyond that normally required to speak. All these conditions use speech as a second signal. This experiment examined the role of components of the speech signal as delineated by the source-filter theory of speech production. Three filtered speech signals, a whispered speech signal, and a choral speech signal formed the stimuli. It was postulated that if the speech signal as a whole was necessary for producing fluency in people who stutter, then all conditions except choral speech should fail to produce fluency enhancement. If the glottal source alone was adequate in restoring fluency, then only the conditions of no altered feedback (NAF) and whispered speech should fail in promoting fluency. In the event that full filter characteristics are necessary for the fluency-creating effects, then all conditions except choral speech and whispered speech should fail to produce fluency. If any part of the filter characteristics is sufficient in yielding fluency, then only the NAF and the approximate glottal source should fail to demonstrate an increase in the amount of fluency. Twelve adults who stuttered read passages while receiving auditory feedback consisting of one of six experimental conditions: (a) NAF; (b) approximate glottal source; (c) glottal source and first formant; (d) glottal source and first two formants; (e) whispered speech; and (f) choral speech. Frequencies of stuttering were obtained for each condition and submitted to descriptive and inferential statistical analysis. Statistically significant differences in means were found among the feedback conditions. Specifically, the choral speech, the source and first formant, the source and first two formants, and the whispered speech conditions all decreased the frequency of stuttering, while the approximate glottal source did not. It is suggested that articulatory events, chiefly the encoded speech output of vocal tract origin, afford effective cues and induce fluent speech in people who stutter.
Coppens-Hofman, Marjolein C.; Terband, Hayo; Snik, Ad F.M.; Maassen, Ben A.M.
2017-01-01
Purpose Adults with intellectual disabilities (ID) often show reduced speech intelligibility, which affects their social interaction skills. This study aims to establish the main predictors of this reduced intelligibility in order to ultimately optimise management. Method Spontaneous speech and picture naming tasks were recorded in 36 adults with mild or moderate ID. Twenty-five naïve listeners rated the intelligibility of the spontaneous speech samples. Performance on the picture-naming task was analysed by means of a phonological error analysis based on expert transcriptions. Results The transcription analyses showed that the phonemic and syllabic inventories of the speakers were complete. However, multiple errors at the phonemic and syllabic level were found. The frequencies of specific types of errors were related to intelligibility and quality ratings. Conclusions The development of the phonemic and syllabic repertoire appears to be completed in adults with mild-to-moderate ID. The charted speech difficulties can be interpreted to indicate speech motor control and planning difficulties. These findings may aid the development of diagnostic tests and speech therapies aimed at improving speech intelligibility in this specific group. PMID:28118637
Ethical dilemmas experienced by speech-language pathologists working in private practice.
Flatley, Danielle R; Kenny, Belinda J; Lincoln, Michelle A
2014-06-01
Speech-language pathologists experience ethical dilemmas as they fulfil their professional roles and responsibilities. Previous research findings indicated that speech-language pathologists working in publicly funded settings identified ethical dilemmas when they managed complex clients, negotiated professional relationships, and addressed service delivery issues. However, little is known about ethical dilemmas experienced by speech-language pathologists working in private practice settings. The aim of this qualitative study was to describe the nature of ethical dilemmas experienced by speech-language pathologists working in private practice. Data were collected through semi-structured interviews with 10 speech-language pathologists employed in diverse private practice settings. Participants explained the nature of ethical dilemmas they experienced at work and identified their most challenging and frequently occurring ethical conflicts. Qualitative content analysis was used to analyse transcribed data and generate themes. Four themes reflected the nature of speech-language pathologists' ethical dilemmas; balancing benefit and harm, fidelity of business practices, distributing funds, and personal and professional integrity. Findings support the need for professional development activities that are specifically targeted towards facilitating ethical practice for speech-language pathologists in the private sector.
Lu, Huanhuan; Wang, Fuzhong; Zhang, Huichun
2016-04-01
Traditional speech detection methods regard noise as a jamming signal to be filtered out, but against a strong noise background these methods lose part of the original speech signal while eliminating the noise. Stochastic resonance can use noise energy to amplify a weak signal and suppress the noise. Based on stochastic resonance theory, a new method for extracting weak speech signals using adaptive stochastic resonance is proposed. This method, combined with twice sampling, realizes the detection of weak speech signals in strong noise. The system parameters a and b are adjusted adaptively by evaluating the signal-to-noise ratio of the output signal, and the weak speech signal is then optimally detected. Simulation experiments showed that, against a strong noise background, the output signal-to-noise ratio increased from an initial value of -7 dB to about 0.86 dB, a signal-to-noise ratio gain of 7.86 dB. This method clearly raises the signal-to-noise ratio of the output speech signals, offering a new way to detect weak speech signals in strong noise environments.
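A minimal simulation of the bistable system behind this family of methods, assuming the usual Langevin form dx/dt = a*x - b*x^3 + s(t); the paper's exact system and its adaptation loop are not reproduced here.

```python
import numpy as np

def bistable_sr(noisy_speech, fs, a=1.0, b=1.0):
    """Euler integration of dx/dt = a*x - b*x**3 + s(t), where s(t) is
    the noisy (already twice-sampled) speech input."""
    dt = 1.0 / fs
    x = np.zeros(len(noisy_speech))
    for n in range(1, len(noisy_speech)):
        drift = a * x[n - 1] - b * x[n - 1] ** 3
        x[n] = x[n - 1] + dt * (drift + noisy_speech[n - 1])
    return x

# Adaptive tuning (sketch): try (a, b) pairs and keep the pair that
# maximizes an SNR estimate of the output, as the abstract describes.
```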
A Comparative Analysis of Fluent and Cerebral Palsied Speech.
NASA Astrophysics Data System (ADS)
van Doorn, Janis Lee
Several features of the acoustic waveforms of fluent and cerebral palsied speech were compared, using six fluent and seven cerebral palsied subjects, with a major emphasis being placed on an investigation of the trajectories of the first three formants (vocal tract resonances). To provide an overall picture which included other acoustic features, fundamental frequency, intensity, speech timing (speech rate and syllable duration), and prevocalization (vocalization prior to initial stop consonants found in cerebral palsied speech) were also investigated. Measurements were made using repetitions of a test sentence which was chosen because it required large excursions of the speech articulators (lips, tongue and jaw), so that differences in the formant trajectories for the fluent and cerebral palsied speakers would be emphasized. The acoustic features were all extracted from the digitized speech waveform (10 kHz sampling rate): the fundamental frequency contours were derived manually, the intensity contours were measured using the signal covariance, speech rate and syllable durations were measured manually, as were the prevocalization durations, while the formant trajectories were derived from short time spectra which were calculated for each 10 ms of speech using linear prediction analysis. Differences which were found in the acoustic features can be summarized as follows. For cerebral palsied speakers, the fundamental frequency contours generally showed inappropriate exaggerated fluctuations, as did some of the intensity contours; the mean fundamental frequencies were either higher or the same as for the fluent subjects; speech rates were reduced, and syllable durations were longer; prevocalization was consistently present at the beginning of the test sentence; formant trajectories were found to have overall reduced frequency ranges, and to contain anomalous transitional features, but it is noteworthy that for any one cerebral palsied subject, the inappropriate trajectory pattern was generally reproducible. The anomalous transitional features took the form of (a) inappropriate transition patterns, (b) reduced frequency excursions, (c) increased transition durations, and (d) decreased maximum rates of frequency change.
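A modern stand-in for the frame-wise linear-prediction formant analysis described (10 ms steps are from the dissertation; the 25 ms frame length and LPC order are assumptions) is to take the angles of the LPC polynomial roots:

```python
import numpy as np
import librosa

def formant_tracks(y, sr, order=12, frame_s=0.025, hop_s=0.010):
    """Rough F1-F3 trajectories from frame-wise LPC root angles."""
    n, h = int(frame_s * sr), int(hop_s * sr)
    tracks = []
    for start in range(0, len(y) - n, h):
        frame = y[start:start + n] * np.hamming(n)
        a = librosa.lpc(frame, order=order)
        roots = [r for r in np.roots(a) if r.imag > 0]   # upper half-plane
        freqs = sorted(np.angle(r) * sr / (2 * np.pi) for r in roots)
        tracks.append([f for f in freqs if 90.0 < f < 5000.0][:3])
    return tracks
```

A fuller implementation would also screen root bandwidths before calling a root a formant; this sketch keeps only the frequency criterion.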