Greene, Beth G; Logan, John S; Pisoni, David B
1986-03-01
We present the results of studies designed to measure the segmental intelligibility of eight text-to-speech systems and a natural speech control, using the Modified Rhyme Test (MRT). Results indicated that the voices tested could be grouped into four categories: natural speech, high-quality synthetic speech, moderate-quality synthetic speech, and low-quality synthetic speech. The overall performance of the best synthesis system, DECtalk-Paul, was equivalent to natural speech only in terms of performance on initial consonants. The findings are discussed in terms of recent work investigating the perception of synthetic speech under more severe conditions. Suggestions for future research on improving the quality of synthetic speech are also considered.
Recording high quality speech during tagged cine-MRI studies using a fiber optic microphone.
NessAiver, Moriel S; Stone, Maureen; Parthasarathy, Vijay; Kahana, Yuvi; Paritsky, Alexander
2006-01-01
To investigate the feasibility of obtaining high quality speech recordings during cine imaging of tongue movement using a fiber optic microphone. A Complementary Spatial Modulation of Magnetization (C-SPAMM) tagged cine sequence triggered by an electrocardiogram (ECG) simulator was used to image a volunteer while speaking the syllable pairs /a/-/u/, /i/-/u/, and the words "golly" and "Tamil" in sync with the imaging sequence. A noise-canceling, optical microphone was fastened approximately 1-2 inches above the mouth of the volunteer. The microphone was attached via optical fiber to a laptop computer, where the speech was sampled at 44.1 kHz. A reference recording of gradient activity with no speech was subtracted from target recordings. Good quality speech was discernible above the background gradient sound using the fiber optic microphone without reference subtraction. The audio waveform of gradient activity was extremely stable and reproducible. Subtraction of the reference gradient recording further reduced gradient noise by roughly 21 dB, resulting in exceptionally high quality speech waveforms. It is possible to obtain high quality speech recordings using an optical microphone even during exceptionally loud cine imaging sequences. This opens up the possibility of more elaborate MRI studies of speech including spectral analysis of the speech signal in all types of MRI.
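The reference-subtraction step described above is simple to sketch. The following Python fragment is purely illustrative (all function names are hypothetical) and assumes the target and reference recordings are already sample-aligned, as the trigger-locked imaging sequence would make them:

```python
import numpy as np

def subtract_gradient_reference(target: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Subtract a speech-free recording of gradient noise from a
    speech-plus-noise recording made under the same trigger-locked
    imaging sequence. Assumes both signals are sample-aligned."""
    n = min(len(target), len(reference))
    return target[:n] - reference[:n]

def noise_reduction_db(before: np.ndarray, after: np.ndarray) -> float:
    """Residual-power reduction in dB, e.g., measured over a
    speech-free portion of the recordings."""
    return 10.0 * np.log10(np.mean(before ** 2) / np.mean(after ** 2))
```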
Kates, James M; Arehart, Kathryn H
2015-10-01
This paper uses mutual information to quantify the relationship between envelope modulation fidelity and perceptual responses. Data from several previous experiments that measured speech intelligibility, speech quality, and music quality are evaluated for normal-hearing and hearing-impaired listeners. A model of the auditory periphery is used to generate envelope signals, and envelope modulation fidelity is calculated using the normalized cross-covariance of the degraded signal envelope with that of a reference signal. Two procedures are used to describe the envelope modulation: (1) modulation within each auditory frequency band and (2) spectro-temporal processing that analyzes the modulation of spectral ripple components fit to successive short-time spectra. The results indicate that low modulation rates provide the highest information for intelligibility, while high modulation rates provide the highest information for speech and music quality. The low-to-mid auditory frequencies are most important for intelligibility, while mid frequencies are most important for speech quality and high frequencies are most important for music quality. Differences between the spectral ripple components used for the spectro-temporal analysis were not significant in five of the six experimental conditions evaluated. The results indicate that different modulation-rate and auditory-frequency weights may be appropriate for indices designed to predict different types of perceptual relationships.
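The envelope-fidelity measure described here rests on the normalized cross-covariance between degraded and reference envelopes. A minimal Python sketch, substituting a Hilbert envelope for the paper's auditory-periphery model (an assumption made only to keep the example self-contained):

```python
import numpy as np
from scipy.signal import hilbert

def envelope(x: np.ndarray) -> np.ndarray:
    """Magnitude of the analytic signal as a simple envelope estimate
    (the paper derives envelopes from an auditory-periphery model)."""
    return np.abs(hilbert(x))

def normalized_cross_covariance(ref: np.ndarray, deg: np.ndarray) -> float:
    """Zero-lag normalized cross-covariance of two envelopes, i.e.,
    the Pearson correlation of the mean-removed signals."""
    r = ref - ref.mean()
    d = deg - deg.mean()
    return float(np.dot(r, d) / np.sqrt(np.dot(r, r) * np.dot(d, d)))
```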
The development and validation of the speech quality instrument.
Chen, Stephanie Y; Griffin, Brianna M; Mancuso, Dean; Shiau, Stephanie; DiMattia, Michelle; Cellum, Ilana; Harvey Boyd, Kelly; Prevoteau, Charlotte; Kohlberg, Gavriel D; Spitzer, Jaclyn B; Lalwani, Anil K
2017-12-08
Although speech perception tests are available to evaluate hearing, there is no standardized validated tool to quantify speech quality. The objective of this study is to develop a validated tool to measure the quality of heard speech. Prospective instrument validation study of 35 normal hearing adults recruited at a tertiary referral center. Participants listened to 44 speech clips of male/female voices reciting the Rainbow Passage. Speech clips included original and manipulated excerpts capturing goal qualities such as mechanical and garbled. Listeners rated clips on a 10-point visual analog scale (VAS) of 18 characteristics (e.g., cartoonish, garbled). Skewed distribution analysis identified mean ratings in the upper and lower 2-point limits of the VAS (ratings of 8-10 and 0-2, respectively); items with inconsistent responses were eliminated. The test was pruned to a final instrument of nine speech clips that clearly define qualities of interest: speech-like, male/female, cartoonish, echo-y, garbled, tinny, mechanical, rough, breathy, soothing, hoarse, like, pleasant, natural. Mean ratings were highest for original female clips (8.8) and lowest for the not-speech manipulation (2.1). Factor analysis identified two subsets of characteristics; internal consistency demonstrated Cronbach's alpha of 0.95 and 0.82 per subset. Test-retest reliability of total scores was high, with an intraclass correlation coefficient of 0.76. The Speech Quality Instrument (SQI) is a concise, valid tool for assessing speech quality as an indicator for hearing performance. SQI may be a valuable outcome measure for cochlear implant recipients who, despite achieving excellent speech perception, often experience poor speech quality. Level of evidence: 2b. Laryngoscope, 2017.
An integrated approach to improving noisy speech perception
NASA Astrophysics Data System (ADS)
Koval, Serguei; Stolbov, Mikhail; Smirnova, Natalia; Khitrov, Mikhail
2002-05-01
For a number of practical purposes and tasks, experts have to decode speech recordings of very poor quality. A combination of techniques is proposed to improve the intelligibility and quality of distorted speech messages and thus facilitate their comprehension. Along with the application of noise cancellation and speech signal enhancement techniques that remove and/or reduce various kinds of distortion and interference (primarily unmasking and normalization in the time and frequency domains), the approach incorporates optimal expert listening tactics based on selective listening, nonstandard binaural listening, accounting for short-term and long-term human ear adaptation to noisy speech, as well as some methods of speech signal enhancement to support speech decoding during listening. The approach integrating the suggested techniques ensures high-quality results and has been successfully applied by Speech Technology Center experts and by numerous other users, mainly forensic institutions, to decode noisy speech recordings for courts, law enforcement and emergency services, accident investigation bodies, etc.
The persuasiveness of synthetic speech versus human speech.
Stern, S E; Mullennix, J W; Dyson, C; Wilson, S J
1999-12-01
Is computer-synthesized speech as persuasive as the human voice when presenting an argument? After completing an attitude pretest, 193 participants were randomly assigned to listen to a persuasive appeal under three conditions: a high-quality synthesized speech system (DECtalk Express), a low-quality synthesized speech system (Monologue), and a tape recording of a human voice. Following the appeal, participants completed a posttest attitude survey and a series of questionnaires designed to assess perceptions of speech qualities, perceptions of the speaker, and perceptions of the message. The human voice was generally perceived more favorably than the computer-synthesized voice, and the speaker was perceived more favorably when the voice was a human voice than when it was computer synthesized. There was, however, no evidence that computerized speech, as compared with the human voice, affected persuasion or perceptions of the message. Actual or potential applications of this research include issues that should be considered when designing synthetic speech systems.
Nonlinear frequency compression: effects on sound quality ratings of speech and music.
Parsa, Vijay; Scollie, Susan; Glista, Danielle; Seelisch, Andreas
2013-03-01
Frequency lowering technologies offer an alternative amplification solution for severe to profound high frequency hearing losses. While frequency lowering technologies may improve audibility of high frequency sounds, the very nature of this processing can affect the perceived sound quality. This article reports the results from two studies that investigated the impact of a nonlinear frequency compression (NFC) algorithm on perceived sound quality. In the first study, the cutoff frequency and compression ratio parameters of the NFC algorithm were varied, and their effect on the speech quality was measured subjectively with 12 normal hearing adults, 12 normal hearing children, 13 hearing impaired adults, and 9 hearing impaired children. In the second study, 12 normal hearing and 8 hearing impaired adult listeners rated the quality of speech in quiet, speech in noise, and music after processing with a different set of NFC parameters. Results showed that the cutoff frequency parameter had more impact on sound quality ratings than the compression ratio, and that the hearing impaired adults were more tolerant to increased frequency compression than normal hearing adults. No statistically significant differences were found in the sound quality ratings of speech-in-noise and music stimuli processed through various NFC settings by hearing impaired listeners. These findings suggest that there may be an acceptable range of NFC settings for hearing impaired individuals where sound quality is not adversely affected. These results may assist an Audiologist in clinical NFC hearing aid fittings for achieving a balance between high frequency audibility and sound quality.
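The abstract characterizes NFC by its two parameters, cutoff frequency and compression ratio. A common formulation of such a frequency map (assumed here for illustration; the article does not spell out the formula) leaves frequencies below the cutoff unchanged and compresses log-frequency distance above it:

```python
def nfc_map(f_in: float, cutoff: float, ratio: float) -> float:
    """Map an input frequency (Hz) to an output frequency under
    nonlinear frequency compression: below the cutoff frequencies
    pass unchanged; above it, the log-frequency distance from the
    cutoff is divided by the compression ratio."""
    if f_in <= cutoff:
        return f_in
    return cutoff * (f_in / cutoff) ** (1.0 / ratio)

# e.g., with a 2 kHz cutoff and a compression ratio of 2, a 6 kHz
# component maps to 2000 * (6000/2000) ** 0.5, i.e. about 3464 Hz.
print(nfc_map(6000.0, 2000.0, 2.0))
```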
A Comparison of LBG and ADPCM Speech Compression Techniques
NASA Astrophysics Data System (ADS)
Bachu, Rajesh G.; Patel, Jignasa; Barkana, Buket D.
Speech compression is the technology of converting human speech into an efficiently encoded representation that can later be decoded to produce a close approximation of the original signal. All speech has a degree of predictability, and speech coding techniques exploit this to reduce bit rates while maintaining a suitable level of quality. This paper is a study and implementation of the Linde-Buzo-Gray (LBG) and Adaptive Differential Pulse Code Modulation (ADPCM) algorithms for compressing speech signals. We implemented both methods in MATLAB 7.0. Both methods performed well in compressing speech, and listening tests showed that efficient, high-quality coding was achieved.
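For concreteness, here is a minimal Python sketch of the LBG splitting procedure named above (illustrative, not the authors' MATLAB code): start from the global centroid, repeatedly split each centroid into a perturbed pair, and refine with Lloyd iterations until the distortion stops improving.

```python
import numpy as np

def lbg_codebook(vectors: np.ndarray, size: int, eps: float = 0.01,
                 tol: float = 1e-4) -> np.ndarray:
    """Train a codebook with the Linde-Buzo-Gray splitting algorithm.
    `vectors` is (N, dim), e.g., frames of speech samples or LPC
    parameters; `size` is the target codebook size (a power of two)."""
    codebook = vectors.mean(axis=0, keepdims=True)
    while len(codebook) < size:
        # Split every centroid into a perturbed pair, then refine.
        codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])
        prev = np.inf
        while True:
            # Nearest-centroid assignment (generalized Lloyd iteration).
            d = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
            nearest = d.argmin(axis=1)
            distortion = d[np.arange(len(vectors)), nearest].mean()
            if prev - distortion <= tol * max(distortion, 1e-12):
                break
            prev = distortion
            for k in range(len(codebook)):
                members = vectors[nearest == k]
                if len(members):        # empty cells keep their centroid
                    codebook[k] = members.mean(axis=0)
    return codebook
```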
Vector Adaptive/Predictive Encoding Of Speech
NASA Technical Reports Server (NTRS)
Chen, Juin-Hwey; Gersho, Allen
1989-01-01
Vector adaptive/predictive technique for digital encoding of speech signals yields decoded speech of very good quality after transmission at coding rate of 9.6 kb/s, and of reasonably good quality at 4.8 kb/s, while requiring only 3 to 4 million multiplications and additions per second. Combines advantages of adaptive/predictive coding with those of code-excited linear prediction, which yields speech of high quality but requires about 600 million multiplications and additions per second at encoding rate of 4.8 kb/s. Vector adaptive/predictive coding technique thus bridges gaps in performance and complexity between adaptive/predictive coding and code-excited linear prediction.
A novel radar sensor for the non-contact detection of speech signals.
Jiao, Mingke; Lu, Guohua; Jing, Xijing; Li, Sheng; Li, Yanfeng; Wang, Jianqi
2010-01-01
Different speech detection sensors have been developed over the years but they are limited by the loss of high frequency speech energy, and have restricted non-contact detection due to the lack of penetrability. This paper proposes a novel millimeter microwave radar sensor to detect speech signals. The utilization of a high operating frequency and a superheterodyne receiver contributes to the high sensitivity of the radar sensor for small sound vibrations. In addition, the penetrability of microwaves allows the novel sensor to detect speech signals through nonmetal barriers. Results show that the novel sensor can detect high frequency speech energies and that the speech quality is comparable to traditional microphone speech. Moreover, the novel sensor can detect speech signals through a nonmetal material of a certain thickness between the sensor and the subject. Thus, the novel speech sensor expands traditional speech detection techniques and provides an exciting alternative for broader application prospects.
NASA Technical Reports Server (NTRS)
Kondoz, A. M.; Evans, B. G.
1993-01-01
In the last decade, low bit rate speech coding research has received much attention, resulting in newly developed, good-quality speech coders operating at rates as low as 4.8 kb/s. Although speech quality at around 8 kb/s is acceptable for a wide variety of applications, at 4.8 kb/s further improvements in quality are necessary to make it acceptable to the majority of applications and users. In addition to the required low bit rate with acceptable speech quality, other facilities such as integrated digital echo cancellation and voice activity detection are now becoming necessary to provide a cost-effective and compact solution. In this paper we describe a CELP speech coder with an integrated echo canceller and a voice activity detector, all of which have been implemented on a single DSP32C with 32 KBytes of SRAM. The quality of CELP-coded speech has been improved significantly by a new codebook implementation, which also simplifies the encoder/decoder complexity, making room for the integration of a 64-tap echo canceller together with a voice activity detector.
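The abstract describes the echo canceller only as a 64-tap filter; a standard way to realize such a canceller (an assumption, since the paper does not name its adaptation rule) is a normalized LMS filter that models the echo path from the far-end signal and subtracts the estimated echo:

```python
import numpy as np

def nlms_echo_canceller(far_end: np.ndarray, mic: np.ndarray,
                        taps: int = 64, mu: float = 0.5,
                        delta: float = 1e-6) -> np.ndarray:
    """Adaptive echo cancellation with a 64-tap NLMS filter: estimate
    the echo of the far-end (loudspeaker) signal present in the
    microphone signal and subtract it."""
    w = np.zeros(taps)
    out = np.zeros(len(mic))
    for n in range(taps, len(mic)):
        x = far_end[n - taps:n][::-1]         # most recent sample first
        e = mic[n] - np.dot(w, x)             # error = mic minus echo estimate
        w += (mu / (delta + np.dot(x, x))) * e * x
        out[n] = e                            # residual (near-end speech)
    return out
```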
NASA Astrophysics Data System (ADS)
Nakagawa, Seiji; Fujiyuki, Chika; Kagomiya, Takayuki
2013-07-01
Bone-conducted ultrasound (BCU) is perceived even by the profoundly sensorineural deaf. A novel hearing aid using the perception of amplitude-modulated BCU (BCU hearing aid: BCUHA) has been developed. However, there is room for improvement, particularly in terms of sound quality: BCU speech is accompanied by a strong high-pitched tone and contains some distortion. In this study, the sound quality of BCU speech with several types of amplitude modulation [double-sideband with transmitted carrier (DSB-TC), double-sideband with suppressed carrier (DSB-SC), and transposed modulation] and of air-conducted (AC) speech was quantitatively evaluated using semantic differential and factor analysis. The results showed that all types of BCU speech had higher metallic and lower esthetic factor scores than AC speech. On the other hand, transposed speech was generally closer than the other types of BCU speech to AC speech; it showed a higher powerfulness factor score than the other types of BCU speech and a higher esthetic factor score than DSB-SC speech. These results provide useful information for further development of the BCUHA.
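The two double-sideband schemes compared above differ only in whether the ultrasonic carrier itself is transmitted. A minimal sketch, with an assumed 30 kHz carrier and hypothetical constants (the paper does not state these values):

```python
import numpy as np

FS = 192_000   # sampling rate high enough for an ultrasonic carrier (assumed)
FC = 30_000    # ultrasonic carrier frequency in Hz (assumed)

def dsb_tc(speech: np.ndarray, m: float = 1.0) -> np.ndarray:
    """Double-sideband, transmitted carrier: the carrier is kept,
    so the envelope directly follows (1 + m * speech)."""
    t = np.arange(len(speech)) / FS
    return (1.0 + m * speech) * np.cos(2 * np.pi * FC * t)

def dsb_sc(speech: np.ndarray) -> np.ndarray:
    """Double-sideband, suppressed carrier: the speech signal simply
    multiplies the carrier, so no carrier tone is transmitted."""
    t = np.arange(len(speech)) / FS
    return speech * np.cos(2 * np.pi * FC * t)
```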
Design of a robust baseband LPC coder for speech transmission over 9.6 kbit/s noisy channels
NASA Astrophysics Data System (ADS)
Viswanathan, V. R.; Russell, W. H.; Higgins, A. L.
1982-04-01
This paper describes the design of a baseband Linear Predictive Coder (LPC) which transmits speech over 9.6 kbit/sec synchronous channels with random bit errors of up to 1%. Presented are the results of our investigation of a number of aspects of the baseband LPC coder with the goal of maximizing the quality of the transmitted speech. Important among these aspects are: bandwidth of the baseband, coding of the baseband residual, high-frequency regeneration, and error protection of important transmission parameters. The paper discusses these and other issues, presents the results of speech-quality tests conducted during the various stages of optimization, and describes the details of the optimized speech coder. This optimized speech coding algorithm has been implemented as a real-time full-duplex system on an array processor. Informal listening tests of the real-time coder have shown that the coder produces good speech quality in the absence of channel bit errors and introduces only a slight degradation in quality for channel bit error rates of up to 1%.
Speech perception and quality of life of open-fit hearing aid users
Garcia, Tatiana Manfrini; Jacob, Regina Tangerino de Souza; Mondelli, Maria Fernanda Capoani Garcia
2016-01-01
ABSTRACT Objective To relate the speech perception performance of individuals with hearing loss at high frequencies to their quality of life before and after the fitting of an open-fit hearing aid (HA). Methods The WHOQOL-BREF was administered before fitting and 90 days after HA use. The Hearing in Noise Test (HINT) was conducted in two phases: (1) at the time of fitting, without an HA (situation A) and with an HA (situation B); (2) with an HA 90 days after fitting (situation C). Study Sample Thirty subjects with sensorineural hearing loss at high frequencies. Results An analysis of variance and Tukey's test comparing the three HINT situations in quiet and noisy environments showed an improvement after the HA fitting. The WHOQOL-BREF results showed an improvement in quality of life after the HA fitting (paired t-test). The relationship between speech perception and quality of life before the HA fitting indicated a significant relationship between speech recognition in noisy environments and the domain of social relations after the HA fitting (Pearson's correlation coefficient). Conclusions The auditory stimulation improved the speech perception and quality of life of these individuals. PMID:27383708
Isaacson, M D; Srinivasan, S; Lloyd, L L
2010-01-01
MathSpeak is a set of rules for the non-ambiguous speaking of mathematical expressions. These rules have been incorporated into a computerised module that translates printed mathematics into the non-ambiguous MathSpeak form for synthetic speech rendering. Differences between individual utterances produced with the translator module are difficult to discern because of insufficient pausing between utterances; hence, the purpose of this study was to develop an algorithm for improving the synthetic speech rendering of MathSpeak. To improve synthetic speech renderings, an algorithm for inserting pauses was developed based upon recordings of middle and high school math teachers speaking mathematical expressions. Efficacy testing of this algorithm was conducted with college students without disabilities and high school/college students with visual impairments. Parameters measured included reception accuracy, short-term memory retention, MathSpeak processing capacity, and various rankings concerning the quality of synthetic speech renderings. All parameters measured showed statistically significant improvements when the algorithm was used. The algorithm improves the quality and information processing capacity of synthetic speech renderings of MathSpeak. This increases the capacity of individuals with print disabilities to perform mathematical activities and to successfully fulfill science, technology, engineering and mathematics academic and career objectives.
Pulse Vector-Excitation Speech Encoder
NASA Technical Reports Server (NTRS)
Davidson, Grant; Gersho, Allen
1989-01-01
Proposed pulse vector-excitation speech encoder (PVXC) encodes analog speech signals into digital representation for transmission or storage at rates below 5 kilobits per second. Produces reconstructed speech of high quality, with less computation than required by comparable speech-encoding systems. Has some characteristics of multipulse linear predictive coding (MPLPC) and of code-excited linear prediction (CELP). System uses mathematical model of vocal tract in conjunction with set of excitation vectors and perceptually-based error criterion to synthesize natural-sounding speech.
Rating, ranking, and understanding acoustical quality in university classrooms
NASA Astrophysics Data System (ADS)
Hodgson, Murray
2002-08-01
Nonoptimal classroom acoustical conditions directly affect speech perception and, thus, learning by students. Moreover, they may lead to voice problems for the instructor, who is forced to raise his/her voice when lecturing to compensate for poor acoustical conditions. The project applied previously developed simplified methods to predict speech intelligibility in occupied classrooms from measurements in unoccupied and occupied university classrooms. The methods were used to predict the speech intelligibility at various positions in 279 University of British Columbia (UBC) classrooms, when 70% occupied, and for four instructor voice levels. Classrooms were classified and rank ordered by acoustical quality, as determined by the room-average speech intelligibility. This information was used by UBC to prioritize classrooms for renovation. Here, the statistical results are reported to illustrate the range of acoustical qualities found at a typical university. Moreover, the variations of quality with relevant classroom acoustical parameters were studied to better understand the results, in particular the factors leading to the best and worst conditions. It was found that 81% of the 279 classrooms have "good," "very good," or "excellent" acoustical quality with a "typical" (average-male) instructor. However, 50 (18%) of the classrooms had "fair" or "poor" quality, and two had "bad" quality, due to high ventilation-noise levels. Most rooms were "very good" or "excellent" at the front, and "good" or "very good" at the back. Speech quality varied strongly with instructor voice level; in the worst case considered, with a quiet female instructor, most of the classrooms were "bad" or "poor." Quality also varied with occupancy, with decreased occupancy resulting in decreased quality. The research showed that new classroom acoustical design and renovation should focus on limiting background noise and on promoting high instructor speech levels at the back of the classroom; this involves, in part, limiting the amount of sound absorption introduced into classrooms to control reverberation. Speech quality is not very sensitive to changes in reverberation, so controlling reverberation for its own sake should not be a design priority.
Telephone-quality pathological speech classification using empirical mode decomposition.
Kaleem, M F; Ghoraani, B; Guergachi, A; Krishnan, S
2011-01-01
This paper presents a computationally simple and effective methodology based on empirical mode decomposition (EMD) for the classification of telephone-quality normal and pathological speech signals. EMD is used to decompose continuous normal and pathological speech signals into intrinsic mode functions, which are analyzed to extract physically meaningful and unique temporal and spectral features. Using continuous speech samples from a database of 51 normal and 161 pathological speakers, modified to simulate telephone-quality speech under different levels of noise, a linear classifier applied to the resulting feature vector achieves high classification accuracy, demonstrating the effectiveness of the methodology. The classification accuracy reported in this paper (89.7% for a signal-to-noise ratio of 30 dB) is a significant improvement over previously reported results for the same task and demonstrates the utility of our methodology for cost-effective remote voice pathology assessment over telephone channels.
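A minimal sketch of EMD-based feature extraction in the spirit of this methodology, assuming the third-party EMD-signal (PyEMD) package; the two per-IMF features here (log energy and spectral centroid) are illustrative, not the authors' exact feature set:

```python
import numpy as np
from PyEMD import EMD   # pip package "EMD-signal" (an assumed implementation)

def emd_features(x: np.ndarray, fs: float, n_imfs: int = 4) -> np.ndarray:
    """Decompose a speech frame into intrinsic mode functions and
    summarize each with one temporal and one spectral feature."""
    imfs = EMD().emd(x)[:n_imfs]
    feats = []
    for imf in imfs:
        spec = np.abs(np.fft.rfft(imf))
        freqs = np.fft.rfftfreq(len(imf), d=1.0 / fs)
        centroid = float((freqs * spec).sum() / (spec.sum() + 1e-12))
        feats.extend([float(np.log(np.sum(imf ** 2) + 1e-12)), centroid])
    return np.array(feats)
```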
Integrating cognitive and peripheral factors in predicting hearing-aid processing effectiveness
Kates, James M.; Arehart, Kathryn H.; Souza, Pamela E.
2013-01-01
Individual factors beyond the audiogram, such as age and cognitive abilities, can influence speech intelligibility and speech quality judgments. This paper develops a neural network framework for combining multiple subject factors into a single model that predicts speech intelligibility and quality for a nonlinear hearing-aid processing strategy. The nonlinear processing approach used in the paper is frequency compression, which is intended to improve the audibility of high-frequency speech sounds by shifting them to lower frequency regions where listeners with high-frequency loss have better hearing thresholds. An ensemble averaging approach is used for the neural network to avoid the problems associated with overfitting. Models are developed for two subject groups, one having nearly normal hearing and the other mild-to-moderate sloping losses. PMID:25669257
Implementation of Three Text to Speech Systems for Kurdish Language
NASA Astrophysics Data System (ADS)
Bahrampour, Anvar; Barkhoda, Wafa; Azami, Bahram Zahir
Nowadays, the concatenative method is used in most modern TTS systems to produce artificial speech. The most important challenge in this method is choosing an appropriate unit for creating the database. This unit must guarantee smooth, high-quality speech, and creating a database for it must be reasonable and inexpensive. For example, the syllable, phoneme, allophone, and diphone are appropriate units for all-purpose systems. In this paper, we implemented three synthesis systems for the Kurdish language, based on the syllable, allophone, and diphone respectively, and compared their quality using subjective testing.
Parent-child interaction in motor speech therapy.
Namasivayam, Aravind Kumar; Jethava, Vibhuti; Pukonen, Margit; Huynh, Anna; Goshulak, Debra; Kroll, Robert; van Lieshout, Pascal
2018-01-01
This study measures the reliability and sensitivity of a modified Parent-Child Interaction Observation scale (PCIOs) used to monitor the quality of parent-child interaction. The scale is part of a home-training program employed with direct motor speech intervention for children with speech sound disorders. Eighty-four preschool-age children with speech sound disorders were provided either high-intensity (2×/week/10 weeks) or low-intensity (1×/week/10 weeks) motor speech intervention. Clinicians completed the PCIOs at the beginning, middle, and end of treatment. Inter-rater reliability (Kappa scores) was determined by an independent speech-language pathologist who assessed videotaped sessions at the midpoint of the treatment block. Intervention sensitivity of the scale was evaluated using a Friedman test for each item, followed by Wilcoxon pairwise comparisons where appropriate. We obtained fair-to-good inter-rater reliability (Kappa = 0.33-0.64) for the PCIOs using only video-based scoring. Child-related items were more strongly influenced by differences in treatment intensity than parent-related items, where a greater number of sessions positively influenced parent learning of treatment skills and child behaviors. The adapted PCIOs is reliable and sensitive for monitoring the quality of parent-child interactions in a 10-week block of motor speech intervention with adjunct home therapy. Implications for rehabilitation: Parent-centered therapy is considered a cost-effective method of speech and language service delivery. However, parent-centered models may be difficult to implement for treatments such as developmental motor speech interventions that require a high degree of skill and training. For children with speech sound disorders and motor speech difficulties, a translated and adapted version of the parent-child observation scale was found to be sufficiently reliable and sensitive to assess changes in the quality of the parent-child interactions during intervention. In developmental motor speech interventions, high-intensity treatment (2×/week/10 weeks) facilitates greater changes in the parent-child interactions than low-intensity treatment (1×/week/10 weeks). On one hand, parents may need to attend more than five sessions with the clinician to learn how to observe and address their child's speech difficulties. On the other hand, children with speech sound disorders may need more than 10 sessions to adapt to structured play settings even when activities and therapy materials are age-appropriate.
ERIC Educational Resources Information Center
Whitmire, Kathleen A.; Rivers, Kenyatta O.; Mele-McCarthy, Joan A.; Staskowski, Maureen
2014-01-01
Speech-language pathologists are faced with demands for evidence to support practice. Federal legislation requires high-quality evidence for decisions regarding school-based services as part of evidence-based practice. The purpose of this article is to discuss the limited scientific evidence for making appropriate decisions about speech-language…
Wang, Yulin; Tian, Xuelong
2014-08-01
In order to improve the speech quality and auditory perceptiveness of electronic cochlear implants under strong background noise, a speech enhancement system for the electronic cochlear implant front-end was constructed. With a digital signal processor (DSP) as its core, the system combines the processor's multi-channel buffered serial port (McBSP) data transmission channel with the extended audio interface chip TLV320AIC10, so that high-speed speech signal acquisition and output are realized. Meanwhile, because traditional speech enhancement methods suffer from poor adaptability, slow convergence, and large steady-state error, the versiera function and the de-correlation principle were used to improve the existing adaptive filtering algorithm, which effectively enhanced the quality of voice communications. Test results verified the stability of the system and the de-noising performance of the algorithm, and also showed that they could provide clearer speech signals for deaf or tinnitus patients.
Reference-free automatic quality assessment of tracheoesophageal speech.
Huang, Andy; Falk, Tiago H; Chan, Wai-Yip; Parsa, Vijay; Doyle, Philip
2009-01-01
Evaluation of the quality of tracheoesophageal (TE) speech using machines instead of human experts can enhance the voice rehabilitation process for patients who have undergone total laryngectomy and voice restoration. Towards the goal of devising a reference-free TE speech quality estimation algorithm, we investigate the efficacy of speech signal features that are used in standard telephone-speech quality assessment algorithms, in conjunction with a recently introduced speech modulation spectrum measure. Tests performed on two TE speech databases demonstrate that the modulation spectral measure and a subset of features in the standard ITU-T P.563 algorithm estimate TE speech quality with better correlation (up to 0.9) than previously proposed features.
Objective speech quality evaluation of real-time speech coders
NASA Astrophysics Data System (ADS)
Viswanathan, V. R.; Russell, W. H.; Huggins, A. W. F.
1984-02-01
This report describes the work performed in two areas: subjective testing of a real-time 16 kbit/s adaptive predictive coder (APC) and objective speech quality evaluation of real-time coders. The speech intelligibility of the APC coder was tested using the Diagnostic Rhyme Test (DRT), and the speech quality was tested using the Diagnostic Acceptability Measure (DAM) test, under eight operating conditions involving channel error, acoustic background noise, and tandem link with two other coders. The test results showed that the DRT and DAM scores of the APC coder equalled or exceeded the corresponding test scores of the 32 kbit/s CVSD coder. In the area of objective speech quality evaluation, the report describes the development, testing, and validation of a procedure for automatically computing several objective speech quality measures, given only the tape recordings of the input speech and the corresponding output speech of a real-time speech coder.
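One simple member of the family of objective measures computable from a coder's input and output recordings is segmental SNR. The sketch below is illustrative (frame length and clamping range are common conventions, not taken from the report) and assumes the two recordings are time-aligned:

```python
import numpy as np

def segmental_snr(clean: np.ndarray, coded: np.ndarray, frame: int = 256,
                  lo: float = -10.0, hi: float = 35.0) -> float:
    """Frame-by-frame SNR between coder input and output, averaged in
    dB with per-frame values clamped to a plausible range, as is
    common for segmental-SNR style objective quality measures."""
    n = min(len(clean), len(coded)) // frame * frame
    c = clean[:n].reshape(-1, frame)
    d = (clean[:n] - coded[:n]).reshape(-1, frame)
    snr = 10 * np.log10((c ** 2).sum(axis=1) / ((d ** 2).sum(axis=1) + 1e-12))
    return float(np.clip(snr, lo, hi).mean())
```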
Relationship Among Signal Fidelity, Hearing Loss, and Working Memory for Digital Noise Suppression.
Arehart, Kathryn; Souza, Pamela; Kates, James; Lunner, Thomas; Pedersen, Michael Syskind
2015-01-01
This study considered speech modified by additive babble combined with noise-suppression processing. The purpose was to determine the relative importance of the signal modifications, individual peripheral hearing loss, and individual cognitive capacity on speech intelligibility and speech quality. The participant group consisted of 31 individuals with moderate high-frequency hearing loss ranging in age from 51 to 89 years (mean = 69.6 years). Speech intelligibility and speech quality were measured using low-context sentences presented in babble at several signal-to-noise ratios. Speech stimuli were processed with a binary mask noise-suppression strategy with systematic manipulations of two parameters (error rate and attenuation values). The cumulative effects of signal modification produced by babble and signal processing were quantified using an envelope-distortion metric. Working memory capacity was assessed with a reading span test. Analysis of variance was used to determine the effects of signal processing parameters on perceptual scores. Hierarchical linear modeling was used to determine the role of degree of hearing loss and working memory capacity in individual listener response to the processed noisy speech. The model also considered improvements in envelope fidelity caused by the binary mask and the degradations to envelope fidelity caused by error and noise. The participants showed significant benefits in terms of intelligibility scores and quality ratings for noisy speech processed by the ideal binary mask noise-suppression strategy. This benefit was observed across a range of signal-to-noise ratios and persisted when up to a 30% error rate was introduced into the processing. Average intelligibility scores and average quality ratings were well predicted by an objective metric of envelope fidelity. Degree of hearing loss and working memory capacity were significant factors in explaining individual listeners' intelligibility scores for binary mask processing applied to speech in babble. Degree of hearing loss and working memory capacity did not predict listeners' quality ratings. The results indicate that envelope fidelity is a primary factor in determining the combined effects of noise and binary mask processing on the intelligibility and quality of speech presented in babble noise. Degree of hearing loss and working memory capacity are significant factors in explaining variability in listeners' speech intelligibility scores but not in quality ratings.
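The binary mask strategy with its two manipulated parameters (error rate and attenuation value) can be sketched as follows. This is an illustrative STFT-domain version, not the authors' implementation; it assumes access to the clean speech, as an ideal binary mask requires:

```python
import numpy as np
from scipy.signal import stft, istft

def binary_mask_suppress(noisy: np.ndarray, clean: np.ndarray, fs: float,
                         lc_db: float = 0.0, atten_db: float = 20.0,
                         error_rate: float = 0.0) -> np.ndarray:
    """Ideal-binary-mask noise suppression: keep time-frequency cells
    whose local SNR exceeds a criterion, attenuate the rest, and flip
    a random fraction of mask cells to model estimation errors (the
    two parameters manipulated in the study)."""
    f, t, N = stft(noisy, fs)
    _, _, C = stft(clean, fs)
    # Local SNR per cell; N - C approximates the noise STFT.
    snr = 20 * np.log10(np.abs(C) + 1e-12) - 20 * np.log10(np.abs(N - C) + 1e-12)
    mask = snr > lc_db
    flips = np.random.rand(*mask.shape) < error_rate
    mask = np.where(flips, ~mask, mask)
    gain = np.where(mask, 1.0, 10 ** (-atten_db / 20.0))
    _, y = istft(N * gain, fs)
    return y
```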
Keshavarzi, Mahmoud; Goehring, Tobias; Zakis, Justin; Turner, Richard E; Moore, Brian C J
2018-01-01
Despite great advances in hearing-aid technology, users still experience problems with noise in windy environments. The potential benefits of using a deep recurrent neural network (RNN) for reducing wind noise were assessed. The RNN was trained using recordings of the output of the two microphones of a behind-the-ear hearing aid in response to male and female speech at various azimuths in the presence of noise produced by wind from various azimuths with a velocity of 3 m/s, using the "clean" speech as a reference. A paired-comparison procedure was used to compare all possible combinations of three conditions for subjective intelligibility and for sound quality or comfort. The conditions were unprocessed noisy speech, noisy speech processed using the RNN, and noisy speech that was high-pass filtered (which also reduced wind noise). Eighteen native English-speaking participants were tested, nine with normal hearing and nine with mild-to-moderate hearing impairment. Frequency-dependent linear amplification was provided for the latter. Processing using the RNN was significantly preferred over no processing by both subject groups for both subjective intelligibility and sound quality, although the magnitude of the preferences was small. High-pass filtering (HPF) was not significantly preferred over no processing. Although RNN was significantly preferred over HPF only for sound quality for the hearing-impaired participants, for the results as a whole, there was a preference for RNN over HPF. Overall, the results suggest that reduction of wind noise using an RNN is possible and might have beneficial effects when used in hearing aids.
High-frequency energy in singing and speech
NASA Astrophysics Data System (ADS)
Monson, Brian Bruce
While human speech and the human voice generate acoustical energy up to (and beyond) 20 kHz, the energy above approximately 5 kHz has been largely neglected. Evidence is accruing that this high-frequency energy contains perceptual information relevant to speech and voice, including percepts of quality, localization, and intelligibility. The present research was an initial step in the long-range goal of characterizing high-frequency energy in singing voice and speech, with particular regard for its perceptual role and its potential for modification during voice and speech production. In this study, a database of high-fidelity recordings of talkers was created and used for a broad acoustical analysis and general characterization of high-frequency energy, as well as specific characterization of phoneme category, voice and speech intensity level, and mode of production (speech versus singing) by high-frequency energy content. Directionality of radiation of high-frequency energy from the mouth was also examined. The recordings were used for perceptual experiments wherein listeners were asked to discriminate between speech and voice samples that differed only in high-frequency energy content. Listeners were also subjected to gender discrimination tasks, mode-of-production discrimination tasks, and transcription tasks with samples of speech and singing that contained only high-frequency content. The combination of these experiments has revealed that (1) human listeners are able to detect very subtle level changes in high-frequency energy, and (2) human listeners are able to extract significant perceptual information from high-frequency energy.
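A basic characterization of high-frequency energy content of the kind described, computed as the fraction of spectral energy above 5 kHz, might look like the following (the function name and single-frame analysis are illustrative, not the study's pipeline):

```python
import numpy as np

def hf_energy_ratio(x: np.ndarray, fs: float, split_hz: float = 5000.0) -> float:
    """Fraction of total spectral energy above a split frequency
    (5 kHz here, matching the region the study treats as
    high-frequency energy)."""
    spec = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return float(spec[freqs >= split_hz].sum() / (spec.sum() + 1e-12))
```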
Improved Speech Coding Based on Open-Loop Parameter Estimation
NASA Technical Reports Server (NTRS)
Juang, Jer-Nan; Chen, Ya-Chin; Longman, Richard W.
2000-01-01
A nonlinear optimization algorithm for linear predictive speech coding was developed earlier that not only optimizes the linear model coefficients for the open-loop predictor, but does the optimization including the effects of quantization of the transmitted residual. It also simultaneously optimizes the quantization levels used for each speech segment. In this paper, we present an improved method for initialization of this nonlinear algorithm and demonstrate substantial improvements in performance. In addition, the new procedure produces monotonically improving speech quality with increasing numbers of bits used in the transmitted error residual. Examples of speech encoding and decoding are given for 8 speech segments, and signal-to-noise ratios as high as 47 dB are produced. As in typical linear predictive coding, the optimization is done on the open-loop speech analysis model. Here we demonstrate that minimizing the error of the closed-loop speech reconstruction, instead of the simpler open-loop optimization, is likely to produce negligible improvement in speech quality. The examples suggest that the algorithm here is close to giving the best performance obtainable from a linear model, for the chosen order with the chosen number of bits for the codebook.
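The open-loop analysis underlying linear predictive coding fits predictor coefficients to each speech segment. A standard autocorrelation-method sketch using the Levinson-Durbin recursion is shown below; it is illustrative background, whereas the paper's nonlinear algorithm additionally optimizes over residual quantization:

```python
import numpy as np

def lpc_coefficients(frame: np.ndarray, order: int = 10) -> np.ndarray:
    """Open-loop LPC analysis: autocorrelation method solved with the
    Levinson-Durbin recursion. Returns a[0..p-1] such that
    x[n] is approximated by sum_k a[k] * x[n-1-k]."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = np.zeros(order)
    err = r[0]
    for i in range(order):
        # Reflection coefficient for model order i + 1.
        k = (r[i + 1] - np.dot(a[:i], r[i:0:-1])) / (err + 1e-12)
        a_prev = a.copy()
        a[i] = k
        a[:i] = a_prev[:i] - k * a_prev[i - 1::-1][:i]
        err *= (1.0 - k * k)        # prediction-error power shrinks
    return a
```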
Impact of dynamic rate coding aspects of mobile phone networks on forensic voice comparison.
Alzqhoul, Esam A S; Nair, Balamurali B T; Guillemin, Bernard J
2015-09-01
Previous studies have shown that landline and mobile phone networks differ in their ways of handling the speech signal, and therefore in their impact on it. The same is also true of the different networks within the mobile phone arena. There are two major mobile phone technologies currently in use, namely the global system for mobile communications (GSM) and code division multiple access (CDMA), and these are fundamentally different in their design. For example, the quality of the coded speech in the GSM network is a function of channel quality, whereas in the CDMA network it is determined by channel capacity (i.e., the number of users sharing a cell site). This paper examines the impact on the speech signal of a key feature of these networks, namely dynamic rate coding, and its subsequent impact on the task of likelihood-ratio-based forensic voice comparison (FVC). Surprisingly, both FVC accuracy and precision are found to be better for both GSM- and CDMA-coded speech than for uncoded speech. Intuitively, one expects FVC accuracy to increase with increasing coded-speech quality. This trend is shown to occur for the CDMA network but, surprisingly, not for the GSM network. Further, in comparisons between the two networks, FVC accuracy for CDMA-coded speech is shown to be slightly better than for GSM-coded speech, particularly when the coded-speech quality is high, while in terms of FVC precision the two networks are very similar.
Noise-immune multisensor transduction of speech
NASA Astrophysics Data System (ADS)
Viswanathan, Vishu R.; Henry, Claudia M.; Derr, Alan G.; Roucos, Salim; Schwartz, Richard M.
1986-08-01
Two types of configurations of multiple sensors were developed, tested, and evaluated in a speech recognition application for robust performance in high levels of acoustic background noise: one type combines the individual sensor signals to provide a single speech-signal input, and the other provides several parallel inputs. For single-input systems, several configurations of multiple sensors were developed and tested. Results from formal speech intelligibility and quality tests in simulated fighter-aircraft cockpit noise show that each of the two-sensor configurations tested outperforms its constituent individual sensors in high noise. Also presented are results comparing the performance of two-sensor configurations and individual sensors in speaker-dependent, isolated-word speech recognition tests performed using a commercial recognizer (Verbex 4000) in simulated fighter-aircraft cockpit noise.
Dazert, Stefan; Thomas, Jan Peter; Büchner, Andreas; Müller, Joachim; Hempel, John Martin; Löwenheim, Hubert; Mlynski, Robert
2017-03-01
The RONDO is a single-unit cochlear implant audio processor, which omits the need for a behind-the-ear (BTE) audio processor. The primary aim was to compare speech perception results in quiet and in noise with the RONDO and the OPUS 2, a BTE audio processor. Secondary aims were to determine subjects' self-assessed levels of sound quality and to gather subjective feedback on RONDO use. All speech perception tests were performed with the RONDO and the OPUS 2 behind-the-ear audio processor at three test intervals. Subjects were required to use the RONDO between test intervals. Subjects were tested at upgrade from the OPUS 2 to the RONDO and at 1 and 6 months after upgrade. Speech perception was determined using the Freiburg Monosyllables test in quiet and the Oldenburg Sentence Test (OLSA) in noise. Subjective perception was determined using the Hearing Implant Sound Quality Index (HISQUI19) and a RONDO device-specific questionnaire. Fifty subjects participated in the study. Neither speech perception scores nor self-perceived sound quality scores differed significantly between the RONDO and the OPUS 2 at any interval. Subjects reported high levels of satisfaction with the RONDO. The RONDO provides speech perception comparable to the OPUS 2 while providing users with high levels of satisfaction and comfort without increasing health risk. The RONDO is a suitable and safe alternative to traditional BTE audio processors.
[Restoration of speech function in oncological patients with maxillary defects].
Matiakin, E G; Chuchkov, V M; Akhundov, A A; Azizian, R I; Romanov, I S; Chuchkov, M V; Agapov, V V
2009-01-01
Speech quality was evaluated in 188 patients with acquired maxillary defects. Prosthetic treatment of 29 patients was preceded by pharmacopsychotherapy. Sixty-three patients had lessons with a logopedist, and 66 practiced self-tuition based on a specially developed test. Thirty patients were examined for quality of speech without preliminary preparation. Speech quality was assessed by auditory and spectral analysis. The main forms of impaired speech quality in the patients with maxillary defects were marked rhinophonia and impaired articulation. The proposed analytical tests were based on a combination of "difficult" vowels and consonants. The use of a removable prosthesis with an obturator failed to correct the affected speech function but created prerequisites for the formation of a correct speech stereotype. Results of the study suggest a relationship between the quality of speech in subjects with maxillary defects and their intellectual faculties, as well as their desire to overcome this drawback. The proposed tests are designed to activate the neuromuscular apparatus responsible for the generation of speech. Lessons with a speech therapist give a powerful emotional incentive to the patients and promote their efforts toward restoration of speaking ability. Pharmacopsychotherapy and self-control are other efficacious tools for the improvement of speech quality in patients with maxillary defects.
NASA Astrophysics Data System (ADS)
Kayasith, Prakasith; Theeramunkong, Thanaruk
It is a tedious and subjective task to measure the severity of dysarthria by manually evaluating a speaker's speech using available standard assessment methods based on human perception. This paper presents an automated approach to assessing the speech quality of a dysarthric speaker with cerebral palsy. With the consideration of two complementary factors, speech consistency and speech distinction, a speech quality indicator called the speech clarity index (Ψ) is proposed as a measure of the speaker's ability to produce a consistent speech signal for a given word and distinguishable speech signals for different words. As an application, it can be used to assess speech quality and to forecast the speech recognition rate for an individual dysarthric speaker before exhaustive implementation of an automatic speech recognition system for that speaker. The effectiveness of Ψ as a speech recognition rate predictor is evaluated by rank-order inconsistency, correlation coefficient, and root-mean-square difference. The evaluations were done by comparing its predicted recognition rates with those predicted by the standard methods, the articulatory and intelligibility tests, based on two recognition systems (HMM and ANN). The results show that Ψ is a promising indicator for predicting the recognition rate of dysarthric speech. All experiments were done on a speech corpus composed of speech data from eight normal speakers and eight dysarthric speakers.
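The abstract defines Ψ only conceptually, as combining speech consistency and speech distinction. Purely as an illustration of how such an index could be formalized (this exact form is an assumption, not the paper's definition):

```latex
% Illustrative formalization only; the paper's exact definition may differ.
% \bar{D}_{\mathrm{within}}: mean distance between repeated utterances of the same word
% \bar{D}_{\mathrm{between}}: mean distance between utterances of different words
\Psi \;=\; \frac{\bar{D}_{\mathrm{between}}}{\bar{D}_{\mathrm{between}} + \bar{D}_{\mathrm{within}}}
```

Under this reading, Ψ approaches 1 when repetitions of a word are consistent (small within-word distance) and different words are well separated (large between-word distance), matching the two factors the authors name.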
Affective Properties of Mothers' Speech to Infants With Hearing Impairment and Cochlear Implants
Bergeson, Tonya R.; Xu, Huiping; Kitamura, Christine
2015-01-01
Purpose The affective properties of infant-directed speech influence the attention of infants with normal hearing to speech sounds. This study explored the affective quality of maternal speech to infants with hearing impairment (HI) during the 1st year after cochlear implantation as compared to speech to infants with normal hearing. Method Mothers of infants with HI and mothers of infants with normal hearing matched by age (NH-AM) or hearing experience (NH-EM) were recorded playing with their infants during 3 sessions over a 12-month period. Speech samples of 25 s were low-pass filtered, leaving intonation but not speech information intact. Sixty adults rated the stimuli along 5 scales: positive/negative affect and intention to express affection, to encourage attention, to comfort/soothe, and to direct behavior. Results Low-pass filtered speech to the HI and NH-EM groups was rated as more positive, affective, and comforting than such speech to the NH-AM group. Speech to infants with HI and with NH-AM was rated as more directive than speech to the NH-EM group. Mothers decreased affective qualities in speech to all infants but increased directive qualities in speech to infants with NH-EM over time. Conclusions Mothers fine-tune communicative intent in speech to their infant's developmental stage. They adjust affective qualities to infants' hearing experience rather than to chronological age but adjust directive qualities of speech to the chronological age of their infants. PMID:25679195
Landwehr, Markus; Fürstenberg, Dirk; Walger, Martin; von Wedel, Hasso; Meister, Hartmut
2014-01-01
Advances in speech coding strategies and electrode array designs for cochlear implants (CIs) predominantly aim at improving speech perception. Current efforts are also directed at transmitting appropriate cues of the fundamental frequency (F0) to the auditory nerve with respect to speech quality, prosody, and music perception. The aim of this study was to examine the effects of various electrode configurations and coding strategies on speech intonation identification, speaker gender identification, and music quality rating. In six MED-EL CI users electrodes were selectively deactivated in order to simulate different insertion depths and inter-electrode distances when using the high definition continuous interleaved sampling (HDCIS) and fine structure processing (FSP) speech coding strategies. Identification of intonation and speaker gender was determined and music quality rating was assessed. For intonation identification HDCIS was robust against the different electrode configurations, whereas fine structure processing showed significantly worse results when a short electrode depth was simulated. In contrast, speaker gender recognition was not affected by electrode configuration or speech coding strategy. Music quality rating was sensitive to electrode configuration. In conclusion, the three experiments revealed different outcomes, even though they all addressed the reception of F0 cues. Rapid changes in F0, as seen with intonation, were the most sensitive to electrode configurations and coding strategies. In contrast, electrode configurations and coding strategies did not show large effects when F0 information was available over a longer time period, as seen with speaker gender. Music quality relies on additional spectral cues other than F0, and was poorest when a shallow insertion was simulated.
Calculation of selective filters of a device for primary analysis of speech signals
NASA Astrophysics Data System (ADS)
Chudnovskii, L. S.; Ageev, V. M.
2014-07-01
The amplitude-frequency responses of filters for primary analysis of speech signals, which have a low quality factor and a high rolloff factor in the high-frequency range, are calculated using the linear theory of speech production and psychoacoustic measurement data. The frequency resolution of the filter system for a sinusoidal signal is 40-200 Hz. The modulation-frequency resolution of amplitude- and frequency-modulated signals is 3-6 Hz. The aforementioned features of the calculated filters are close to the amplitude-frequency responses of biological auditory systems at the level of the eighth nerve.
Pilot Workload and Speech Analysis: A Preliminary Investigation
NASA Technical Reports Server (NTRS)
Bittner, Rachel M.; Begault, Durand R.; Christopher, Bonny R.
2013-01-01
Prior research has questioned the effectiveness of speech analysis to measure the stress, workload, truthfulness, or emotional state of a talker. The question remains regarding the utility of speech analysis for restricted vocabularies such as those used in aviation communications. A part-task experiment was conducted in which participants performed Air Traffic Control read-backs in different workload environments. Participants' subjective workload and the speech qualities of fundamental frequency (F0) and articulation rate were evaluated. A significant increase in subjective workload rating was found for high-workload segments. F0 was significantly higher during high workload, while articulation rates were significantly slower. No correlation was found between subjective workload and F0 or articulation rate.
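The two acoustic measures used here are easy to reproduce. The sketch below extracts a mean F0 and a crude articulation-rate proxy from a recording, assuming the librosa and scipy packages are available; the pitch range, energy threshold, and peak spacing are illustrative choices, not the study's actual analysis settings.

    import numpy as np
    import librosa
    from scipy.signal import find_peaks

    def f0_and_articulation_rate(wav_path):
        y, sr = librosa.load(wav_path, sr=16000)
        # F0 track via probabilistic YIN; unvoiced frames come back as NaN.
        f0, voiced_flag, voiced_prob = librosa.pyin(y=y, fmin=75, fmax=400, sr=sr)
        mean_f0 = np.nanmean(f0)
        # Rough articulation-rate proxy: energy peaks as syllable nuclei.
        rms = librosa.feature.rms(y=y, hop_length=160)[0]
        peaks, _ = find_peaks(rms, height=0.3 * rms.max(), distance=10)
        rate = len(peaks) / (len(y) / sr)  # nuclei per second
        return mean_f0, rate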
Advancements in text-to-speech technology and implications for AAC applications
NASA Astrophysics Data System (ADS)
Syrdal, Ann K.
2003-10-01
Intelligibility was the initial focus in text-to-speech (TTS) research, since it is clearly a necessary condition for the application of the technology. Sufficiently high intelligibility (approximating human speech) has been achieved in the last decade by the better formant-based and concatenative TTS systems. This led to commercially available TTS systems for highly motivated users, particularly the blind and vocally impaired. Some unnatural qualities of TTS were exploited by these users, such as very fast speaking rates and altered pitch ranges for flagging relevant information. Recently, the focus in TTS research has turned to improving naturalness, so that synthetic speech sounds more human and less robotic. Unit selection approaches to concatenative synthesis have dramatically improved TTS quality, although at the cost of larger and more complex systems. This advancement in naturalness has made TTS technology more acceptable to the general public. The vocally impaired appreciate a more natural voice with which to represent themselves when communicating with others. Unit selection TTS does not achieve such high speaking rates as the earlier TTS systems, however, which is a disadvantage to some AAC device users. An important new research emphasis is to improve and increase the range of emotional expressiveness of TTS.
Patient-reported symptom questionnaires in laryngeal cancer: voice, speech and swallowing.
Rinkel, R N P M; Verdonck-de Leeuw, I M; van den Brakel, N; de Bree, R; Eerenstein, S E J; Aaronson, N; Leemans, C R
2014-08-01
To validate questionnaires on voice, speech, and swallowing among laryngeal cancer patients, to assess the need for and use of rehabilitation services, and to determine the association between voice, speech, and swallowing problems, and quality of life and distress. Laryngeal cancer patients at least three months post-treatment completed the VHI (voice), SHI (speech), SWAL-QOL (swallowing), EORTC QLQ-C30, QLQ-HN35, HADS, and study-specific questions on rehabilitation. Eighty-eight patients and 110 healthy controls participated. Cutoff scores of 15, 6, and 14 were defined for the VHI, SHI, and SWAL-QOL (sensitivity > 90%; specificity > 80%). Based on these scores, 56% of the patients reported voice problems, 63% speech problems, and 54% swallowing problems. VHI, SHI, and SWAL-QOL scores were significantly associated with quality of life (EORTC QLQ-C30 global quality of life scale) (r = .43 (VHI and SHI) and r = .46 (SWAL-QOL)) and distress (r = .50 (VHI and SHI) and r = .58 (SWAL-QOL)). In retrospect, 32% of the patients indicated the need for rehabilitation at the time of treatment, and 81% of these patients availed themselves of such services. Post-treatment, 8% of the patients expressed a need for rehabilitation, and 20% of these patients actually made use of such services. Psychometric characteristics of the VHI, SHI, and SWAL-QOL in laryngeal cancer patients are good. The prevalence of voice, speech, and swallowing problems is high, and clearly related to quality of life and distress. Although higher during than after treatment, the perceived need for and use of rehabilitation services is limited. Copyright © 2014 Elsevier Ltd. All rights reserved.
Creating speech-synchronized animation.
King, Scott A; Parent, Richard E
2005-01-01
We present a facial model designed primarily to support animated speech. Our facial model takes facial geometry as input and transforms it into a parametric deformable model. The facial model uses a muscle-based parameterization, allowing for easier integration between speech synchrony and facial expressions. Our facial model has a highly deformable lip model that is grafted onto the input facial geometry to provide the necessary geometric complexity needed for creating lip shapes and high-quality renderings. Our facial model also includes a highly deformable tongue model that can represent the shapes the tongue undergoes during speech. We add teeth, gums, and upper palate geometry to complete the inner mouth. To decrease the processing time, we hierarchically deform the facial surface. We also present a method to animate the facial model over time to create animated speech using a model of coarticulation that blends visemes together using dominance functions. We treat visemes as a dynamic shaping of the vocal tract by describing visemes as curves instead of keyframes. We show the utility of the techniques described in this paper by implementing them in a text-to-audiovisual-speech system that creates animation of speech from unrestricted text. The facial and coarticulation models must first be interactively initialized. The system then automatically creates accurate real-time animated speech from the input text. It is capable of cheaply producing tremendous amounts of animated speech with very low resource requirements.
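The coarticulation model named here blends viseme targets with dominance functions, in the spirit of the Cohen-Massaro approach. Below is a minimal numpy sketch of that blending step for a single lip parameter; the timings, weights, targets, and the exponential dominance shape are illustrative assumptions rather than the paper's actual model.

    import numpy as np

    def dominance(t, t_peak, alpha, theta=15.0, c=1.0):
        # Exponential dominance: strongest at the viseme's peak time,
        # decaying on both sides so neighbouring visemes interact.
        return alpha * np.exp(-theta * np.abs(t - t_peak) ** c)

    # Each viseme: (peak time in s, dominance weight, target lip opening).
    visemes = [(0.10, 1.0, 0.8),   # open vowel
               (0.25, 1.2, 0.1),   # bilabial closure
               (0.40, 0.9, 0.6)]   # mid vowel

    t = np.linspace(0.0, 0.5, 501)
    num = np.zeros_like(t)
    den = np.zeros_like(t)
    for t_peak, alpha, target in visemes:
        d = dominance(t, t_peak, alpha)
        num += d * target
        den += d
    lip_opening = num / den  # smooth control curve: visemes as curves, not keyframes

The normalized weighted sum makes each viseme's influence wax and wane continuously, which is what lets the model treat visemes as a dynamic shaping of the vocal tract rather than fixed keyframes.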
ERIC Educational Resources Information Center
Burgess, Sloane; Turkstra, Lyn S.
2010-01-01
Purpose: This study was designed to evaluate the feasibility of using the American Speech-Language-Hearing Association's Quality of Communication Life Scale (QCL; Paul et al., 2004) for a group of individuals with developmental communication disorders--adolescents with high-functioning autism/Asperger syndrome (HFA/AS). Perceptions of quality of…
Harlander, Niklas; Rosenkranz, Tobias; Hohmann, Volker
2012-08-01
Single-channel noise reduction has been well investigated and seems to have reached its limits in terms of speech intelligibility improvement; however, the quality of such schemes can still be advanced. This study tests to what extent novel model-based processing schemes might improve performance, in particular for non-stationary noise conditions. Two prototype model-based algorithms, a speech-model-based and an auditory-model-based algorithm, were compared to a state-of-the-art non-parametric minimum statistics algorithm. A speech intelligibility test, preference rating, and listening effort scaling were performed. Additionally, three objective quality measures for the signal, background, and overall distortions were applied. For a better comparison of all algorithms, particular attention was given to the use of a similar Wiener-based gain rule. The perceptual investigation was performed with fourteen hearing-impaired subjects. The results revealed that the non-parametric algorithm and the auditory-model-based algorithm did not affect speech intelligibility, whereas the speech-model-based algorithm slightly decreased intelligibility. In terms of subjective quality, both model-based algorithms performed better than the unprocessed condition and the reference, in particular for highly non-stationary noise environments. The data support the hypothesis that model-based algorithms are promising for improving performance in non-stationary noise conditions.
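As a point of reference for the gain rule mentioned above, here is a minimal single-channel Wiener-gain sketch with decision-directed a priori SNR estimation. It is a generic textbook formulation, not any of the three algorithms compared in the study; the smoothing factor and gain floor are assumed values, and the noise PSD is taken as known rather than tracked (e.g., by minimum statistics).

    import numpy as np

    def wiener_enhance(frames, noise_psd, alpha=0.98, gain_floor=0.1):
        """frames: (n_frames, n_bins) complex STFT; returns enhanced STFT."""
        out = np.empty_like(frames)
        xi_prev = np.ones(frames.shape[1])
        for i, X in enumerate(frames):
            gamma = np.abs(X) ** 2 / noise_psd              # a posteriori SNR
            # Decision-directed a priori SNR estimate.
            xi = alpha * xi_prev + (1.0 - alpha) * np.maximum(gamma - 1.0, 0.0)
            g = np.maximum(xi / (1.0 + xi), gain_floor)     # Wiener gain rule
            out[i] = g * X
            xi_prev = g ** 2 * gamma  # = |enhanced|^2 / noise_psd, for next frame
        return out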
Evaluation on health-related quality of life in deaf children with cochlear implant in China.
Liu, Hong; Liu, Hong-Xiang; Kang, Hou-Yong; Gu, Zheng; Hong, Su-Ling
2016-09-01
Previous studies have shown that deaf children benefit considerably from cochlear implants. These improvements are found in areas such as speech perception, speech production, and auditory-verbal performance. Despite the increasing prevalence of cochlear implants in China, few studies have reported on health-related quality of life in children with cochlear implants. The main objective of this study was to explore health-related quality of life in children with cochlear implants in Southwest China. A retrospective observational study of 213 CI users in Southwest China between 2010 and 2013. Participants were 213 individuals with bilateral severe-to-profound hearing loss who wore unilateral cochlear implants. The Nijmegen Cochlear Implant Questionnaire and Health Utility Index Mark III were used pre-implantation and 1 year post-implantation. Additionally, 1-year postoperative scores for Mandarin speech perception were compared with preoperative scores. Health-related quality of life improved post-operation, with scores on the Nijmegen Cochlear Implant Questionnaire improving significantly in all subdomains, and the Health Utility Index 3 showing a significant improvement in the utility score and the subdomains of "hearing", "speech", and "emotion". Additionally, a significant improvement in speech recognition scores was found. No significant correlation was found between the increase in quality of life and speech perception scores. Health-related quality of life and speech recognition in prelingually deaf children significantly improved post-operation. The lack of correlation between quality of life and speech perception suggests that when evaluating performance post-implantation in prelingually deaf children and adolescents, measures of both speech perception and quality of life should be used. Copyright © 2016. Published by Elsevier Ireland Ltd.
Surgical improvement of speech disorder caused by amyotrophic lateral sclerosis.
Saigusa, Hideto; Yamaguchi, Satoshi; Nakamura, Tsuyoshi; Komachi, Taro; Kadosono, Osamu; Ito, Hiroyuki; Saigusa, Makoto; Niimi, Seiji
2012-12-01
Amyotrophic lateral sclerosis (ALS) is a progressive debilitating neurological disease. ALS disturbs quality of life by affecting speech, swallowing and free mobility of the arms, without affecting intellectual function. It is therefore of significance to improve the intelligibility and quality of speech sounds, especially for ALS patients with slowly progressive courses. Currently, however, there is no effective or established approach to improve speech disorder caused by ALS. We investigated a surgical procedure to improve speech disorder for some patients with neuromuscular diseases with velopharyngeal closure incompetence. In this study, we performed the surgical procedure for two patients suffering from severe speech disorder caused by slowly progressing ALS. The patients suffered from speech disorder with hypernasality and imprecise and weak articulation during a 6-year course (patient 1) and a 3-year course (patient 2) of slowly progressing ALS. We narrowed the bilateral lateral palatopharyngeal walls at the velopharyngeal port, and performed this surgery under general anesthesia without muscle relaxant for the two patients. Postoperatively, the intelligibility and quality of their speech sounds were greatly improved within one month without any speech therapy. The patients were also able to generate longer speech phrases after the surgery. Importantly, there were no serious complications during or after the surgery. In summary, we performed bilateral narrowing of the lateral palatopharyngeal walls as speech surgery for two patients suffering from severe speech disorder associated with ALS. With this technique, improved intelligibility and quality of speech can be maintained for a longer duration for patients with slowly progressing ALS.
Children's perception of their synthetically corrected speech production.
Strömbergsson, Sofia; Wengelin, Asa; House, David
2014-06-01
We explore children's perception of their own speech - in its online form, in its recorded form, and in synthetically modified forms. Children with phonological disorder (PD) and children with typical speech and language development (TD) performed tasks of evaluating accuracy of the different types of speech stimuli, either immediately after having produced the utterance or after a delay. In addition, they performed a task designed to assess their ability to detect synthetic modification. Both groups showed high performance in tasks involving evaluation of other children's speech, whereas in tasks of evaluating one's own speech, the children with PD were less accurate than their TD peers. The children with PD were less sensitive to misproductions in immediate conjunction with their production of an utterance, and more accurate after a delay. Within-category modification often passed undetected, indicating a satisfactory quality of the generated speech. Potential clinical benefits of using corrective re-synthesis are discussed.
ERIC Educational Resources Information Center
Snodgrass, Melinda R.; Chung, Moon Y.; Biller, Maysoon F.; Appel, Katie E.; Meadan, Hedda; Halle, James W.
2017-01-01
Researchers and practitioners have found that telepractice is an effective means of increasing access to high-quality services that meet children's unique needs and is a viable mechanism to deliver speech-language services for multiple purposes. We offer a framework to facilitate the implementation of practices that are used in direct…
Voice Quality Modelling for Expressive Speech Synthesis
Socoró, Joan Claudi
2014-01-01
This paper presents the perceptual experiments that were carried out in order to validate the methodology of transforming expressive speech styles using voice quality (VoQ) parameters modelling, along with the well-known prosody (F0, duration, and energy), from a neutral style into a number of expressive ones. The main goal was to validate the usefulness of VoQ in the enhancement of expressive synthetic speech in terms of speech quality and style identification. A harmonic plus noise model (HNM) was used to modify VoQ and prosodic parameters that were extracted from an expressive speech corpus. Perception test results indicated the improvement of obtained expressive speech styles using VoQ modelling along with prosodic characteristics. PMID:24587738
NASA Astrophysics Data System (ADS)
Nakagawa, Seiji; Fujiyuki, Chika; Kagomiya, Takayuki
2012-07-01
Bone-conducted ultrasound (BCU) is perceived even by the profoundly sensorineural deaf. A novel hearing aid using the perception of amplitude-modulated BCU (BCU hearing aid: BCUHA) has been developed; however, further improvements are needed, especially in terms of articulation and sound quality. In this study, the intelligibility and sound quality of BCU speech with several types of amplitude modulation [double-sideband with transmitted carrier (DSB-TC), double-sideband with suppressed carrier (DSB-SC), and transposed modulation] were evaluated. The results showed that DSB-TC and transposed speech were more intelligible than DSB-SC speech, and transposed speech was closer than the other types of BCU speech to air-conducted speech in terms of sound quality. These results provide useful information for further development of the BCUHA.
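For orientation, the three modulation schemes compared above can be sketched in a few lines of numpy. The carrier frequency, sampling rate, and the low-pass cut-off for the transposed condition are assumed values, and the "transposed" recipe follows the common half-wave-rectify-then-low-pass construction; the study's actual signal processing may differ.

    import numpy as np
    from scipy.signal import butter, filtfilt

    fs = 96000                                  # must exceed 2x carrier + bandwidth
    t = np.arange(fs) / fs                      # 1 s time axis
    speech = 0.5 * np.sin(2 * np.pi * 200 * t)  # stand-in for a speech signal
    fc = 30000                                  # assumed ultrasonic carrier (Hz)
    carrier = np.cos(2 * np.pi * fc * t)

    dsb_tc = (1.0 + speech) * carrier           # DSB, transmitted carrier
    dsb_sc = speech * carrier                   # DSB, suppressed carrier

    # Transposed: carrier modulated by the half-wave rectified, low-pass
    # filtered speech, preserving fine-structure cues in the envelope.
    b, a = butter(4, 4000, fs=fs)               # 4 kHz low-pass (assumed)
    transposed = carrier * filtfilt(b, a, np.maximum(speech, 0.0))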
High-frame-rate full-vocal-tract 3D dynamic speech imaging.
Fu, Maojing; Barlaz, Marissa S; Holtrop, Joseph L; Perry, Jamie L; Kuehn, David P; Shosted, Ryan K; Liang, Zhi-Pei; Sutton, Bradley P
2017-04-01
To achieve a high temporal frame rate, high spatial resolution, and full-vocal-tract coverage for three-dimensional dynamic speech MRI by using low-rank modeling and sparse sampling. Three-dimensional dynamic speech MRI is enabled by integrating a novel data acquisition strategy and an image reconstruction method with the partial separability model: (a) a self-navigated sparse sampling strategy that accelerates data acquisition by collecting high-nominal-frame-rate cone navigators and imaging data within a single repetition time, and (b) a reconstruction method that recovers high-quality speech dynamics from sparse (k,t)-space data by enforcing joint low-rank and spatiotemporal total variation constraints. The proposed method has been evaluated through in vivo experiments. A nominal temporal frame rate of 166 frames per second (defined based on a repetition time of 5.99 ms) was achieved for an imaging volume covering the entire vocal tract with a spatial resolution of 2.2 × 2.2 × 5.0 mm³. The practical utility of the proposed method was demonstrated via both validation experiments and a phonetics investigation. Three-dimensional dynamic speech imaging is possible with full-vocal-tract coverage, high spatial resolution, and a high nominal frame rate, providing dynamic speech data useful for phonetic studies. Magn Reson Med 77:1619-1629, 2017. © 2016 International Society for Magnetic Resonance in Medicine.
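In rough terms, the partial separability model and the constrained reconstruction described above can be written as follows. The notation is assumed for illustration (d for the sampled (k,t)-space data, Omega for the sampling operator, F_s for spatial Fourier encoding, L for the model order) and is not taken verbatim from the paper.

    % Partial separability: the dynamic image is rank-L, i.e., a sum of
    % L separable spatial/temporal factors.
    \rho(\mathbf{x}, t) = \sum_{\ell=1}^{L} u_\ell(\mathbf{x}) \, v_\ell(t)

    % Reconstruction: data consistency on the sparse (k,t)-space samples,
    % subject to the rank constraint, plus a spatiotemporal TV penalty.
    \hat{\rho} = \arg\min_{\mathrm{rank}(\rho) \le L}
      \left\| d - \Omega\left( \mathcal{F}_s \, \rho \right) \right\|_2^2
      + \lambda \, \mathrm{TV}_{\mathbf{x}, t}(\rho)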
Wren, Yvonne; Harding, Sam; Goldbart, Juliet; Roulstone, Sue
2018-05-01
Multiple interventions have been developed to address speech sound disorder (SSD) in children. Many of these have been evaluated but the evidence for these has not been considered within a model which categorizes types of intervention. The opportunity to carry out a systematic review of interventions for SSD arose as part of a larger scale study of interventions for primary speech and language impairment in preschool children. To review systematically the evidence for interventions for SSD in preschool children and to categorize them within a classification of interventions for SSD. Relevant search terms were used to identify intervention studies published up to 2012, with the following inclusion criteria: participants were aged between 2 years and 5 years, 11 months; they exhibited speech, language and communication needs; and a primary outcome measure of speech was used. Studies that met inclusion criteria were quality appraised using the single case experimental design (SCED) or PEDro-P, depending on their methodology. Those judged to be high quality were classified according to the primary focus of intervention. The final review included 26 studies. Case series was the most common research design. Categorization to the classification system for interventions showed that cognitive-linguistic and production approaches to intervention were the most frequently reported. The highest graded evidence was for three studies within the auditory-perceptual and integrated categories. The evidence for intervention for preschool children with SSD is focused on seven out of 11 subcategories of interventions. Although all the studies included in the review were good quality as defined by quality appraisal checklists, they mostly represented lower-graded evidence. Higher-graded studies are needed to understand clearly the strength of evidence for different interventions. © 2018 Royal College of Speech and Language Therapists.
A screening approach for classroom acoustics using web-based listening tests and subjective ratings.
Persson Waye, Kerstin; Magnusson, Lennart; Fredriksson, Sofie; Croy, Ilona
2015-01-01
Perception of speech is crucial in school, where speech is the main mode of communication. The aim of the study was to evaluate whether a web-based approach including listening tests and questionnaires could be used as a screening tool for poor classroom acoustics. The prime focus was the relation between pupils' comprehension of speech, the classroom acoustics, and their description of the acoustic qualities of the classroom. In total, 1106 pupils aged 13-19, from 59 classes and 38 schools in Sweden, participated in a listening study using Hagerman's sentences administered via the Internet. Four listening conditions were applied: high and low background noise level, and positions close to and far away from the loudspeaker. The pupils described the acoustic quality of the classroom and teachers provided information on the physical features of the classroom using questionnaires. In 69% of the classes, at least three pupils described the sound environment as adverse, and in 88% of the classes one or more pupils reported often having difficulties concentrating due to noise. The pupils' comprehension of speech was strongly influenced by the background noise level (p<0.001) and distance to the loudspeakers (p<0.001). Of the physical classroom features, the presence of suspended acoustic panels (p<0.05) and the length of the classroom (p<0.01) predicted speech comprehension. Of the pupils' descriptions of acoustic qualities, "clattery" significantly (p<0.05) predicted speech comprehension. "Clattery" was furthermore associated with difficulties understanding each other, while the description "noisy" was associated with concentration difficulties. The majority of classrooms do not seem to have an optimal sound environment. The pupils' descriptions of acoustic qualities and listening tests can be one way of predicting sound conditions in the classroom.
Chorna, Olena; Hamm, Ellyn; Cummings, Caitlin; Fetters, Ashley; Maitre, Nathalie L
2017-01-01
Aim We evaluated the level of evidence of speech, language, and communication interventions for infants at high risk for, or with a diagnosis of, cerebral palsy (CP) from 0 to 2 years old. Method We performed a systematic review using relevant search terms. Articles were evaluated based on the level of methodological quality and evidence according to A Measurement Tool to Assess Systematic Reviews (AMSTAR) and Grading of Recommendations Assessment, Development and Evaluation (GRADE) guidelines. Results The search terms identified 17 publications consisting of speech or language interventions. There were no interventions in the high level of evidence category. The overall level of evidence was very low. Promising interventions included Responsivity and Prelinguistic Milieu Teaching and other parent–infant transaction frameworks. Interpretation There are few evidence-based interventions addressing the speech, language, and communication needs of infants and toddlers at high risk for CP, and none for infants diagnosed with CP. Recommendation guidelines include parent–infant transaction programs. PMID:27897320
Comparing Measures of Voice Quality From Sustained Phonation and Continuous Speech.
Gerratt, Bruce R; Kreiman, Jody; Garellek, Marc
2016-10-01
The question of what type of utterance-a sustained vowel or continuous speech-is best for voice quality analysis has been extensively studied but with equivocal results. This study examines whether previously reported differences derive from the articulatory and prosodic factors occurring in continuous speech versus sustained phonation. Speakers with voice disorders sustained vowels and read sentences. Vowel samples were excerpted from the steadiest portion of each vowel in the sentences. In addition to sustained and excerpted vowels, a 3rd set of stimuli was created by shortening sustained vowel productions to match the duration of vowels excerpted from continuous speech. Acoustic measures were made on the stimuli, and listeners judged the severity of vocal quality deviation. Sustained vowels and those extracted from continuous speech contain essentially the same acoustic and perceptual information about vocal quality deviation. Perceived and/or measured differences between continuous speech and sustained vowels derive largely from voice source variability across segmental and prosodic contexts and not from variations in vocal fold vibration in the quasisteady portion of the vowels. Approaches to voice quality assessment by using continuous speech samples average across utterances and may not adequately quantify the variability they are intended to assess.
Evitts, Paul M; Starmer, Heather; Teets, Kristine; Montgomery, Christen; Calhoun, Lauren; Schulze, Allison; MacKenzie, Jenna; Adams, Lauren
2016-11-01
There is currently minimal information on the impact of dysphonia secondary to phonotrauma on listeners. Considering the high incidence of voice disorders among professional voice users, it is important to understand the impact of a dysphonic voice on their audiences. Ninety-one healthy listeners (39 men, 52 women; mean age = 23.62 years) were presented with speech stimuli from 5 healthy speakers and 5 speakers diagnosed with dysphonia secondary to phonotrauma. Dependent variables included processing speed (reaction time [RT] ratio), speech intelligibility, and listener comprehension. Voice quality ratings were also obtained for all speakers by 3 expert listeners. Statistical results showed significant differences in RT ratio and in the number of speech intelligibility errors between healthy and dysphonic voices. There was not a significant difference in listener comprehension errors. Multiple regression analyses showed that voice quality ratings from the Consensus Auditory-Perceptual Evaluation of Voice (Kempster, Gerratt, Verdolini Abbott, Barkmeier-Kraemer, & Hillman, 2009) were able to predict RT ratio and speech intelligibility but not listener comprehension. Results of the study suggest that although listeners require more time to process and make more intelligibility errors when presented with speech stimuli from speakers with dysphonia secondary to phonotrauma, listener comprehension may not be affected.
Dual Key Speech Encryption Algorithm Based Underdetermined BSS
Zhao, Huan; Chen, Zuo; Zhang, Xixiang
2014-01-01
When the number of mixed signals is smaller than the number of source signals, underdetermined blind source separation (BSS) is a significantly difficult problem. Because speech communication involves large amounts of data and demands real-time processing, we utilize the intractability of the underdetermined BSS problem to present a dual-key speech encryption method. The original speech is mixed with dual key signals, which consist of random key signals (a one-time pad) generated from a secret seed and chaotic signals generated from a chaotic system. In the decryption process, approximate calculation is used to recover the original speech signals. The proposed algorithm for speech signal encryption can resist traditional attacks against the encryption system, and owing to the approximate calculation, decryption becomes faster and more accurate. It is demonstrated that the proposed method has a high level of security and can recover the original signals quickly and efficiently while maintaining excellent audio quality. PMID:24955430
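A toy sketch of the dual-key mixing idea follows: one speech source plus two key signals (a one-time-pad stream regenerated from a secret seed, and a chaotic stream, here a logistic map) are mixed down to a single channel, so separation without the keys is an underdetermined BSS problem. The mixing weights and map parameters are illustrative assumptions, and exact subtraction stands in for the paper's approximate recovery.

    import numpy as np

    rng = np.random.default_rng(seed=12345)     # the seed plays the "secret seed" role
    n = 16000
    speech = rng.standard_normal(n) * 0.1       # stand-in for a speech frame

    pad = rng.standard_normal(n)                # one-time-pad key stream
    chaos = np.empty(n)                         # logistic-map chaotic key stream
    chaos[0] = 0.4
    for i in range(1, n):
        chaos[i] = 3.99 * chaos[i - 1] * (1.0 - chaos[i - 1])
    chaos = 2.0 * chaos - 1.0                   # centre to [-1, 1]

    A = np.array([[0.7, 1.3, 0.9]])             # 1 mixture, 3 sources: underdetermined
    cipher = A @ np.vstack([speech, pad, chaos])

    # The receiver knows the seed, so it can regenerate both keys and solve
    # for the speech term directly.
    recovered = (cipher[0] - A[0, 1] * pad - A[0, 2] * chaos) / A[0, 0]
    print(np.allclose(recovered, speech))       # True

An eavesdropper sees only the single mixed channel and must solve a one-equation, three-unknowns separation problem, which is exactly the underdetermined BSS intractability the method leans on.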
A 4.8 kbps code-excited linear predictive coder
NASA Technical Reports Server (NTRS)
Tremain, Thomas E.; Campbell, Joseph P., Jr.; Welch, Vanoy C.
1988-01-01
A secure voice system, STU-3, capable of providing end-to-end secure voice communications was developed (1984). The terminal for the new system will be built around the standard LPC-10 voice processor algorithm. Although the performance of the present STU-3 processor is considered good, its response to nonspeech sounds such as whistles, coughs, and impulse-like noises may not be completely acceptable. Speech in noisy environments also causes problems with the LPC-10 voice algorithm. In addition, there is always a demand for something better. It is hoped that LPC-10's 2.4 kbps voice performance will be complemented with a very high quality speech coder operating at a higher data rate. This new coder is one of a number of candidate algorithms being considered for an upgraded version of the STU-3 in late 1989. The paper considers the problems of designing a code-excited linear predictive (CELP) coder that provides very high quality speech at a 4.8 kbps data rate and can be implemented on today's hardware.
Nuttall, Helen E.; Moore, David R.; Barry, Johanna G.; Krumbholz, Katrin
2015-01-01
The speech-evoked auditory brain stem response (speech ABR) is widely considered to provide an index of the quality of neural temporal encoding in the central auditory pathway. The aim of the present study was to evaluate the extent to which the speech ABR is shaped by spectral processing in the cochlea. High-pass noise masking was used to record speech ABRs from delimited octave-wide frequency bands between 0.5 and 8 kHz in normal-hearing young adults. The latency of the frequency-delimited responses decreased from the lowest to the highest frequency band by up to 3.6 ms. The observed frequency-latency function was compatible with model predictions based on wave V of the click ABR. The frequency-delimited speech ABR amplitude was largest in the 2- to 4-kHz frequency band and decreased toward both higher and lower frequency bands despite the predominance of low-frequency energy in the speech stimulus. We argue that the frequency dependence of speech ABR latency and amplitude results from the decrease in cochlear filter width with decreasing frequency. The results suggest that the amplitude and latency of the speech ABR may reflect interindividual differences in cochlear, as well as central, processing. The high-pass noise-masking technique provides a useful tool for differentiating between peripheral and central effects on the speech ABR. It can be used for further elucidating the neural basis of the perceptual speech deficits that have been associated with individual differences in speech ABR characteristics. PMID:25787954
Xiao, Bo; Huang, Chewei; Imel, Zac E; Atkins, David C; Georgiou, Panayiotis; Narayanan, Shrikanth S
2016-04-01
Scaling up psychotherapy services such as addiction counseling is a critical societal need. One challenge is ensuring the quality of therapy, due to the heavy cost of manual observational assessment. This work proposes a speech technology-based system to automate the assessment of therapist empathy, a key therapy quality index, from audio recordings of the psychotherapy interactions. We designed a speech processing system that includes voice activity detection and diarization modules, and an automatic speech recognizer plus a speaker role matching module to extract the therapist's language cues. We employed Maximum Entropy models, Maximum Likelihood language models, and a Lattice Rescoring method to characterize high vs. low empathic language. We estimated therapy-session-level empathy codes using utterance-level evidence obtained from these models. Our experiments showed that the fully automated system achieved a correlation of 0.643 between expert-annotated empathy codes and machine-derived estimations, and an accuracy of 81% in classifying high vs. low empathy, in comparison to a 0.721 correlation and 86% accuracy in the oracle setting using manual transcripts. The results show that the system provides useful information that can contribute to automatic quality assurance and therapist training.
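To make the session-level scoring step concrete, here is a deliberately simplified toy: utterance-level evidence from two language models, one for high-empathy and one for low-empathy language, is averaged into a session score. The unigram models and word lists are placeholders; the actual system uses Maximum Entropy models, Maximum Likelihood language models, and lattice rescoring over ASR output.

    import numpy as np

    def unigram_logprob(utterance, probs, oov=1e-4):
        # Log-probability under a unigram model with an out-of-vocabulary floor.
        return sum(np.log(probs.get(w, oov)) for w in utterance.split())

    high_lm = {"understand": 0.05, "feel": 0.04, "sounds": 0.03}  # placeholder
    low_lm = {"should": 0.05, "must": 0.04, "wrong": 0.03}        # placeholder

    utterances = ["it sounds like you feel stuck", "you should just stop"]
    llrs = [unigram_logprob(u, high_lm) - unigram_logprob(u, low_lm)
            for u in utterances]
    session_score = float(np.mean(llrs))  # higher = more empathic language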
Hansen, J H; Nandkumar, S
1995-01-01
The formulation of reliable signal processing algorithms for speech coding and synthesis requires the selection of a prior criterion of performance. Though coding efficiency (bits/second) or computational requirements can be used, a final performance measure must always include speech quality. In this paper, three objective speech quality measures are considered with respect to quality assessment for American English, noisy American English, and noise-free versions of seven languages. The purpose is to determine whether objective quality measures can be used to quantify changes in quality for a given voice coding method, with a known subjective performance level, as background noise or language conditions are changed. The speech coding algorithm chosen is regular-pulse excitation with long-term prediction (RPE-LTP), which has been chosen as the standard voice compression algorithm for the European Digital Mobile Radio system. Three areas are considered for objective quality assessment: (i) vocoder performance for American English in a noise-free environment, (ii) speech quality variation for three additive background noise sources, and (iii) noise-free performance for seven languages, which include English, Japanese, Finnish, German, Hindi, Spanish, and French. It is suggested that although existing objective quality measures will never replace subjective testing, they can be a useful means of assessing changes in performance, identifying areas for improvement in algorithm design, and augmenting subjective quality tests for voice coding/compression algorithms in noise-free, noisy, and/or non-English applications.
Vainshtein, Jeffrey M; Griffith, Kent A; Feng, Felix Y; Vineberg, Karen A; Chepeha, Douglas B; Eisbruch, Avraham
2014-08-01
To describe voice and speech quality changes and their predictors in patients with locally advanced oropharyngeal cancer treated on prospective clinical studies of organ-preserving chemotherapy-intensity modulated radiation therapy (chemo-IMRT). Ninety-one patients with stage III/IV oropharyngeal cancer were treated on 2 consecutive prospective studies of definitive chemoradiation using whole-field IMRT from 2003 to 2011. Patient-reported voice and speech quality were longitudinally assessed from before treatment through 24 months using the Communication Domain of the Head and Neck Quality of Life (HNQOL-C) instrument and the Speech question of the University of Washington Quality of Life (UWQOL-S) instrument, respectively. Factors associated with patient-reported voice quality worsening from baseline and speech impairment were assessed. Voice quality decreased maximally at 1 month, with 68% and 41% of patients reporting worse HNQOL-C and UWQOL-S scores compared with before treatment, and improved thereafter, recovering to baseline by 12-18 months on average. In contrast, observer-rated larynx toxicity was rare (7% at 3 months; 5% at 6 months). Among patients with mean glottic larynx (GL) dose ≤20 Gy, >20-30 Gy, >30-40 Gy, >40-50 Gy, and >50 Gy, 10%, 32%, 25%, 30%, and 63%, respectively, reported worse voice quality at 12 months compared with before treatment (P=.011). Results for speech impairment were similar. Glottic larynx dose, N stage, neck dissection, oral cavity dose, and time since chemo-IMRT were univariately associated with either voice worsening or speech impairment. On multivariate analysis, mean GL dose remained independently predictive for both voice quality worsening (8.1%/Gy) and speech impairment (4.3%/Gy). Voice quality worsening and speech impairment after chemo-IMRT for locally advanced oropharyngeal cancer were frequently reported by patients, underrecognized by clinicians, and independently associated with GL dose. These findings support reducing mean GL dose to as low as reasonably achievable, aiming at ≤20 Gy when the larynx is not a target. Copyright © 2014 Elsevier Inc. All rights reserved.
Dog-directed speech: why do we use it and do dogs pay attention to it?
Ben-Aderet, Tobey; Gallego-Abenza, Mario; Reby, David; Mathevon, Nicolas
2017-01-01
Pet-directed speech is strikingly similar to infant-directed speech, a peculiar speaking pattern with higher pitch and slower tempo known to engage infants' attention and promote language learning. Here, we report the first investigation of potential factors modulating the use of dog-directed speech, as well as its immediate impact on dogs' behaviour. We recorded adult participants speaking in front of pictures of puppies, adult and old dogs, and analysed the quality of their speech. We then performed playback experiments to assess dogs' reaction to dog-directed speech compared with normal speech. We found that human speakers used dog-directed speech with dogs of all ages and that the acoustic structure of dog-directed speech was mostly independent of dog age, except for sound pitch, which was relatively higher when communicating with puppies. Playback demonstrated that, in the absence of other non-auditory cues, puppies were highly reactive to dog-directed speech, and that the pitch was a key factor modulating their behaviour, suggesting that this specific speech register has a functional value in young dogs. Conversely, older dogs did not react differentially to dog-directed speech compared with normal speech. The fact that speakers continue to use dog-directed speech with older dogs therefore suggests that this speech pattern may mainly be a spontaneous attempt to facilitate interactions with non-verbal listeners. PMID:28077769
Fuller, Christina; Free, Rolien; Maat, Bert; Başkent, Deniz
2012-08-01
In normal-hearing listeners, musical background has been observed to change the sound representation in the auditory system and produce enhanced performance in some speech perception tests. Based on these observations, it has been hypothesized that musical background can influence sound and speech perception, and as an extension also the quality of life, by cochlear-implant users. To test this hypothesis, this study explored musical background [using the Dutch Musical Background Questionnaire (DMBQ)], and self-perceived sound and speech perception and quality of life [using the Nijmegen Cochlear Implant Questionnaire (NCIQ) and the Speech Spatial and Qualities of Hearing Scale (SSQ)] in 98 postlingually deafened adult cochlear-implant recipients. In addition to self-perceived measures, speech perception scores (percentage of phonemes recognized in words presented in quiet) were obtained from patient records. The self-perceived hearing performance was associated with the objective speech perception. Forty-one respondents (44% of 94 respondents) indicated some form of formal musical training. Fifteen respondents (18% of 83 respondents) judged themselves as having musical training, experience, and knowledge. No association was observed between musical background (quantified by DMBQ), and self-perceived hearing-related performance or quality of life (quantified by NCIQ and SSQ), or speech perception in quiet.
Advancements in robust algorithm formulation for speaker identification of whispered speech
NASA Astrophysics Data System (ADS)
Fan, Xing
Whispered speech is an alternative speech production mode to neutral speech, which talkers use intentionally in natural conversational scenarios to protect privacy and to avoid certain content from being overheard or made public. Due to the profound differences between whispered and neutral speech in production mechanism, and the absence of whispered adaptation data, the performance of speaker identification systems trained with neutral speech degrades significantly. This dissertation therefore focuses on developing a robust closed-set speaker recognition system for whispered speech by using no or limited whispered adaptation data from non-target speakers. This dissertation proposes the concept of "High"/"Low" performance whispered data for the purpose of speaker identification. A variety of acoustic properties are identified that contribute to the quality of whispered data. An acoustic analysis is also conducted to compare the phoneme/speaker dependency of the differences between whispered and neutral data in the feature domain. The observations from these acoustic analyses are new in this area and also serve as guidance for developing robust speaker identification systems for whispered speech. This dissertation further proposes two systems for speaker identification of whispered speech. One system focuses on front-end processing. A two-dimensional feature space is proposed to search for "Low"-quality whispered utterances, and separate feature mapping functions are applied to vowels and consonants, respectively, in order to retain the speaker information shared between whispered and neutral speech. The other system focuses on speech-mode-independent model training. The proposed method generates pseudo-whispered features from neutral features by using the statistical information contained in a whispered Universal Background Model (UBM) trained on extra whispered data collected from non-target speakers. Four modeling methods are proposed for the transformation estimation in order to generate the pseudo-whispered features. Both of the above systems demonstrate a significant improvement over the baseline system on the evaluation data. This dissertation has therefore contributed to providing a scientific understanding of the differences between whispered and neutral speech, as well as improved front-end processing and modeling methods for speaker identification of whispered speech. Such advancements will ultimately help improve the robustness of speech processing systems.
Simplified APC for Space Shuttle applications. [Adaptive Predictive Coding for speech transmission
NASA Technical Reports Server (NTRS)
Hutchins, S. E.; Batson, B. H.
1975-01-01
This paper describes an 8 kbps adaptive predictive digital speech transmission system which was designed for potential use in the Space Shuttle Program. The system was designed to provide good voice quality in the presence of both cabin noise on board the Shuttle and the anticipated bursty channel. Minimal increase in size, weight, and power over the current high data rate system was also a design objective.
Reliability in perceptual analysis of voice quality.
Bele, Irene Velsvik
2005-12-01
This study focuses on speaking voice quality in male teachers (n = 35) and male actors (n = 36), who represent untrained and trained voice users, because we wanted to investigate normal and supranormal voices. Both substantive and methodological aspects were considered. The study includes a method for perceptual voice evaluation, and a basic issue was rater reliability. A listening group of 10 listeners, 7 experienced speech-language therapists and 3 speech-language therapy students, evaluated the voices on 15 vocal characteristics using visual analogue (VA) scales. Two sets of voice signals were investigated: text reading (2 loudness levels) and sustained vowels (3 levels). The results indicated high interrater reliability for most perceptual characteristics. Both types of voice signals were evaluated reliably, although the reliability for connected speech was somewhat higher than for vowels, especially at the normal loudness level. Experienced listeners tended to be more consistent in their ratings than the student raters. Some vocal characteristics achieved acceptable reliability even with a smaller panel of listeners. The perceptual characteristics grouped into 4 factors that reflected perceptual dimensions.
Krüger, H P
1989-02-01
The term "speech chronemics" is introduced to characterize a research strategy which extracts from the physical qualities of the speech signal only the pattern of ons ("speaking") and offs ("pausing"). The research in this field can be structured into the methodological dimension "unit of time", "number of speakers", and "quality of the prosodic measures". It is shown that a researcher's actual decision for one method largely determines the outcome of his study. Then, with the Logoport a new portable measurement device is presented. It enables the researcher to study speaking behavior over long periods of time (up to 24 hours) in the normal environment of his subjects. Two experiments are reported. The first shows the validity of articulation pauses for variations in the physiological state of the organism. The second study proves a new betablocking agent to have sociotropic effects: in a long-term trial socially high-strung subjects showed an improved interaction behavior (compared to placebo and socially easy-going persons) in their everyday life. Finally, the need for a comprehensive theoretical foundation and for standardization of measurement situations and methods is emphasized.
Alternative Speech Communication System for Persons with Severe Speech Disorders
NASA Astrophysics Data System (ADS)
Selouani, Sid-Ahmed; Sidi Yakoub, Mohammed; O'Shaughnessy, Douglas
2009-12-01
Assistive speech-enabled systems are proposed to help both French- and English-speaking persons with various speech disorders. The proposed assistive systems use automatic speech recognition (ASR) and speech synthesis in order to enhance the quality of communication. These systems aim at improving the intelligibility of pathologic speech, making it as natural as possible and close to the original voice of the speaker. The resynthesized utterances use new basic units, a new concatenation algorithm, and a grafting technique to correct the poorly pronounced phonemes. The ASR responses are uttered by the new speech synthesis system in order to convey an intelligible message to listeners. Experiments involving four American speakers with severe dysarthria and two Acadian French speakers with sound substitution disorders (SSDs) are carried out to demonstrate the efficiency of the proposed methods. Improvements in the Perceptual Evaluation of Speech Quality (PESQ) value of 5% and of more than 20% are achieved by the speech synthesis systems that deal with SSD and dysarthria, respectively.
Pradubwong, Suteera; Mongkholthawornchai, Siriporn; Keawkhamsean, Natda; Patjanasoontorn, Niramol; Chowchuen, Bowornsilp
2014-10-01
Cleft lips and cleft palates are common congenital anomalies, which affect facial appearance, speech, hearing, teeth alignment and other structures. Craniofacial anomalies and speech disorders are crucial problems in preschool-aged children (5-6 years old), when they start attending school and become more engaged in the community. This condition, which differentiates them from other students, can lead to teasing or mocking, which can cause low self-esteem, an inferiority complex, and poor relationships with friends. Missing class in order to receive treatment and other additional care can affect a student's learning, development and overall quality of life. The purpose of this research was to study the quality of life of preschool-aged children with cleft palate and their satisfaction with their level of speech. This was a retrospective, descriptive study. The data were collected by reviewing medical records of patients with cleft lip and cleft palate aged 5-6 years old who underwent operation and treatment with the Tawanchai Center at Srinagarind Hospital. There were 39 patients in this study. Data collection was conducted over 5 months (June to October 2013). The research instruments were: (1) the General Demographic Questionnaire, (2) the Quality of Life Questionnaire with 5 Domains, and (3) the Satisfaction of Speech Questionnaire. Descriptive statistics, percentages and standard deviations were analyzed in the present study. The findings revealed family information pertaining to CLP treatment and its impact on consumption, speech training, hearing tests, development, dental treatment, communication skills, participation, referral treatment, as well as the quality of coordination for advanced treatment. The present study revealed that all of the aforementioned criteria were met at a high level. Moreover, the child's sickness had only a moderate impact on family life. In conclusion, the overall satisfaction was at a very high level. It was concluded that the collaboration of the Tawanchai Cleft Center with the government, as well as with private and non-governmental organizations, was exceptional, particularly in regard to providing proper and continuous treatment for patients with cleft lips and/or cleft palates. The findings reflect a good quality of life in the preschool-aged children with cleft lip and cleft palate who received treatment from the Tawanchai Cleft Center at Srinagarind Hospital. Furthermore, the study showed that the problems associated with the condition affected the families' lives only at a minimal level.
Spectral analysis method and sample generation for real time visualization of speech
NASA Astrophysics Data System (ADS)
Hobohm, Klaus
A method for translating speech signals into optical patterns, characterized by high sound discrimination and learnability and designed to give deaf persons feedback for controlling their way of speaking, is presented. Important properties of the speech production and perception processes, and of the organs involved in these mechanisms, are recalled in order to define requirements for speech visualization. It is established that the spectral representation must reflect the time, frequency, and amplitude resolution of hearing, and that continuous variations of the acoustic parameters of the speech signal must be depicted by continuous variations of the images. A color table was developed for dynamic illustration, and sonograms were generated with five spectral analysis methods, including Fourier transformation and linear predictive coding. For evaluating sonogram quality, test persons had to recognize consonant/vowel/consonant (CVC) words, and an optimized analysis method was achieved with a fast Fourier transformation and a postprocessor. A hardware concept for a real-time speech visualization system, based on multiprocessor technology in a personal computer, is presented.
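The sonogram generation step can be prototyped with a plain short-time Fourier transform, as in the numpy sketch below; the window, FFT length, and hop size are illustrative assumptions, not the parameters evaluated in the study.

    import numpy as np

    def sonogram(x, fs, n_fft=512, hop=160):
        # Hann-windowed STFT magnitude in dB: one row per frame, one
        # column per frequency bin (bin spacing fs / n_fft).
        window = np.hanning(n_fft)
        n_frames = 1 + (len(x) - n_fft) // hop
        frames = np.stack([x[i * hop : i * hop + n_fft] * window
                           for i in range(n_frames)])
        spectra = np.abs(np.fft.rfft(frames, axis=1))
        return 20.0 * np.log10(spectra + 1e-10)

    fs = 16000
    x = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)  # 1 s test tone
    S = sonogram(x, fs)  # map S through a color table for display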
Subjective comparison and evaluation of speech enhancement algorithms
Hu, Yi; Loizou, Philipos C.
2007-01-01
Making meaningful comparisons between the performance of the various speech enhancement algorithms proposed over the years has been elusive due to the lack of a common speech database, differences in the types of noise used, and differences in the testing methodology. To facilitate such comparisons, we report on the development of a noisy speech corpus suitable for evaluation of speech enhancement algorithms. This corpus is subsequently used for the subjective evaluation of 13 speech enhancement methods encompassing four classes of algorithms: spectral subtractive, subspace, statistical-model-based, and Wiener-type algorithms. The subjective evaluation was performed by Dynastat, Inc. using the ITU-T P.835 methodology designed to evaluate speech quality along three dimensions: signal distortion, noise distortion, and overall quality. This paper reports the results of the subjective tests. PMID:18046463
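Of the four algorithm classes listed above, spectral subtraction is the simplest to illustrate. The sketch below is a generic magnitude-subtraction implementation, not one of the 13 evaluated methods; the noise estimate from an assumed speech-free lead-in, the oversubtraction factor, and the spectral floor are illustrative choices.

    import numpy as np

    def spectral_subtract(frames, n_noise_frames=10, beta=1.5, floor=0.02):
        """frames: (n_frames, n_bins) complex STFT; returns enhanced STFT."""
        mag = np.abs(frames)
        phase = np.angle(frames)
        # Noise magnitude estimated from the first frames (assumed speech-free).
        noise_mag = mag[:n_noise_frames].mean(axis=0)
        # Oversubtract, then clamp to a spectral floor to limit musical noise.
        clean = np.maximum(mag - beta * noise_mag, floor * mag)
        return clean * np.exp(1j * phase)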
Salorio-Corbetto, Marina; Baer, Thomas; Moore, Brian C. J.
2017-01-01
Objective: The objective was to assess the degradation of speech sound quality produced by frequency compression for listeners with extensive high-frequency dead regions (DRs). Design: Quality ratings were obtained using values of the starting frequency (Sf) of the frequency compression both below and above the estimated edge frequency, fe, of each DR. Thus, the value of Sf often fell below the lowest value currently used in clinical practice. Several compression ratios were used for each value of Sf. Stimuli were sentences processed via a prototype hearing aid based on the Phonak Exélia Art P. Study sample: Five participants (eight ears) with extensive high-frequency DRs were tested. Results: Reductions of sound quality produced by frequency compression were small to moderate. Ratings decreased significantly with decreasing Sf and increasing CR. The mean ratings were lowest for the lowest Sf and highest CR. Ratings varied across participants, with one participant rating frequency compression lower than no frequency compression even when Sf was above fe. Conclusions: Frequency compression degraded sound quality somewhat for this small group of participants with extensive high-frequency DRs. The degradation was greater for lower values of Sf relative to fe and for greater values of CR. Results varied across participants. PMID:27724057
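For intuition, the sketch below implements the usual piecewise idealization of such a frequency-compression map: frequencies up to the starting frequency Sf pass unchanged, and frequencies above Sf are compressed toward it by the compression ratio CR. Real devices such as the one used here operate on filter-bank channels and may compress nonlinearly, so this linear map is an assumption for illustration only.

    import numpy as np

    def compress_frequency(f, sf=1600.0, cr=3.0):
        # Identity below Sf; slope 1/CR above it (example Sf and CR assumed).
        f = np.asarray(f, dtype=float)
        return np.where(f <= sf, f, sf + (f - sf) / cr)

    print(compress_frequency([1000, 2000, 4000, 8000]))
    # -> [1000. 1733.3 2400. 3733.3] with Sf = 1.6 kHz, CR = 3

Lowering Sf widens the compressed region and raising CR squeezes it harder, which is exactly the direction in which the quality ratings in the study declined.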
Aldridge, Danielle; Theodoros, Deborah; Angwin, Anthony; Vogel, Adam P
2016-12-01
Deep brain stimulation (DBS) of the subthalamic nucleus (STN) is effective in reducing motor symptoms for many individuals with Parkinson's disease (PD). However, STN DBS does not appear to influence speech in the same way, and may result in a variety of negative outcomes for people with PD (PWP). A high degree of inter-individual variability amongst PWP regarding speech outcomes following STN DBS is evident in many studies. Furthermore, speech studies in PWP following STN DBS have employed a wide variety of designs and methodologies, which complicate comparison and interpretation of outcome data amongst studies within this growing body of research. An analysis of published evidence regarding speech outcomes in PWP following STN DBS, according to design and quality, is missing. This systematic review aimed to analyse and coalesce all of the current evidence reported within observational and experimental studies investigating the effects of STN DBS on speech. It will strengthen understanding of the relationship between STN DBS and speech, and inform future research by highlighting methodological limitations of current evidence. Copyright © 2016 Elsevier Ltd. All rights reserved.
Su, Qiaotong; Galvin, John J.; Zhang, Guoping; Li, Yongxin
2016-01-01
Cochlear implant (CI) speech performance is typically evaluated using well-enunciated speech produced at a normal rate by a single talker. CI users often have greater difficulty with variations in speech production encountered in everyday listening. Within a single talker, speaking rate, amplitude, duration, and voice pitch information may be quite variable, depending on the production context. The coarse spectral resolution afforded by the CI limits perception of voice pitch, which is an important cue for speech prosody and for tonal languages such as Mandarin Chinese. In this study, sentence recognition from the Mandarin speech perception database was measured in adult and pediatric Mandarin-speaking CI listeners for a variety of speaking styles: voiced speech produced at slow, normal, and fast speaking rates; whispered speech; voiced emotional speech; and voiced shouted speech. Recognition of Mandarin Hearing in Noise Test sentences was also measured. Results showed that performance was significantly poorer with whispered speech relative to the other speaking styles and that performance was significantly better with slow speech than with fast or emotional speech. Results also showed that adult and pediatric performance was significantly poorer with Mandarin Hearing in Noise Test than with Mandarin speech perception sentences at the normal rate. The results suggest that adult and pediatric Mandarin-speaking CI patients are highly susceptible to whispered speech, due to the lack of lexically important voice pitch cues and perhaps other qualities associated with whispered speech. The results also suggest that test materials may contribute to differences in performance observed between adult and pediatric CI users. PMID:27363714
Knipfer, Christian; Riemann, Max; Bocklet, Tobias; Noeth, Elmar; Schuster, Maria; Sokol, Biljana; Eitner, Stephan; Nkenke, Emeka; Stelzle, Florian
2014-01-01
Tooth loss and its prosthetic rehabilitation significantly affect speech intelligibility. However, little is known about the influence of speech deficiencies on oral health-related quality of life (OHRQoL). The aim of this study was to investigate whether speech intelligibility enhancement through prosthetic rehabilitation significantly influences OHRQoL in patients wearing complete maxillary dentures. Speech intelligibility assessed by means of an automatic speech recognition system (ASR) was prospectively evaluated and compared with subjectively assessed Oral Health Impact Profile (OHIP) scores. Speech was recorded in 28 edentulous patients 1 week prior to the fabrication of new complete maxillary dentures and 6 months thereafter. Speech intelligibility was computed based on the word accuracy (WA) by means of the ASR and compared with a matched control group. One week before and 6 months after rehabilitation, patients assessed their own OHRQoL. Speech intelligibility improved significantly after 6 months. Subjects reported a significantly higher OHRQoL after maxillary rehabilitation with complete dentures. No significant correlation was found between the OHIP sum score or its subscales and the WA. Speech intelligibility enhancement achieved through the fabrication of new complete maxillary dentures might not be at the forefront of the patients' perception of their quality of life. For the improvement of OHRQoL in patients wearing complete maxillary dentures, food intake and mastication, as well as freedom from pain, play a more prominent role.
Voice and Speech after Laryngectomy
ERIC Educational Resources Information Center
Stajner-Katusic, Smiljka; Horga, Damir; Musura, Maja; Globlek, Dubravka
2006-01-01
The aim of the investigation is to compare voice and speech quality in alaryngeal patients using esophageal speech (ESOP, eight subjects), an electroacoustical speech aid (EACA, six subjects) and a tracheoesophageal voice prosthesis (TEVP, three subjects). The subjects were recorded in a sound-proof booth while reading a short story, and the speech samples…
NASA Technical Reports Server (NTRS)
Mcaulay, Robert J.; Quatieri, Thomas F.
1988-01-01
It has been shown that an analysis/synthesis system based on a sinusoidal representation of speech leads to synthetic speech that is essentially perceptually indistinguishable from the original. Strategies for coding the amplitudes, frequencies and phases of the sine waves have been developed that have led to a multirate coder operating at rates from 2400 to 9600 bps. The encoded speech is highly intelligible at all rates with a uniformly improving quality as the data rate is increased. A real-time fixed-point implementation has been developed using two ADSP2100 DSP chips. The methods used for coding and quantizing the sine-wave parameters for operation at the various frame rates are described.
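In very reduced form, the synthesis side of such a sinusoidal coder reconstructs each frame as a sum of sine waves (a minimal Python/NumPy sketch; the actual system also interpolates amplitudes, frequencies, and phases across frame boundaries, which is omitted here):

```python
import numpy as np

def synthesize_frame(amps, freqs_hz, phases, fs, n_samples):
    """Reconstruct one frame from per-sinusoid amplitude, frequency, phase."""
    t = np.arange(n_samples) / fs
    return sum(a * np.cos(2 * np.pi * f * t + p)
               for a, f, p in zip(amps, freqs_hz, phases))
```

The coder itself quantizes these per-frame amplitude, frequency, and phase parameters, which is where the 2400 to 9600 bps operating rates come from.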
[Objective study of the voice quality following partial laryngectomy].
Remacle, M; Millet, B
1991-01-01
The high-resolution frequency analyzer is used to study vocal quality after partial laryngectomy. When one vocal fold is preserved, the post-operative trace after speech therapy is of good quality. Otherwise, the perceived vocal sound does not correspond to harmonics of the fundamental frequency but to intense noise from irregular vibrations of the residual laryngeal mucosa (ventricular folds, arytenoids). High-resolution frequency analysis contributes to the follow-up of partial laryngectomy.
Shannon, Robert V.; Cruz, Rachel J.; Galvin, John J.
2011-01-01
High stimulation rates in cochlear implants (CI) offer better temporal sampling, can induce stochastic-like firing of auditory neurons, and can increase the electric dynamic range, all of which could improve CI speech performance. While commercial CIs have employed increasingly high stimulation rates, no clear or consistent advantage has been shown for high rates. In this study, speech recognition was acutely measured with experimental processors in 7 CI subjects (Clarion CII users). The stimulation rate varied between approximately 600 and 4800 pulses per second per electrode (ppse), and the number of active electrodes varied between 4 and 16. Vowel, consonant, consonant-nucleus-consonant word, and IEEE sentence recognition was acutely measured in quiet and in steady noise (+10 dB signal-to-noise ratio). Subjective quality ratings were obtained for each of the experimental processors in quiet and in noise. Except for a small difference for vowel recognition in quiet, there were no significant differences in performance among the experimental stimulation rates for any of the speech measures. There was also a small but significant increase in subjective quality rating as stimulation rates increased from 1200 to 2400 ppse in noise. Consistent with previous studies, performance significantly improved as the number of electrodes was increased from 4 to 8, but no significant difference was observed among 8, 12, and 16 electrodes. Altogether, there was little-to-no advantage of high stimulation rates in quiet or in noise, at least for the present speech tests and conditions. PMID:20639631
Quality of communication in interpreted versus noninterpreted PICU family meetings.
Van Cleave, Alisa C; Roosen-Runge, Megan U; Miller, Alison B; Milner, Lauren C; Karkazis, Katrina A; Magnus, David C
2014-06-01
To describe the quality of physician-family communication during interpreted and noninterpreted family meetings in the PICU. Prospective, exploratory, descriptive observational study of noninterpreted English family meetings and interpreted Spanish family meetings in the pediatric intensive care setting. A single, university-based, tertiary children's hospital. Participants in PICU family meetings, including medical staff, family members, ancillary staff, and interpreters. Thirty family meetings (21 English and nine Spanish) were audio-recorded, transcribed, de-identified, and analyzed using the qualitative method of directed content analysis. Quality of communication was analyzed in three ways: 1) presence of elements of shared decision-making, 2) balance between physician and family speech, and 3) complexity of physician speech. Of the 11 elements of shared decision-making, only four occurred in more than half of English meetings, and only three occurred in more than half of Spanish meetings. Physicians spoke for a mean of 20.7 minutes, while families spoke for 9.3 minutes during English meetings. During Spanish meetings, physicians spoke for a mean of 14.9 minutes versus just 3.7 minutes of family speech. Physician speech complexity received a mean grade level score of 8.2 in English meetings compared to 7.2 in Spanish meetings. The quality of physician-family communication during PICU family meetings is poor overall. Interpreted meetings had poorer communication quality as evidenced by fewer elements of shared decision-making and greater imbalance between physician and family speech. However, physician speech may be less complex during interpreted meetings. Our data suggest that physicians can improve communication in both interpreted and noninterpreted family meetings by increasing the use of elements of shared decision-making, improving the balance between physician and family speech, and decreasing the complexity of physician speech.
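The abstract does not state which readability formula produced the grade-level scores for physician speech; the Flesch-Kincaid grade level is one widely used option, sketched below (Python; the syllable counter is a crude vowel-group heuristic for illustration only):

```python
import re

def count_syllables(word):
    """Crude heuristic: one syllable per group of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text):
    """FK grade = 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / sentences + 11.8 * syllables / len(words) - 15.59
```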
Perception of intelligibility and qualities of non-native accented speakers.
Fuse, Akiko; Navichkova, Yuliya; Alloggio, Krysteena
To provide effective treatment to clients, speech-language pathologists must be understood, and be perceived to demonstrate the personal qualities necessary for therapeutic practice (e.g., resourcefulness and empathy). One factor that could interfere with the listener's perception of non-native speech is the speaker's accent. The current study explored the relationship between how accurately listeners could understand non-native speech and their perceptions of personal attributes of the speaker. Additionally, this study investigated how listeners' familiarity and experience with other languages may influence their perceptions of non-native accented speech. Through an online survey, native monolingual and bilingual English listeners rated four non-native accents (i.e., Spanish, Chinese, Russian, and Indian) on perceived intelligibility and perceived personal qualities (i.e., professionalism, intelligence, resourcefulness, empathy, and patience) necessary for speech-language pathologists. The results indicated significant relationships between the perception of intelligibility and the perception of personal qualities (i.e., professionalism, intelligence, and resourcefulness) attributed to non-native speakers. However, these findings were not supported for the Chinese accent. Bilingual listeners judged the non-native speech as more intelligible in comparison to monolingual listeners. No significant differences were found in the ratings between bilingual listeners who share the same language background as the speaker and other bilingual listeners. Based on the current findings, greater perception of intelligibility was the key to promoting a positive perception of personal qualities such as professionalism, intelligence, and resourcefulness, important for speech-language pathologists. The current study found evidence to support the claim that bilinguals have a greater ability in understanding non-native accented speech compared to monolingual listeners. The results, however, did not confirm an advantage for bilingual listeners sharing the same language backgrounds with the non-native speaker over other bilingual listeners. Copyright © 2017 Elsevier Inc. All rights reserved.
Reddy, Rajgopal R; Gosla Reddy, Srinivas; Vaidhyanathan, Anitha; Bergé, Stefaan J; Kuijpers-Jagtman, Anne Marie
2017-06-01
The number of surgical procedures to repair a cleft palate may play a role in the outcome for maxillofacial growth and speech. The aim of this systematic review was to investigate the relationship between the number of surgical procedures performed to repair the cleft palate and maxillofacial growth, speech and fistula formation in non-syndromic patients with unilateral cleft lip and palate. An electronic search was performed in PubMed/old MEDLINE, the Cochrane Library, EMBASE, Scopus and CINAHL databases for publications between 1960 and December 2015. Publications before 1950 (journals of plastic and maxillofacial surgery) were hand searched. Additional hand searches were performed on studies mentioned in the reference lists of relevant articles. Search terms included unilateral, cleft lip and/or palate and palatoplasty. Two reviewers assessed eligibility for inclusion, extracted data, applied quality indicators and graded the level of evidence. Twenty-six studies met the inclusion criteria. All were retrospective and non-randomized comparisons of one- and two-stage palatoplasty. The methodological quality of most of the studies was graded moderate to low. The outcomes concerned the comparison of one- and two-stage palatoplasty with respect to growth of the mandible, maxilla and cranial base, and speech and fistula formation. Due to the lack of high-quality studies there is no conclusive evidence of a relationship between one- or two-stage palatoplasty and facial growth, speech and fistula formation in patients with unilateral cleft lip and palate. Copyright © 2017 European Association for Cranio-Maxillo-Facial Surgery. Published by Elsevier Ltd. All rights reserved.
Automating annotation of information-giving for analysis of clinical conversation.
Mayfield, Elijah; Laws, M Barton; Wilson, Ira B; Penstein Rosé, Carolyn
2014-02-01
Coding of clinical communication for fine-grained features such as speech acts has produced a substantial literature. However, annotation by humans is laborious and expensive, limiting application of these methods. We aimed to show that through machine learning, computers could code certain categories of speech acts with sufficient reliability to make useful distinctions among clinical encounters. The data were transcripts of 415 routine outpatient visits of HIV patients which had previously been coded for speech acts using the Generalized Medical Interaction Analysis System (GMIAS); 50 had also been coded for larger scale features using the Comprehensive Analysis of the Structure of Encounters System (CASES). We aggregated selected speech acts into information-giving and requesting, then trained the machine to automatically annotate using logistic regression classification. We evaluated reliability by per-speech act accuracy. We used multiple regression to predict patient reports of communication quality from post-visit surveys using the patient and provider information-giving to information-requesting ratio (briefly, information-giving ratio) and patient gender. Automated coding produces moderate reliability with human coding (accuracy 71.2%, κ=0.57), with high correlation between machine and human prediction of the information-giving ratio (r=0.96). The regression significantly predicted four of five patient-reported measures of communication quality (r=0.263-0.344). The information-giving ratio is a useful and intuitive measure for predicting patient perception of provider-patient communication quality. These predictions can be made with automated annotation, which is a practical option for studying large collections of clinical encounters with objectivity, consistency, and low cost, providing greater opportunity for training and reflection for care providers.
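The classify-then-count pipeline can be sketched as follows (Python with scikit-learn; the toy utterances, feature choices, and function names are illustrative, not drawn from the GMIAS/CASES data):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy training data: utterances labeled information-giving (1) or not (0)
utterances = ["your viral load has come down nicely",
              "could you tell me what the test showed"]
labels = [1, 0]

vec = CountVectorizer(ngram_range=(1, 2))  # unigram/bigram features
clf = LogisticRegression(max_iter=1000).fit(vec.fit_transform(utterances), labels)

def information_giving_ratio(n_giving, n_requesting):
    """Ratio of information-giving to information-requesting speech acts."""
    return n_giving / max(n_requesting, 1)
```

New utterances would be coded with `clf.predict(vec.transform([...]))`, and the per-visit counts aggregated into the ratio used as the regression predictor.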
Alternating motion rate as an index of speech motor disorder in traumatic brain injury.
Wang, Yu-Tsai; Kent, Ray D; Duffy, Joseph R; Thomas, Jack E; Weismer, Gary
2004-01-01
The task of syllable alternating motion rate (AMR), also called diadochokinesis, is suitable for examining speech disorders of varying degrees of severity and in individuals with varying levels of linguistic and cognitive ability. However, very limited information on this task has been published for subjects with traumatic brain injury (TBI). This study is a quantitative and qualitative acoustic analysis of AMR in seven subjects with TBI. The primary goal was to use acoustic analyses to assess speech motor control disturbances for the group as a whole and for individual patients. Quantitative analyses included measures of syllable rate, syllable and intersyllable gap durations, energy maxima, and voice onset time (VOT). Qualitative analyses included classification of features evident in spectrograms and waveforms to provide a more detailed description. The TBI group had (1) a slowed syllable rate due mostly to lengthened syllables and, to a lesser degree, lengthened intersyllable gaps, (2) highly correlated syllable rates between AMR and conversation, (3) temporal and energy maxima irregularities within repetition sequences, (4) normal median VOT values but with large variation, and (5) a number of speech production abnormalities revealed by qualitative analysis, including explosive speech quality, breathy voice quality, phonatory instability, multiple or missing stop bursts, continuous voicing, and spirantization. The relationships between these findings and the TBI speakers' neurological status and dysarthria types are also discussed. It was concluded that acoustic analysis of the AMR task provides specific information on motor speech limitations in individuals with TBI.
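A syllable rate of the kind measured here can be approximated by counting rises of the short-time energy envelope through a threshold (a crude Python/NumPy sketch; the frame length and threshold ratio are arbitrary illustrative values, and the published analyses were considerably more careful):

```python
import numpy as np

def syllable_rate(signal, fs, frame_ms=10, thresh_ratio=0.3):
    """Estimate AMR syllable rate (syllables/s) from energy-envelope peaks."""
    frame = int(fs * frame_ms / 1000)
    n = len(signal) // frame
    energy = np.array([np.sum(signal[i * frame:(i + 1) * frame] ** 2)
                       for i in range(n)])
    thresh = thresh_ratio * energy.max()
    # Count upward threshold crossings as syllable onsets
    onsets = np.flatnonzero((energy[1:] >= thresh) & (energy[:-1] < thresh))
    return len(onsets) / (n * frame / fs)
```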
Kraaijenga, S A C; Oskam, I M; van Son, R J J H; Hamming-Vrieze, O; Hilgers, F J M; van den Brekel, M W M; van der Molen, L
2016-04-01
Assessment of long-term objective and subjective voice, speech, articulation, and quality of life in patients with head and neck cancer (HNC) treated with concurrent chemoradiotherapy (CRT) for advanced, stage IV disease. Twenty-two disease-free survivors, treated with cisplatin-based CRT for inoperable HNC (1999-2004), were evaluated at 10 years post-treatment. A standard Dutch text was recorded. Perceptual analysis of voice, speech, and articulation was conducted by two expert listeners (SLPs). An experimental expert system based on automatic speech recognition was also used. Patients' perception of voice and speech and related quality of life was assessed with the Voice Handicap Index (VHI) and Speech Handicap Index (SHI) questionnaires. At a median follow-up of 11 years, perceptual evaluation showed abnormal scores in up to 64% of cases, depending on the outcome parameter analyzed. Automatic assessment of voice and speech parameters correlated moderately to strongly with perceptual outcome scores. Patient-reported problems with voice (VHI>15) and speech (SHI>6) in daily life were present in 68% and 77% of patients, respectively. Patients treated with IMRT showed significantly less impairment compared to those treated with conventional radiotherapy. More than 10 years after organ-preservation treatment, voice and speech problems are common in this patient cohort, as assessed with perceptual evaluation, automatic speech recognition, and validated structured questionnaires. There were fewer complaints in patients treated with IMRT than with conventional radiotherapy. Copyright © 2016 Elsevier Ltd. All rights reserved.
Articulation of sounds in Serbian language in patients who learned esophageal speech successfully.
Vekić, Maja; Veselinović, Mila; Mumović, Gordana; Mitrović, Slobodan M
2014-01-01
Articulation of pronounced sounds during the training and subsequent use of esophageal speech is very important because it contributes significantly to the intelligibility and aesthetics of spoken words and sentences, as well as of speech and language itself. The aim of this research was to determine the quality of articulation of the sounds of the Serbian language, by sound group, in patients who had learned esophageal speech successfully, as well as the effect of age and tooth loss on the quality of articulation. This retrospective-prospective study included 16 patients who had undergone total laryngectomy. Having completed the rehabilitation of speech, these patients used esophageal voice and speech. The quality of articulation was tested by the "Global test of articulation." Esophageal speech was rated grade 5 in 62.5% of the patients, grade 4 in 31.3%, and grade 3 in one patient. Serbian was the native language of all the patients. The study covered 30 sounds of the Serbian language in 16 subjects (480 sounds in total). Only two patients (12.5%) articulated all sounds properly, whereas 87.5% had incorrect articulation. The articulation of affricates and fricatives, especially the sound /h/ among the fricatives, was found to be the worst in the patients who had successfully mastered esophageal speech. The age and tooth loss of patients who have mastered esophageal speech do not affect the articulation of sounds in the Serbian language.
NASA Technical Reports Server (NTRS)
Wolf, Jared J.
1977-01-01
The following research was discussed: (1) speech signal processing; (2) automatic speech recognition; (3) continuous speech understanding; (4) speaker recognition; (5) speech compression; (6) subjective and objective evaluation of speech communication systems; (7) measurement of the intelligibility and quality of speech degraded by noise or other masking stimuli; (8) speech synthesis; (9) instructional aids for second-language learning and for training of the deaf; and (10) investigation of speech correlates of psychological stress. Experimental psychology, control systems, and human factors engineering, which are often relevant to the proper design and operation of speech systems, are also described.
Seeto, Angeline; Searchfield, Grant D
2018-03-01
Advances in digital signal processing have made it possible to provide a wide-band frequency response with smooth, precise spectral shaping. Several manufacturers have introduced hearing aids that are claimed to provide gain for frequencies up to 10-12 kHz. However, there is currently limited evidence and very few independent studies evaluating the performance of the extended bandwidth hearing aids that have recently become available. This study investigated an extended bandwidth hearing aid using measures of speech intelligibility and sound quality to find out whether there was a significant benefit of extended bandwidth amplification over standard amplification. Repeated measures study designed to examine the efficacy of extended bandwidth amplification compared to standard bandwidth amplification. Sixteen adult participants with mild-to-moderate sensorineural hearing loss. Participants were bilaterally fit with a pair of Widex Mind 440 behind-the-ear hearing aids programmed with a standard bandwidth fitting and an extended bandwidth fitting; the latter provided gain up to 10 kHz. For each fitting, and an unaided condition, participants completed two speech measures of aided benefit, the Quick Speech-in-Noise test (QuickSIN™) and the Phonak Phoneme Perception Test (PPT; high-frequency perception in quiet), and a measure of sound quality rating. There were no significant differences found between unaided and aided conditions for QuickSIN™ scores. For the PPT, there were statistically significantly lower (improved) detection thresholds at high frequencies (6 and 9 kHz) with the extended bandwidth fitting. Although not statistically significant, participants were able to distinguish between 6 and 9 kHz 50% better with extended bandwidth. No significant difference was found in ability to recognize phonemes in quiet between the unaided and aided conditions when phonemes only contained frequency content <6 kHz. However significant benefit was found with the extended bandwidth fitting for recognition of 9-kHz phonemes. No significant difference in sound quality preference was found between the standard bandwidth and extended bandwidth fittings. This study demonstrated that a pair of currently available extended bandwidth hearing aids was technically capable of delivering high-frequency amplification that was both audible and useable to listeners with mild-to-moderate hearing loss. This amplification was of acceptable sound quality. Further research, particularly field trials, is required to ascertain the real-world benefit of high-frequency amplification. American Academy of Audiology
Speech and Voice Response to a Levodopa Challenge in Late-Stage Parkinson's Disease.
Fabbri, Margherita; Guimarães, Isabel; Cardoso, Rita; Coelho, Miguel; Guedes, Leonor Correia; Rosa, Mario M; Godinho, Catarina; Abreu, Daisy; Gonçalves, Nilza; Antonini, Angelo; Ferreira, Joaquim J
2017-01-01
Parkinson's disease (PD) patients are affected by hypokinetic dysarthria, characterized by hypophonia and dysprosody, which worsens with disease progression. Levodopa's (l-dopa) effect on quality of speech is inconclusive; no data are currently available for late-stage PD (LSPD). To assess the modifications of speech and voice in LSPD following an acute l-dopa challenge. LSPD patients [Schwab and England score <50/Hoehn and Yahr stage >3 (MED ON)] performed several vocal tasks before and after an acute l-dopa challenge. The following was assessed: respiratory support for speech, voice quality, stability and variability, speech rate, and motor performance (MDS-UPDRS-III). All voice samples were recorded and analyzed by a speech and language therapist blinded to patients' therapeutic condition using Praat 5.1 software. Twenty-four of 27 LSPD patients (14 men) succeeded in performing the voice tasks. Median age and disease duration of patients were 79 [IQR: 71.5-81.7] and 14.5 [IQR: 11-15.7] years, respectively. In MED OFF, respiratory breath support and pitch break time of LSPD patients were worse than normative values for non-parkinsonian speakers. A correlation was found between disease duration and voice quality (R = 0.51; p = 0.013) and speech rate (R = -0.55; p = 0.008). l-Dopa significantly improved the MDS-UPDRS-III score (20%), with no effect on speech as assessed by clinical rating scales and automated analysis. Speech is severely affected in LSPD. Although l-dopa had some effect on motor performance, including axial signs, speech and voice did not improve. The applicability and efficacy of non-pharmacological treatment for speech impairment should be considered for speech disorder management in PD.
Acoustic Quality Levels of Mosques in Batu Pahat
NASA Astrophysics Data System (ADS)
Azizah Adnan, Nor; Nafida Raja Shahminan, Raja; Khair Ibrahim, Fawazul; Tami, Hannifah; Yusuff, M. Rizal M.; Murniwaty Samsudin, Emedya; Ismail, Isham
2018-04-01
Every Friday, Muslims are required to perform a special prayer, known as the Friday prayer, which involves the delivery of a brief sermon (Khutbah). The intelligibility of the preacher's speech affects the whole congregation and reflects the acoustic quality of the mosque interior. This study therefore assessed the acoustic quality of three public mosques in Batu Pahat. Good acoustic quality is essential for appreciation of the prayers and for increasing khusyu’ during worship, and is closely related to the speech intelligibility required by the actual function of the mosque according to Islam. The acoustic parameters measured included the noise criteria (NC), reverberation time (RT) and speech transmission index (STI), obtained using a sound level meter and related sound measurement instruments. The tests were carried out alongside physical observation, with space and volume design considered as factors affecting the acoustic parameters. Results from all three mosques showed that the acoustic quality inside these buildings is rather poor, with coefficients below 0.45 according to the standard. Among the factors contributing to the low acoustical quality are location, building materials, the installation of sound-absorbing material and the number of occupants inside the mosque. In conclusion, the acoustic quality of a mosque depends strongly on its physical characteristics, such as the architectural design and space volume, besides the other factors identified by this study.
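Of the parameters measured, reverberation time is the one most directly tied to room volume and materials; Sabine's classical formula makes that dependence explicit (a standard-acoustics sketch in Python, not the measurement procedure used in the study):

```python
def rt60_sabine(volume_m3, surfaces_m2, absorption_coeffs):
    """Sabine reverberation time: RT60 = 0.161 * V / A, with A the total
    absorption (sum over surfaces of area * absorption coefficient)."""
    total_absorption = sum(s * a for s, a in zip(surfaces_m2, absorption_coeffs))
    return 0.161 * volume_m3 / total_absorption
```

A large prayer hall with mostly hard, low-absorption surfaces thus yields a long RT60, which in turn depresses the speech transmission index.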
Kim, Young-Suk; Puranik, Cynthia; Otaiba, Stephanie Al
2013-01-01
We examined growth trajectories of writing and the relation of children's socio-economic status (SES) and language and/or speech impairment to those trajectories. First grade children (N = 304) were assessed on their written composition in the fall, winter, and spring, and on their vocabulary and literacy skills in the fall. Children's SES had a negative effect on writing quality and productivity. Children with language and/or speech impairment had lower scores than typically developing children in the quality and productivity of writing. Even after accounting for their vocabulary and literacy skills, students with language and/or speech impairment had lower scores in the quality and organization of writing. Growth rates in writing did not differ as a function of children's SES or language/speech impairment status. Theoretical and practical implications are discussed. PMID:26146410
Severity-Based Adaptation with Limited Data for ASR to Aid Dysarthric Speakers
Mustafa, Mumtaz Begum; Salim, Siti Salwah; Mohamed, Noraini; Al-Qatab, Bassam; Siong, Chng Eng
2014-01-01
Automatic speech recognition (ASR) is currently used in many assistive technologies, such as helping individuals with speech impairment in their communication ability. One challenge in ASR for speech-impaired individuals is the difficulty in obtaining a good speech database of impaired speakers for building an effective speech acoustic model. Because there are very few existing databases of impaired speech, which are also limited in size, the obvious way to build a speech acoustic model of impaired speech is to employ adaptation techniques. However, issues that have not been addressed in existing studies in the area of adaptation for speech impairment are as follows: (1) identifying the most effective adaptation technique for impaired speech; and (2) the use of suitable source models to build an effective impaired-speech acoustic model. This research investigates these two issues for dysarthria, a type of speech impairment affecting millions of people. We applied both unimpaired and impaired speech as the source model with well-known adaptation techniques such as maximum likelihood linear regression (MLLR) and constrained MLLR (C-MLLR). The recognition accuracy of each impaired-speech acoustic model is measured in terms of word error rate (WER), with further assessments including phoneme insertion, substitution and deletion rates. Unimpaired speech, when combined with limited high-quality impaired-speech data, improves the performance of ASR systems in recognising severely impaired dysarthric speech. The C-MLLR adaptation technique was also found to be better than MLLR in recognising mildly and moderately impaired speech, based on statistical analysis of the WER. Phoneme substitution was found to be the biggest contributor to WER in dysarthric speech at all levels of severity. The results show that speech acoustic models derived from suitable adaptation techniques improve the performance of ASR systems in recognising impaired speech with limited adaptation data. PMID:24466004
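The headline metric, WER, is computed from a Levenshtein alignment of the reference and hypothesis word strings (a minimal Python/NumPy sketch; the study additionally separated insertion, substitution and deletion counts, which this sketch collapses into one distance):

```python
import numpy as np

def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length."""
    r, h = reference.split(), hypothesis.split()
    d = np.zeros((len(r) + 1, len(h) + 1), dtype=int)
    d[:, 0] = np.arange(len(r) + 1)
    d[0, :] = np.arange(len(h) + 1)
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i, j] = min(d[i - 1, j] + 1,         # deletion
                          d[i, j - 1] + 1,         # insertion
                          d[i - 1, j - 1] + cost)  # substitution or match
    return d[len(r), len(h)] / len(r)
```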
Development and Perceptual Evaluation of Amplitude-Based F0 Control in Electrolarynx Speech
ERIC Educational Resources Information Center
Saikachi, Yoko; Stevens, Kenneth N.; Hillman, Robert E.
2009-01-01
Purpose: Current electrolarynx (EL) devices produce a mechanical speech quality that has been largely attributed to the lack of natural fundamental frequency (F0) variation. In order to improve the quality of EL speech, in the present study the authors aimed to develop and evaluate an automatic F0 control scheme, in which F0 was modulated based on…
Affective Properties of Mothers' Speech to Infants with Hearing Impairment and Cochlear Implants
ERIC Educational Resources Information Center
Kondaurova, Maria V.; Bergeson, Tonya R.; Xu, Huiping; Kitamura, Christine
2015-01-01
Purpose: The affective properties of infant-directed speech influence the attention of infants with normal hearing to speech sounds. This study explored the affective quality of maternal speech to infants with hearing impairment (HI) during the 1st year after cochlear implantation as compared to speech to infants with normal hearing. Method:…
The Speech Discipline in Crisis - - A Cause for Hope.
ERIC Educational Resources Information Center
Lanigan, Richard L.
Speech communication is a distinct discipline, but one in a healthy state of conflict between theory and practice. The crisis in the speech discipline (and in academia generally) exists because speech does not present itself as a consumable value; quality program decisions are not made; speech is often conceived as only one subject matter; general…
de Taillez, Tobias; Grimm, Giso; Kollmeier, Birger; Neher, Tobias
2018-06-01
To investigate the influence of an algorithm designed to enhance or magnify interaural difference cues on speech signals in noisy, spatially complex conditions using both technical and perceptual measurements. To also investigate the combination of interaural magnification (IM), monaural microphone directionality (DIR), and binaural coherence-based noise reduction (BC). Speech-in-noise stimuli were generated using virtual acoustics. A computational model of binaural hearing was used to analyse the spatial effects of IM. Predicted speech quality changes and signal-to-noise-ratio (SNR) improvements were also considered. Additionally, a listening test was carried out to assess speech intelligibility and quality. Listeners aged 65-79 years with and without sensorineural hearing loss (N = 10 each). IM increased the horizontal separation of concurrent directional sound sources without introducing any major artefacts. In situations with diffuse noise, however, the interaural difference cues were distorted. Preprocessing the binaural input signals with DIR reduced distortion. IM influenced neither speech intelligibility nor speech quality. The IM algorithm tested here failed to improve speech perception in noise, probably because of the dispersion and inconsistent magnification of interaural difference cues in complex environments.
Fifty years of progress in speech coding standards
NASA Astrophysics Data System (ADS)
Cox, Richard
2004-10-01
Over the past 50 years, speech coding has taken root worldwide. Early applications were for the military and for transmission in telephone networks. The military gave equal priority to intelligibility and low bit rate; the telephone network gave priority to high quality and low delay. These illustrate three of the four areas in which requirements must be set for any speech coder application: bit rate, quality, delay, and complexity. While the military could afford relatively expensive terminal equipment for secure communications, the telephone network needed low cost for massive deployment in switches and transmission equipment worldwide. Today speech coders are at the heart of the wireless phones and telephone answering systems we use every day. In addition to the technology and technical invention that has occurred, standards make it possible for all these different systems to interoperate. The primary areas of standardization are the public switched telephone network, wireless telephony, and secure telephony for government and military applications. With the advent of IP telephony there are additional standardization efforts and challenges. In this talk, progress in all of these areas is reviewed, along with a reflection on Jim Flanagan's impact on the field during the past half century.
Emotion to emotion speech conversion in phoneme level
NASA Astrophysics Data System (ADS)
Bulut, Murtaza; Yildirim, Serdar; Busso, Carlos; Lee, Chul Min; Kazemzadeh, Ebrahim; Lee, Sungbok; Narayanan, Shrikanth
2004-10-01
Having the ability to synthesize emotional speech can make human-machine interaction more natural in spoken dialogue management. This study investigates the effectiveness of phoneme-level prosodic and spectral modification for emotion-to-emotion speech conversion. The prosody modification is performed with the TD-PSOLA algorithm (Moulines and Charpentier, 1990). We also transform the spectral envelopes of source phonemes to match those of target phonemes using an LPC-based spectral transformation approach (Kain, 2001). Prosodic speech parameters (F0, duration, and energy) for target phonemes are estimated from statistics obtained from the analysis of an emotional speech database of happy, angry, sad, and neutral utterances collected from actors. Listening experiments conducted with native speakers of American English indicate that modification of prosody only or spectrum only is not sufficient to elicit the targeted emotions. Simultaneous modification of both prosody and spectrum results in higher acceptance rates of the target emotions, suggesting that modeling not only speech prosody but also the spectral patterns that reflect the underlying speech articulations is important for synthesizing emotional speech of good quality. We are investigating suprasegmental-level modifications for further improvement in speech quality and expressiveness.
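One common way to realize the "statistics from the database" step for F0 is Gaussian (z-score) normalization of the source contour toward the target emotion's mean and spread (an illustrative Python sketch; it shows only the target-contour computation, not the TD-PSOLA resynthesis itself):

```python
def convert_f0(f0_contour, src_mean, src_std, tgt_mean, tgt_std):
    """Map a source-emotion F0 contour onto target-emotion statistics.

    Zero values mark unvoiced frames and are passed through unchanged.
    """
    return [tgt_mean + (f - src_mean) / src_std * tgt_std if f > 0 else 0.0
            for f in f0_contour]
```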
Underwater speech communications with a modulated laser
NASA Astrophysics Data System (ADS)
Woodward, B.; Sari, H.
2008-04-01
A novel speech communications system using a modulated laser beam has been developed for short-range applications in which high directionality is an exploitable feature. Although it was designed for certain underwater applications, such as speech communications between divers or between a diver and the surface, it may equally be used for applications in air. With some modification it could be used for secure diver-to-diver communications in situations where untethered divers are swimming close together and do not want their conversations monitored by intruders. Unlike underwater acoustic communications, where the transmitted speech may be received omnidirectionally at ranges of hundreds of metres, a laser communication link is very difficult to intercept and also obviates the need for cables that can become snagged or broken. Further applications include the transmission of speech and data, including the short message service (SMS), from a fixed installation such as a sea-bed habitat, and data transmission to and from an autonomous underwater vehicle (AUV), particularly during docking manoeuvres. The performance of the system has been assessed subjectively by listening tests, which revealed that the speech was intelligible, although of poor quality due to the speech algorithm used.
Comparing Binaural Pre-processing Strategies I: Instrumental Evaluation.
Baumgärtel, Regina M; Krawczyk-Becker, Martin; Marquardt, Daniel; Völker, Christoph; Hu, Hongmei; Herzke, Tobias; Coleman, Graham; Adiloğlu, Kamil; Ernst, Stephan M A; Gerkmann, Timo; Doclo, Simon; Kollmeier, Birger; Hohmann, Volker; Dietz, Mathias
2015-12-30
In a collaborative research project, several monaural and binaural noise reduction algorithms have been comprehensively evaluated. In this article, eight selected noise reduction algorithms were assessed using instrumental measures, with a focus on the instrumental evaluation of speech intelligibility. Four distinct, reverberant scenarios were created to reflect everyday listening situations: a stationary speech-shaped noise, a multitalker babble noise, a single interfering talker, and a realistic cafeteria noise. Three instrumental measures were employed to assess predicted speech intelligibility and predicted sound quality: the intelligibility-weighted signal-to-noise ratio, the short-time objective intelligibility measure, and the perceptual evaluation of speech quality. The results show substantial improvements in predicted speech intelligibility as well as sound quality for the proposed algorithms. The evaluated coherence-based noise reduction algorithm was able to provide improvements in predicted audio signal quality. For the tested single-channel noise reduction algorithm, improvements in intelligibility-weighted signal-to-noise ratio were observed in all but the nonstationary cafeteria ambient noise scenario. Binaural minimum variance distortionless response beamforming algorithms performed particularly well in all noise scenarios. © The Author(s) 2015.
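Among the three instrumental measures, the intelligibility-weighted SNR is the simplest to state: per-band SNRs are combined using band-importance weights (a minimal Python/NumPy sketch; actual weights would come from a band-importance function such as the SII's, which is not reproduced here):

```python
import numpy as np

def intelligibility_weighted_snr(snr_bands_db, band_weights):
    """Combine per-band SNRs (dB) with normalized band-importance weights."""
    w = np.asarray(band_weights, dtype=float)
    return float(np.dot(w / w.sum(), np.asarray(snr_bands_db, dtype=float)))
```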
Speech Recognition for Medical Dictation: Overview in Quebec and Systematic Review.
Poder, Thomas G; Fisette, Jean-François; Déry, Véronique
2018-04-03
Speech recognition is increasingly used in medical reporting. The aim of this article is to identify in the literature the strengths and weaknesses of this technology, as well as barriers to and facilitators of its implementation. A systematic review of systematic reviews was performed using PubMed, Scopus, the Cochrane Library and the Center for Reviews and Dissemination through August 2017. The gray literature was also consulted. The quality of the systematic reviews was assessed with the AMSTAR checklist. The main inclusion criterion was the use of speech recognition for medical reporting (front-end or back-end). A survey was also conducted in Quebec, Canada, to identify the dissemination of this technology in the province, as well as the factors leading to the success or failure of its implementation. Five systematic reviews were identified. These reviews indicated a high level of heterogeneity across studies. The quality of the studies reported was generally poor. Speech recognition is not as accurate as human transcription, but it can dramatically reduce turnaround times for reporting. In front-end use, medical doctors need to spend more time on dictation and correction than is required with human transcription. With speech recognition, major errors occur up to three times more frequently. In back-end use, a potential increase in the productivity of transcriptionists was noted. In conclusion, speech recognition offers several advantages for medical reporting. However, these advantages are countered by an increased burden on medical doctors and by risks of additional errors in medical reports. It is also hard to identify for which medical specialties and which clinical activities the use of speech recognition will be most beneficial.
Cleft audit protocol for speech (CAPS-A): a comprehensive training package for speech analysis.
Sell, D; John, A; Harding-Bell, A; Sweeney, T; Hegarty, F; Freeman, J
2009-01-01
The previous literature has largely focused on speech analysis systems and ignored process issues, such as the nature of adequate speech samples, data acquisition, recording and playback. Although there has been recognition of the need for training on tools used in speech analysis associated with cleft palate, little attention has been paid to this issue. To design, execute, and evaluate a training programme for speech and language therapists on the systematic and reliable use of the Cleft Audit Protocol for Speech-Augmented (CAPS-A), addressing issues of standardized speech samples, data acquisition, recording, playback, and listening guidelines. Thirty-six specialist speech and language therapists undertook the training programme over four days. This consisted of two days' training on the CAPS-A tool, followed by a third day on which they made independent ratings and transcriptions of ten new cases that had previously been recorded during routine audit data collection. This task was repeated on day 4, a minimum of one month later. Ratings were made using the CAPS-A record form with the CAPS-A definition table. An analysis was made of the speech and language therapists' CAPS-A ratings at occasion 1 and occasion 2, and the intra- and inter-rater reliability was calculated. Trained therapists showed consistency in individual judgements on specific sections of the tool. Intraclass correlation coefficients were calculated for each section, with good agreement on eight of 13 sections. There were only fair levels of agreement on anterior oral cleft speech characteristics, non-cleft errors/immaturities and voice. This was explained, at least in part, by their low prevalence, which affects the calculation of the intraclass correlation coefficient statistic. Speech and language therapists benefited from training on the CAPS-A, focusing on specific aspects of speech using definitions of parameters and scalar points, in order to apply the tool systematically and reliably. Ratings are enhanced by ensuring a high degree of attention to the nature of the data: standardizing the speech sample, data acquisition and the listening process, together with the use of high-quality recording and playback equipment. In addition, a method is proposed for maintaining listening skills following training as part of an individual's continuing education.
Coding strategies for cochlear implants under adverse environments
NASA Astrophysics Data System (ADS)
Tahmina, Qudsia
Cochlear implants are electronic prosthetic devices that restore partial hearing in patients with severe to profound hearing loss. Although most coding strategies have significantly improved the perception of speech in quiet listening conditions, limitations remain on speech perception under adverse environments such as background noise, reverberation and band-limited channels. We propose strategies that improve the intelligibility of speech transmitted over telephone networks, reverberated speech and speech in the presence of background noise. For telephone-processed speech, we examine the effects of adding low-frequency and high-frequency information to the band-limited telephone speech. Four listening conditions were designed to simulate the receiving frequency characteristics of telephone handsets. Results indicated improvement in cochlear implant and bimodal listening when telephone speech was augmented with high-frequency information; this study therefore provides support for the design of algorithms to extend the bandwidth towards higher frequencies. The results also indicated added benefit from hearing aids for bimodal listeners in all four types of listening conditions. Speech understanding in acoustically reverberant environments is always a difficult task for hearing-impaired listeners. Reverberated sound consists of the direct sound, early reflections and late reflections; late reflections are known to be detrimental to speech intelligibility. In this study, we propose a reverberation suppression strategy based on spectral subtraction (SS) to suppress the reverberant energy from late reflections. Results from listening tests for two reverberant conditions (RT60 = 0.3 s and 1.0 s) indicated significant improvement when stimuli were processed with the SS strategy. The proposed strategy operates with little to no prior information on the signal and the room characteristics and can therefore potentially be implemented in real-time CI speech processors. For speech in background noise, we propose a mechanism underlying the contribution of harmonics to the benefit of electroacoustic stimulation in cochlear implants. The proposed strategy is based on harmonic modeling and uses a synthesis-driven approach to synthesize the harmonics in voiced segments of speech. Based on objective measures, results indicated improvement in speech quality. This study warrants further work on the development of algorithms to regenerate the harmonics of voiced segments in the presence of noise.
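The late-reflection suppression described above follows the classic spectral-subtraction template; a bare-bones magnitude-domain version is sketched below (Python/NumPy; the overestimation factor and spectral floor are illustrative, and the actual strategy estimates the late-reverberant spectrum rather than subtracting a fixed one):

```python
import numpy as np

def spectral_subtraction(noisy_mag, interferer_mag, alpha=2.0, floor=0.05):
    """Subtract an (over)estimated interferer magnitude spectrum per frame,
    clamping to a spectral floor to limit musical-noise artifacts."""
    cleaned = noisy_mag - alpha * interferer_mag
    return np.maximum(cleaned, floor * noisy_mag)
```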
Treatment for speech disorder in Friedreich ataxia and other hereditary ataxia syndromes.
Vogel, Adam P; Folker, Joanne; Poole, Matthew L
2014-10-28
Hereditary ataxia syndromes can result in significant speech impairment, a symptom thought to be responsive to treatment. The type of speech impairment most commonly reported in hereditary ataxias is dysarthria. Dysarthria is a collective term referring to a group of movement disorders affecting the muscular control of speech. Dysarthria affects the ability of individuals to communicate and to participate in society. This in turn reduces quality of life. Given the harmful impact of speech disorder on a person's functioning, treatment of speech impairment in these conditions is important and evidence-based interventions are needed. To assess the effects of interventions for speech disorder in adults and children with Friedreich ataxia and other hereditary ataxias. On 14 October 2013, we searched the Cochrane Neuromuscular Disease Group Specialized Register, CENTRAL, MEDLINE, EMBASE, CINAHL Plus, PsycINFO, Education Resources Information Center (ERIC), Linguistics and Language Behavior Abstracts (LLBA), Dissertation Abstracts and trials registries. We checked all references in the identified trials to identify any additional published data. We considered for inclusion randomised controlled trials (RCTs) or quasi-RCTs that compared treatments for hereditary ataxias with no treatment, placebo or another treatment or combination of treatments, where investigators measured speech production. Two review authors independently selected trials for inclusion, extracted data and assessed the risk of bias of included studies using the standard methodological procedures expected by The Cochrane Collaboration. The review authors collected information on adverse effects from included studies. We did not conduct a meta-analysis as no two studies utilised the same assessment procedures within the same treatment. Fourteen clinical trials, involving 721 participants, met the criteria for inclusion in the review. Thirteen studies compared a pharmaceutical treatment with placebo (or a low dose of the intervention), in heterogenous groups of degenerative cerebellar ataxias. Three compounds were studied in two trials each: a levorotatory form of 5-hydroxytryptophan (L-5HT), idebenone and thyrotropin-releasing hormone tartrate (TRH-T); each of the other compounds (riluzole, varenicline, buspirone, betamethasone, coenzyme Q10 with vitamin E, α-tocopheryl quinone and erythropoietin) were studied in one trial. The 14th trial, involving a mixed group of participants with spinocerebellar ataxia, compared the effectiveness of nonspecific physiotherapy and occupational therapy within an inpatient hospital setting to no treatment. No studies utilised traditional speech therapies. We defined the primary outcome measure in this review as the percentage change (improvement) in overall speech production immediately following completion of the intervention or later, measured by any validated speech assessment tool. None of the trials included speech as a primary outcome or examined speech using any validated speech assessment tool. Eleven studies reported speech outcomes derived from a subscale embedded within disease rating scales. The remaining three studies used alternative assessments to measure speech, including mean time to produce a standard sentence, a subjective rating of speech on a 14-point analogue scale, patient-reported assessment of the impact of dysarthria on activities of daily living and acoustic measures of syllable length. 
One study measured speech both subjectively as part of a disease rating scale and with further measures of speech timing. Three studies utilised the Short Form-36 Health Survey (SF-36) and one used the Child Health Questionnaire as measures of general quality of life. A further study utilised the Functional Independence Measure to assess functional health. Five studies reported statistically significant improvement on an overall disease rating scale in which a speech subscale was included. Only three of those studies provided specific data on speech performance; all were comparisons with placebo. Improvements in overall disease severity were observed with α-tocopheryl quinone; however, no significant changes were found on the speech subscale in a group of individuals with Friedreich ataxia. A statistically significant improvement in speech according to a speech disorders subscale was observed with betamethasone. Riluzole was found to have a statistically significant effect on speech in a group of participants with mixed hereditary, sporadic and unknown origin ataxias. No significant differences were observed between treatment and placebo in any other pharmaceutical study. A statistically significant improvement in functional independence occurred at the end of the treatment period in the rehabilitation study compared to the delayed treatment group, but these effects were not present 12 to 24 weeks after treatment. Of the four studies that assessed quality of life, none found a significant effect. A variety of minor adverse events were reported for the 13 pharmaceutical therapies, including gastrointestinal side effects and nausea. Serious adverse effects were reported in two participants in one of the L-5HT trials (participants discontinued due to gastrointestinal effects), and in four participants (three taking idebenone, one taking placebo) in the idebenone studies. Serious adverse events with idebenone were gastrointestinal side effects and, in people with a previous history of these events, chest pain and idiopathic thrombocytopenic purpura. The rehabilitation study did not report any adverse events. We considered six studies to be at high risk of bias in some respect. We suspected inadequate blinding of participants or assessors in four studies and poor randomisation in a further two studies. There was a high risk of reporting bias in two studies and attrition bias in four studies. Only one study had a low risk of bias across all criteria. Taken together with other limitations of the studies relating to the validity of the measurement scales used, we downgraded the quality of the evidence for many of the outcomes to low or very low. There is insufficient and low or very low quality evidence from either RCTs or observational studies to determine the effectiveness of any treatment for speech disorder in any of the hereditary ataxia syndromes.
Kushalnagar, P.; Topolski, T. D.; Schick, B.; Edwards, T. C.; Skalicky, A. M.; Patrick, D. L.
2011-01-01
Given the important role of parent–youth communication in adolescent well-being and quality of life, we sought to examine the relationship between specific communication variables and youth perceived quality of life in general and as a deaf or hard-of-hearing (DHH) individual. A convenience sample of 230 youth (mean age = 14.1, standard deviation = 2.2; 24% used sign only, 40% speech only, and 36% sign + speech) was surveyed on communication-related issues, generic and DHH-specific quality of life, and depression symptoms. Higher youth perception of their ability to understand parents’ communication was significantly correlated with perceived quality of life as well as lower reported depressive symptoms and lower perceived stigma. Youth who use speech as their single mode of communication were more likely to report greater stigma associated with being DHH than youth who used both speech and sign. These findings demonstrate the importance of youths’ perceptions of communication with their parents on generic and DHH-specific youth quality of life. PMID:21536686
Self-efficacy and quality of life in adults who stutter.
Carter, Alice; Breen, Lauren; Yaruss, J Scott; Beilby, Janet
2017-12-01
Self-efficacy has emerged as a potential predictor of quality of life for adults who stutter. Research has focused primarily on the positive relationship self-efficacy has to treatment outcomes, but little is known about the relationship between self-efficacy and quality of life for adults who stutter. The purpose of this mixed-methods study is to determine the predictive value of self-efficacy and its relationship to quality of life for adults who stutter. The Self-Efficacy Scale for Adult Stutterers and the Overall Assessment of the Speaker's Experience with Stuttering were administered to 39 adults who stutter, aged 18-77. Percentage of syllables stuttered was calculated from a conversational speech sample as a measure of stuttered speech frequency. Qualitative interviews with semi-structured probes were conducted with 10 adults and analyzed using thematic analysis to explore the lived experience of adults who stutter. Self-efficacy emerged as a strong positive predictor of quality of life for adults living with a stuttered speech disorder. Stuttered speech frequency was a moderate negative predictor of self-efficacy. Major qualitative themes identified from the interviews with the participants were: encumbrance, self-concept, confidence, acceptance, life-long journey, treatment, and support. Results provide clarity on the predictive value of self-efficacy and its relationship to quality of life and stuttered speech frequency. Findings highlight that the unique life experiences of adults who stutter require a multidimensional approach to the assessment and treatment of stuttered speech disorders.
Research in speech communication.
Flanagan, J
1995-10-24
Advances in digital speech processing are now supporting application and deployment of a variety of speech technologies for human/machine communication. In fact, new businesses are rapidly forming about these technologies. But these capabilities are of little use unless society can afford them. Happily, explosive advances in microelectronics over the past two decades have assured affordable access to this sophistication as well as to the underlying computing technology. The research challenges in speech processing remain in the traditionally identified areas of recognition, synthesis, and coding. These three areas have typically been addressed individually, often with significant isolation among the efforts. But they are all facets of the same fundamental issue--how to represent and quantify the information in the speech signal. This implies deeper understanding of the physics of speech production, the constraints that the conventions of language impose, and the mechanism for information processing in the auditory system. In ongoing research, therefore, we seek more accurate models of speech generation, better computational formulations of language, and realistic perceptual guides for speech processing--along with ways to coalesce the fundamental issues of recognition, synthesis, and coding. Successful solution will yield the long-sought dictation machine, high-quality synthesis from text, and the ultimate in low bit-rate transmission of speech. It will also open the door to language-translating telephony, where the synthetic foreign translation can be in the voice of the originating talker.
Na, Sung Dae; Wei, Qun; Seong, Ki Woong; Cho, Jin Ho; Kim, Myoung Nam
2018-01-01
The conventional methods of speech enhancement, noise reduction, and voice activity detection are based on suppressing the noise or non-speech components of the target air-conduction signals. However, air-conducted speech is hard to differentiate from babble or white noise signals. To overcome this problem, the proposed algorithm uses bone-conduction speech signals and soft thresholding based on the Shannon entropy principle and the cross-correlation of air- and bone-conduction signals. A new algorithm for speech detection and noise reduction is proposed, which uses Shannon entropy and cross-correlation with the bone-conduction speech signals to threshold the wavelet packet coefficients of the noisy speech. Each threshold is generated by the entropy and cross-correlation approaches in the bands obtained by wavelet packet decomposition. Performance was evaluated with objective quality measures (PESQ, RMSE, correlation, and SNR), and MATLAB simulations confirmed that the proposed method reduces noise. To verify the method's feasibility, we compared the air- and bone-conduction speech signals and their spectra after processing. The results confirm the high performance of the proposed method, which makes it promising for future applications in communication devices, noisy environments, construction, and military operations.
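The wavelet-packet thresholding step described above can be sketched in a few lines. The Python fragment below (using the PyWavelets package) soft-thresholds the coefficients of each terminal band; since the abstract does not give the exact threshold rule, the entropy-scaled universal threshold here is an illustrative assumption, and the cross-correlation term with the bone-conduction signal is omitted.

```python
import numpy as np
import pywt  # PyWavelets

def soft_threshold(c, t):
    # Soft thresholding: shrink coefficients toward zero by t.
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)

def band_entropy(c, eps=1e-12):
    # Shannon entropy of the normalized coefficient energies in one band.
    p = c ** 2 / (np.sum(c ** 2) + eps)
    return -np.sum(p * np.log(p + eps))

def denoise_wavelet_packet(noisy, wavelet="db4", level=4):
    # Decompose, threshold each terminal band, and reconstruct.
    wp = pywt.WaveletPacket(data=noisy, wavelet=wavelet,
                            mode="symmetric", maxlevel=level)
    for node in wp.get_level(level, order="natural"):
        c = node.data
        sigma = np.median(np.abs(c)) / 0.6745        # robust noise estimate
        # Assumed rule: scale a universal threshold by the band's normalized
        # entropy so that noise-like (high-entropy) bands are shrunk harder.
        t = sigma * np.sqrt(2.0 * np.log(c.size)) * band_entropy(c) / np.log(c.size)
        node.data = soft_threshold(c, t)
    return wp.reconstruct(update=False)
```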
Speech and orthodontic appliances: a systematic literature review.
Chen, Junyu; Wan, Jia; You, Lun
2018-01-23
Various types of orthodontic appliances can lead to speech difficulties. However, speech difficulties caused by orthodontic appliances have not been sufficiently investigated by an evidence-based method. The aim of this study is to outline the scientific evidence and mechanism of the speech difficulties caused by orthodontic appliances. Randomized-controlled clinical trials (RCT), controlled clinical trials, and cohort studies focusing on the effect of orthodontic appliances on speech were included. A systematic search was conducted by an electronic search in PubMed, EMBASE, and the Cochrane Library databases, complemented by a manual search. The types of orthodontic appliances, the affected sounds, and the duration of the speech disturbances were extracted. The ROBINS-I tool was applied to evaluate the quality of non-randomized studies, and the bias of RCTs was assessed based on the Cochrane Handbook for Systematic Reviews of Interventions. No meta-analyses could be performed due to the heterogeneity in the study designs and treatment modalities. Among 448 screened articles, 13 studies were included (n = 297 patients). Different types of orthodontic appliances such as fixed appliances, orthodontic retainers and palatal expanders could influence the clarity of speech. The /i/, /a/, and /e/ vowels as well as /s/, /z/, /l/, /t/, /d/, /r/, and /ʃ/ consonants could be distorted by appliances. Although most speech impairments could return to normal within weeks, speech distortion of the /s/ sound might last for more than 3 months. The low evidence level grading and heterogeneity were the two main limitations in this systematic review. Lingual fixed appliances, palatal expanders, and Hawley retainers have an evident influence on speech production. The /i/, /s/, /t/, and /d/ sounds are the primarily affected ones. The results of this systematic review should be interpreted with caution, and more high-quality RCTs with larger sample sizes and longer follow-up periods are needed. The protocol for this systematic review (CRD42017056573) was registered in the International Prospective Register of Systematic Reviews (PROSPERO).
Boë, Louis-Jean; Berthommier, Frédéric; Legou, Thierry; Captier, Guillaume; Kemp, Caralyn; Sawallis, Thomas R.; Becker, Yannick; Rey, Arnaud; Fagot, Joël
2017-01-01
Language is a distinguishing characteristic of our species, and the course of its evolution is one of the hardest problems in science. It has long been generally considered that human speech requires a low larynx, and that the high larynx of nonhuman primates should preclude their producing the vowel systems universally found in human language. Examining the vocalizations through acoustic analyses, tongue anatomy, and modeling of acoustic potential, we found that baboons (Papio papio) produce sounds sharing the F1/F2 formant structure of the human [ɨ æ ɑ ɔ u] vowels, and that, as in humans, those vocalic qualities are organized as a system on two acoustic-anatomic axes. This confirms that hominoids can produce contrasting vowel qualities despite a high larynx. It suggests that spoken languages evolved from ancient articulatory skills already present in our last common ancestor with Cercopithecoidea, about 25 MYA. PMID:28076426
Language Recognition via Sparse Coding
2016-09-08
a posteriori (MAP) adaptation scheme that further optimizes the discriminative quality of sparse-coded speech features. We empirically validate the... significantly improve the discriminative quality of sparse-coded speech features. In Section 4, we evaluate the proposed approaches against an i-vector
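Only fragments of this abstract survive, but the core technique it names, sparse coding of speech features, is easy to illustrate. A minimal sketch with scikit-learn follows; the feature matrix is synthetic, and the dictionary size, solver, and pooling strategy are assumptions made for illustration (the report's MAP adaptation step is not shown).

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

# Toy stand-in for frame-level speech features (e.g., cepstral vectors):
# 500 frames x 40 dimensions of random data, purely for illustration.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 40))

# Learn an overcomplete dictionary and sparse-code the frames against it.
dl = DictionaryLearning(n_components=64, transform_algorithm="lasso_lars",
                        transform_alpha=0.1, random_state=0)
codes = dl.fit(X).transform(X)      # sparse activations, shape (500, 64)

# One common utterance-level representation: pool the per-frame codes.
utterance_vector = codes.mean(axis=0)
```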
A comparative analysis of whispered and normally phonated speech using an LPC-10 vocoder
NASA Astrophysics Data System (ADS)
Wilson, J. B.; Mosko, J. D.
1985-12-01
This study focused on determining the performance of an LPC-10 vocoder in processing adult male and female whispered and normally phonated connected speech. The LPC-10 vocoder's analysis of whispered speech compared quite favorably with similar studies which used sound spectrographic processing techniques. Shifting from phonated speech to whispered speech caused a substantial increase in the phonemic formant frequencies and formant bandwidths for both male and female speakers. The data from this study showed no evidence that the LPC-10 vocoder's ability to process voices with pitch extremes and quality extremes was limited in any significant manner. A comparison of the unprocessed natural vowel waveforms and qualities with the synthesized vowel waveforms and qualities revealed almost imperceptible differences. An LPC-10 vocoder's ability to process linguistic and dialectical suprasegmental features such as intonation, rate and stress at low bit rates should be a critical issue of concern for future research.
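The formant frequencies and bandwidths discussed above fall out of a 10th-order LPC analysis in a standard way: each complex pole pair of the all-pole model maps to one resonance. A small Python sketch, assuming the prediction polynomial a = [1, a1, ..., a10] has already been estimated:

```python
import numpy as np

def formants_from_lpc(a, fs):
    """Estimate formant frequencies and bandwidths (Hz) from LPC coefficients.

    `a` is the prediction polynomial [1, a1, ..., ap] of an all-pole
    model 1/A(z); fs is the sampling rate in Hz.
    """
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]           # one root per conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)  # pole angle -> frequency
    bws = -np.log(np.abs(roots)) * fs / np.pi   # pole radius -> bandwidth
    order = np.argsort(freqs)
    return freqs[order], bws[order]
```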
Speech Intelligibility and Personality Peer-Ratings of Young Adults with Cochlear Implants
ERIC Educational Resources Information Center
Freeman, Valerie
2018-01-01
Speech intelligibility, or how well a speaker's words are understood by others, affects listeners' judgments of the speaker's competence and personality. Deaf cochlear implant (CI) users vary widely in speech intelligibility, and their speech may have a noticeable "deaf" quality, both of which could evoke negative stereotypes or…
Casto, Kristen L; Casali, John G
2013-06-01
This study was designed to determine the effects of hearing loss, aviation headset type, flight workload complexity, and communication signal quality on pilots' performance in an army rotary-wing flight simulator. To maintain flight status, army aviators who do not meet current audiometric standards require a hearing loss waiver, which is based on speech intelligibility in quiet conditions. Because hearing loss characteristics of hearing-impaired aviators can vary greatly, and because performance is likely also influenced by degree of flight workload and communication demand, it was expected that performance among hearing-impaired aviators would also vary. Participants were 20 army helicopter pilots. Pilots flew three flights in a full motion-based helicopter simulator, with a different headset configuration and varying flight workload levels and communication signal quality characterizing each flight. Objective flight performance parameters of heading, altitude, and airspeed deviation and air traffic control command read-backs were measured. Statistically significant results suggest that high levels of flight workload, especially in combination with poor communications signal quality, lead to deficits in flight performance and speech intelligibility. These results support a conclusion that factors other than hearing thresholds and speech intelligibility in quiet should be considered when evaluating helicopter pilots' flight safety. The results also support a recommendation that hearing-impaired pilots use assistive communication technology and not fly with strictly passive headsets. The combined effects of flight environment with individual hearing levels should be considered when making recommendations concerning continued aviation flight status and those concerning communications headsets used in high-noise cockpits.
Blöte, Anke W; Miers, Anne C; Van den Bos, Esther; Westenberg, P Michiel
2018-05-17
Cognitive behavioural therapy (CBT) has relatively poor outcomes for youth with social anxiety, possibly because broad-based CBT is not tailored to their specific needs. Treatment of social anxiety in youth may need to pay more attention to negative social cognitions that are considered a key factor in social anxiety development and maintenance. The aim of the present study was to learn more about the role of performance quality in adolescents' cognitions about their social performance and, in particular, the moderating role social anxiety plays in the relationship between performance quality and self-cognitions. A community sample of 229 participants, aged 11 to 18 years, gave a speech and filled in questionnaires addressing social anxiety, depression, expected and self-evaluated performance, and post-event rumination. Independent observers rated the quality of the speech. The data were analysed using moderated mediation analysis. Performance quality mediated the link between expected and self-evaluated performance in adolescents with low and medium levels of social anxiety. For adolescents with high levels of social anxiety, only a direct link between expected and self-evaluated performance was found. Their self-evaluation was not related to the quality of their performance. Performance quality also mediated the link between expected performance and rumination, but social anxiety did not moderate this mediation effect. Results suggest that a good performance does not help socially anxious adolescents to replace their negative self-evaluations with more realistic ones. Specific cognitive intervention strategies should be tailored to the needs of socially anxious adolescents who perform well.
Objective Quality and Intelligibility Prediction for Users of Assistive Listening Devices
Falk, Tiago H.; Parsa, Vijay; Santos, João F.; Arehart, Kathryn; Hazrati, Oldooz; Huber, Rainer; Kates, James M.; Scollie, Susan
2015-01-01
This article presents an overview of twelve existing objective speech quality and intelligibility prediction tools. Two classes of algorithms are presented, namely intrusive and non-intrusive, with the former requiring the use of a reference signal, while the latter does not. Investigated metrics include both those developed for normal hearing listeners, as well as those tailored particularly for hearing impaired (HI) listeners who are users of assistive listening devices (i.e., hearing aids, HAs, and cochlear implants, CIs). Representative examples of those optimized for HI listeners include the speech-to-reverberation modulation energy ratio, tailored to hearing aids (SRMR-HA) and to cochlear implants (SRMR-CI); the modulation spectrum area (ModA); the hearing aid speech quality (HASQI) and perception indices (HASPI); and the PErception MOdel - hearing impairment quality (PEMO-Q-HI). The objective metrics are tested on three subjectively-rated speech datasets covering reverberation-alone, noise-alone, and reverberation-plus-noise degradation conditions, as well as degradations resultant from nonlinear frequency compression and different speech enhancement strategies. The advantages and limitations of each measure are highlighted and recommendations are given for suggested uses of the different tools under specific environmental and processing conditions. PMID:26052190
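As a toy illustration of the intrusive pattern these metrics share (scoring a degraded signal against a time-aligned clean reference), the following Python sketch correlates low-pass temporal envelopes. It is not SRMR, HASQI, or any other metric named above; the envelope extraction method and the 32 Hz cutoff are illustrative assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def envelope_correlation(reference, degraded, fs, cutoff_hz=32.0):
    """Toy intrusive measure: correlation of low-pass temporal envelopes.

    Only shows the common intrusive pattern of scoring a degraded signal
    against a time-aligned clean reference; not a published metric.
    """
    b, a = butter(2, cutoff_hz / (fs / 2))            # low-pass for envelopes
    env_ref = filtfilt(b, a, np.abs(hilbert(reference)))
    env_deg = filtfilt(b, a, np.abs(hilbert(degraded)))
    env_ref -= env_ref.mean()
    env_deg -= env_deg.mean()
    return np.sum(env_ref * env_deg) / np.sqrt(
        np.sum(env_ref ** 2) * np.sum(env_deg ** 2))
```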
Miyoshi, Masayuki; Fukuhara, Takahiro; Kataoka, Hideyuki; Hagino, Hiroshi
2016-04-01
The use of tracheoesophageal speech with voice prosthesis (T-E speech) after total laryngectomy has increased recently as a method of vocalization following laryngeal cancer. Previous research has not investigated the relationship between quality of life (QOL) and phonatory function in those using T-E speech. This study aimed to demonstrate the relationship between phonatory function and both comprehensive health-related QOL and QOL related to speech in people using T-E speech. The subjects of the study were 20 male patients using T-E speech after total laryngectomy. At a visit to our clinic, the subjects underwent a phonatory function test and completed three questionnaires: the MOS 8-Item Short-Form Health Survey (SF-8), the Voice Handicap Index-10 (VHI-10), and the Voice-Related Quality of Life (V-RQOL) Measure. A significant correlation was observed between the physical component summary (PCS), a summary score of the SF-8, and the VHI-10. Additionally, a significant correlation was observed between the SF-8 mental component summary (MCS) and both the VHI-10 and the V-RQOL. Significant correlations were also observed between voice intensity in the phonatory function test and both the VHI-10 and the V-RQOL. Finally, voice intensity was significantly correlated with the SF-8 PCS. QOL questionnaires and phonatory function tests showed that, in people using T-E speech after total laryngectomy, voice intensity was correlated with comprehensive QOL, including physical and mental health. This finding suggests that voice intensity can be used as a performance index for speech rehabilitation.
[Effects of acoustic adaptation of classrooms on the quality of verbal communication].
Mikulski, Witold
2013-01-01
Voice organ disorders among teachers are caused by excessive voice strain. One of the measures to reduce this strain is to decrease background noise when teaching. Increasing the acoustic absorption of the room is a technical measure for achieving this aim. The absorption level also improves speech intelligibility, rated by the following parameters: room reverberation time and the speech transmission index (STI). This article presents the effects of acoustic adaptation of classrooms on the quality of verbal communication, aimed at achieving good or excellent speech intelligibility. The article lists the criteria for evaluating classrooms in terms of the quality of verbal communication. The parameters were measured according to PN-EN ISO 3382-2:2010 and PN-EN 60268-16:2011. Acoustic adaptations were completed in two classrooms. After completing the acoustic adaptations, the reverberation time for the frequency of 1 kHz was reduced: in room no. 1 from 1.45 s to 0.44 s and in room no. 2 from 1.03 s to 0.37 s (maximum 0.65 s). At the same time, the speech transmission index increased: in room no. 1 from 0.55 (satisfactory speech intelligibility) to 0.75 (speech intelligibility close to excellent); in room no. 2 from 0.63 (good speech intelligibility) to 0.80 (excellent speech intelligibility). Therefore, it can be stated that prior to the acoustic adaptations room no. 1 did not comply and room no. 2 barely complied with the criterion (speech transmission index of 0.62). After completing the acoustic adaptations, both rooms meet the requirements.
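The reverberation-time reductions above come from adding absorption, and Sabine's formula makes the relationship concrete. A worked Python example follows; the room dimensions and absorption areas are hypothetical (the article does not report them), chosen so that the output roughly matches room no. 1's before and after values.

```python
def rt60_sabine(volume_m3, absorption_m2):
    # Sabine's formula: RT60 = 0.161 * V / A, with V in m^3 and A in m^2-sabins.
    return 0.161 * volume_m3 / absorption_m2

# Hypothetical 7 m x 9 m x 3 m classroom (the article gives no dimensions).
V = 7 * 9 * 3                      # 189 m^3
print(rt60_sabine(V, 21.0))        # ~1.45 s, like room no. 1 before adaptation
print(rt60_sabine(V, 69.0))        # ~0.44 s, like room no. 1 after adaptation
```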
Development of a good-quality speech coder for transmission over noisy channels at 2.4 kb/s
NASA Astrophysics Data System (ADS)
Viswanathan, V. R.; Berouti, M.; Higgins, A.; Russell, W.
1982-03-01
This report describes the development, study, and experimental results of a 2.4 kb/s speech coder called the harmonic deviations (HDV) vocoder, which transmits good-quality speech over noisy channels with bit-error rates of up to 1%. The HDV coder is based on the linear predictive coding (LPC) vocoder, and it transmits additional information over and above the data transmitted by the LPC vocoder, in the form of deviations between the speech spectrum and the LPC all-pole model spectrum at a selected set of frequencies. At the receiver, the spectral deviations are used to generate the excitation signal for the all-pole synthesis filter. The report describes and compares several methods for extracting the spectral deviations from the speech signal and for encoding them. To limit the bit-rate of the HDV coder to 2.4 kb/s, the report discusses several methods, including orthogonal transformation and minimum-mean-square-error scalar quantization of log area ratios, two-stage vector-scalar quantization, and variable frame rate transmission. The report also presents the results of speech-quality optimization of the HDV coder at 2.4 kb/s.
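The core HDV idea, measuring how far the short-time speech spectrum departs from the LPC all-pole envelope at selected frequencies, can be sketched directly. In the Python fragment below, the frame, LPC coefficients, gain, and the selected frequency set are all assumed inputs; the FFT size and windowing are illustrative choices rather than the report's.

```python
import numpy as np
from scipy.signal import freqz

def spectral_deviations(frame, lpc_a, gain, freqs_hz, fs, n_fft=512):
    """dB differences between one frame's spectrum and its LPC envelope
    at a selected set of frequencies."""
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame)), n_fft))
    w = 2 * np.pi * np.asarray(freqs_hz) / fs       # radians/sample
    _, h = freqz(gain, lpc_a, worN=w)               # all-pole model gain/A(z)
    bins = np.round(np.asarray(freqs_hz) * n_fft / fs).astype(int)
    return (20 * np.log10(spec[bins] + 1e-12)
            - 20 * np.log10(np.abs(h) + 1e-12))
```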
A variable rate speech compressor for mobile applications
NASA Technical Reports Server (NTRS)
Yeldener, S.; Kondoz, A. M.; Evans, B. G.
1990-01-01
One of the most promising speech coders at bit rates of 9.6 to 4.8 kbits/s is CELP. Code Excited Linear Prediction (CELP) has dominated the 9.6 to 4.8 kbits/s region during the past 3 to 4 years. Its setback, however, is its expensive implementation. As an alternative to CELP, the Base-Band CELP (CELP-BB) was developed, which produced good quality speech comparable to CELP with a complexity implementable on a single chip, as reported previously. Its robustness was also improved to tolerate errors up to 1.0 pct. and maintain intelligibility up to 5.0 pct. and more. Although CELP-BB produces good quality speech at around 4.8 kbits/s, it has a fundamental problem when updating the pitch filter memory. A sub-optimal solution is proposed for this problem. Below 4.8 kbits/s, however, CELP-BB suffers from noticeable quantization noise as a result of the large vector dimensions used. Efficient representation of speech below 4.8 kbits/s is reported by introducing Sinusoidal Transform Coding (STC) to represent the LPC excitation, an approach called Sine Wave Excited LPC (SWELP). In this case, natural sounding, good quality synthetic speech is obtained at around 2.4 kbits/s.
Morris, Meg E; Erickson, Shane; Serry, Tanya A
2016-01-01
Background: Although mobile apps are readily available for speech sound disorders (SSD), their validity has not been systematically evaluated. This evidence-based appraisal will critically review and synthesize current evidence on available therapy apps for use by children with SSD. Objective: The main aims are to (1) identify the types of apps currently available for Android and iOS mobile phones and tablets, and (2) to critique their design features and content using a structured quality appraisal tool. Methods: This protocol paper presents and justifies the methods used for a systematic review of mobile apps that provide intervention for use by children with SSD. The primary outcomes of interest are (1) engagement, (2) functionality, (3) aesthetics, (4) information quality, (5) subjective quality, and (6) perceived impact. Quality will be assessed by 2 certified practicing speech-language pathologists using a structured quality appraisal tool. Two app stores will be searched from the 2 largest operating platforms, Android and iOS. Systematic methods of knowledge synthesis shall include searching the app stores using a defined procedure, data extraction, and quality analysis. Results: This search strategy shall enable us to determine how many SSD apps are available for Android and for iOS compatible mobile phones and tablets. It shall also identify the regions of the world responsible for the apps' development, the content and the quality of offerings. Recommendations will be made for speech-language pathologists seeking to use mobile apps in their clinical practice. Conclusions: This protocol provides a structured process for locating apps and appraising the quality, as the basis for evaluating their use in speech pathology for children in English-speaking nations. PMID:27899341
Furlong, Lisa M; Morris, Meg E; Erickson, Shane; Serry, Tanya A
2016-11-29
Although mobile apps are readily available for speech sound disorders (SSD), their validity has not been systematically evaluated. This evidence-based appraisal will critically review and synthesize current evidence on available therapy apps for use by children with SSD. The main aims are to (1) identify the types of apps currently available for Android and iOS mobile phones and tablets, and (2) to critique their design features and content using a structured quality appraisal tool. This protocol paper presents and justifies the methods used for a systematic review of mobile apps that provide intervention for use by children with SSD. The primary outcomes of interest are (1) engagement, (2) functionality, (3) aesthetics, (4) information quality, (5) subjective quality, and (6) perceived impact. Quality will be assessed by 2 certified practicing speech-language pathologists using a structured quality appraisal tool. Two app stores will be searched from the 2 largest operating platforms, Android and iOS. Systematic methods of knowledge synthesis shall include searching the app stores using a defined procedure, data extraction, and quality analysis. This search strategy shall enable us to determine how many SSD apps are available for Android and for iOS compatible mobile phones and tablets. It shall also identify the regions of the world responsible for the apps' development, the content and the quality of offerings. Recommendations will be made for speech-language pathologists seeking to use mobile apps in their clinical practice. This protocol provides a structured process for locating apps and appraising the quality, as the basis for evaluating their use in speech pathology for children in English-speaking nations.
Communication after laryngectomy: an assessment of quality of life.
Carr, M M; Schmidbauer, J A; Majaess, L; Smith, R L
2000-01-01
The purpose of this study was to examine quality of life in laryngectomees using different methods of communication. A survey was mailed to all the living laryngectomees in Nova Scotia. Patients were asked to rate their ability to communicate in a number of common situations, to rate their difficulty with several communication problems, and to complete the EORTC QLQ-C30 quality-of-life assessment tool. Sixty-two patients responded (return rate of 84%); 57% were using electrolaryngeal speech, 19% esophageal speech, and 8.5% tracheoesophageal speech. These groups were comparable with respect to age, sex, first language, education level, and years since laryngectomy. There were very few differences between these groups in ability to communicate in social situations and no difference in overall quality of life as measured by these scales. The most commonly cited problem was difficulty being heard in a noisy environment. Despite the fact that tracheoesophageal speech is objectively most intelligible, there does not seem to be a measurable improvement in quality of life or ability to communicate in everyday situations over electrolaryngeal or esophageal speakers.
Dietrich, Susanne; Hertrich, Ingo; Müller-Dahlhaus, Florian; Ackermann, Hermann; Belardinelli, Paolo; Desideri, Debora; Seibold, Verena C; Ziemann, Ulf
2018-01-01
The pre-supplementary motor area (pre-SMA) is engaged in speech comprehension under difficult circumstances such as poor acoustic signal quality or time-critical conditions. Previous studies found that left pre-SMA is activated when subjects listen to accelerated speech. Here, the functional role of pre-SMA was tested for accelerated speech comprehension by inducing a transient "virtual lesion" using continuous theta-burst stimulation (cTBS). Participants were tested (1) prior to (pre-baseline), (2) 10 min after (test condition for the cTBS effect), and (3) 60 min after stimulation (post-baseline) using a sentence repetition task (formant-synthesized at rates of 8, 10, 12, 14, and 16 syllables/s). Speech comprehension was quantified by the percentage of correctly reproduced speech material. For high speech rates, subjects showed decreased performance after cTBS of pre-SMA. Regarding the error pattern, the number of incorrect words without any semantic or phonological similarity to the target context increased, while related words decreased. Thus, the transient impairment of pre-SMA seems to affect its inhibitory function that normally eliminates erroneous speech material prior to speaking or, in case of perception, prior to encoding into a semantically/pragmatically meaningful message.
Popova, Svetlana; Lange, Shannon; Burd, Larry; Shield, Kevin; Rehm, Jürgen
2014-12-01
This study, which is part of a large economic project on the overall burden and cost associated with Foetal Alcohol Spectrum Disorder (FASD) in Canada, estimated the cost of 1:1 speech-language interventions among children and youth with FASD for Canada in 2011. The number of children and youth with FASD and speech-language disorder(s) (SLD), the distribution of the level of severity, and the number of hours needed to treat were estimated using data from the available literature. The cost of 1:1 speech-language interventions was computed using the average cost per hour for speech-language pathologists. It was estimated that approximately 37,928 children and youth with FASD had SLD in Canada in 2011. Using the most conservative approach, the annual cost of 1:1 speech-language interventions among children and youth with FASD is substantial, ranging from $72.5 million to $144.1 million Canadian dollars. Speech-language pathologists should be aware of the disproportionate number of children and youth with FASD who have SLD and the need for early identification to improve access to early intervention. Early identification and access to high quality services may have a role in decreasing the risk of developing the secondary disabilities and in reducing the economic burden of FASD on society.
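The reported range follows from a simple multiplication of prevalence, treatment hours, and an hourly rate. The Python lines below reconstruct a range of the same magnitude, but the hours and hourly rate shown are hypothetical placeholders, not the study's actual inputs.

```python
n_children = 37_928        # estimated children/youth with FASD and SLD (2011)

# Hypothetical inputs: the study's actual treatment hours and hourly
# rates are not reproduced here.
hours_low, hours_high = 20, 40      # assumed annual 1:1 treatment hours
rate_cad_per_hour = 95.0            # assumed speech-language pathologist rate

low = n_children * hours_low * rate_cad_per_hour     # ~$72.1 million
high = n_children * hours_high * rate_cad_per_hour   # ~$144.1 million
print(f"${low / 1e6:.1f}M to ${high / 1e6:.1f}M per year")
```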
Dietrich, Susanne; Hertrich, Ingo; Müller-Dahlhaus, Florian; Ackermann, Hermann; Belardinelli, Paolo; Desideri, Debora; Seibold, Verena C.; Ziemann, Ulf
2018-01-01
The pre-supplementary motor area (pre-SMA) is engaged in speech comprehension under difficult circumstances such as poor acoustic signal quality or time-critical conditions. Previous studies found that left pre-SMA is activated when subjects listen to accelerated speech. Here, the functional role of pre-SMA was tested for accelerated speech comprehension by inducing a transient “virtual lesion” using continuous theta-burst stimulation (cTBS). Participants were tested (1) prior to (pre-baseline), (2) 10 min after (test condition for the cTBS effect), and (3) 60 min after stimulation (post-baseline) using a sentence repetition task (formant-synthesized at rates of 8, 10, 12, 14, and 16 syllables/s). Speech comprehension was quantified by the percentage of correctly reproduced speech material. For high speech rates, subjects showed decreased performance after cTBS of pre-SMA. Regarding the error pattern, the number of incorrect words without any semantic or phonological similarity to the target context increased, while related words decreased. Thus, the transient impairment of pre-SMA seems to affect its inhibitory function that normally eliminates erroneous speech material prior to speaking or, in case of perception, prior to encoding into a semantically/pragmatically meaningful message. PMID:29896086
Auditory Brainstem Response to Complex Sounds Predicts Self-Reported Speech-in-Noise Performance
ERIC Educational Resources Information Center
Anderson, Samira; Parbery-Clark, Alexandra; White-Schwoch, Travis; Kraus, Nina
2013-01-01
Purpose: To compare the ability of the auditory brainstem response to complex sounds (cABR) to predict subjective ratings of speech understanding in noise on the Speech, Spatial, and Qualities of Hearing Scale (SSQ; Gatehouse & Noble, 2004) relative to the predictive ability of the Quick Speech-in-Noise test (QuickSIN; Killion, Niquette,…
Developing a corpus of spoken language variability
NASA Astrophysics Data System (ADS)
Carmichael, Lesley; Wright, Richard; Wassink, Alicia Beckford
2003-10-01
We are developing a novel, searchable corpus as a research tool for investigating phonetic and phonological phenomena across various speech styles. Five speech styles have been well studied independently in previous work: reduced (casual), careful (hyperarticulated), citation (reading), Lombard effect (speech in noise), and "motherese" (child-directed speech). Few studies to date have collected a wide range of styles from a single set of speakers, and fewer yet have provided publicly available corpora. The pilot corpus includes recordings of (1) a set of speakers participating in a variety of tasks designed to elicit the five speech styles, and (2) casual peer conversations and wordlists to illustrate regional vowels. The data include high-quality recordings and time-aligned transcriptions linked to text files that can be queried. Initial measures drawn from the database provide comparison across speech styles along the following acoustic dimensions: MLU (changes in unit duration); relative intra-speaker intensity changes (mean and dynamic range); and intra-speaker pitch values (minimum, maximum, mean, range). The corpus design will allow for a variety of analyses requiring control of demographic and style factors, including hyperarticulation variety, disfluencies, intonation, discourse analysis, and detailed spectral measures.
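A sketch of how the intensity and pitch statistics listed above might be pulled from one recording, using the librosa package; the filename, pitch search range, and frame settings are illustrative assumptions, not details from the corpus.

```python
import librosa

# Hypothetical recording of one speaker in one style.
y, sr = librosa.load("speaker01_casual.wav", sr=None)

# Intensity: frame RMS in dB, then mean and dynamic range.
rms_db = librosa.amplitude_to_db(librosa.feature.rms(y=y)[0])
intensity_mean = rms_db.mean()
intensity_range = rms_db.max() - rms_db.min()

# Pitch: probabilistic YIN over a broad speech range; keep voiced frames only.
f0, voiced, _ = librosa.pyin(y, fmin=65, fmax=400, sr=sr)
f0 = f0[voiced]
pitch_min, pitch_max, pitch_mean = f0.min(), f0.max(), f0.mean()
pitch_range = pitch_max - pitch_min
```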
Knez Ambrožič, Mojca; Hočevar Boltežar, Irena; Ihan Hren, Nataša
2015-09-01
Skeletal anterior open bite (AOB), or apertognathism, is characterized by the absence of contact of the anterior teeth and affects articulation parameters, chewing, biting and voice quality. The treatment of AOB consists of orthognathic surgical procedures. The aim of this study was to evaluate the effects of treatment on voice quality, articulation and nasality in speech with respect to skeletal changes. The study was prospective; 15 patients with AOB were evaluated before and after surgery. Lateral cephalometric x-ray parameters (facial angle, interincisal distance, Wits appraisal) were measured to determine skeletal changes. Before surgery, nine patients still had articulation disorders despite speech therapy during childhood. The voice quality parameters were determined by acoustic analysis of the vowel sound /a/ (fundamental frequency-F0, jitter, shimmer). Spectral analysis of the vowels /a/, /e/, /i/, /o/, /u/ was carried out by determining the mean frequency of the first (F1) and second (F2) formants. Nasality in speech was expressed as the ratio between the nasal and the oral sound energies during speech samples. After surgery, normalization of facial skeletal parameters was observed in all patients, but no statistically significant changes in articulation and voice quality parameters occurred, despite subjective observations of easier articulation. No deterioration toward velopharyngeal insufficiency occurred in any of the patients. In conclusion, the surgical treatment of skeletal AOB does not lead to deterioration in voice, resonance and articulation qualities. Despite surgical correction of the unfavourable skeletal situation of the speech apparatus, the pre-existing articulation disorder cannot improve without professional intervention.
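Jitter and shimmer, as used in the acoustic analysis above, are cycle-to-cycle perturbation measures. A minimal Python sketch of the common "local" variants follows, assuming the per-cycle pitch periods and peak amplitudes have already been extracted from the sustained /a/ (the example numbers are invented).

```python
import numpy as np

def local_jitter(periods):
    """Local jitter (%): mean absolute difference between consecutive
    pitch periods, relative to the mean period."""
    periods = np.asarray(periods, dtype=float)
    return 100.0 * np.mean(np.abs(np.diff(periods))) / periods.mean()

def local_shimmer(amplitudes):
    """Local shimmer (%): the same measure applied to per-cycle peak amplitudes."""
    amplitudes = np.asarray(amplitudes, dtype=float)
    return 100.0 * np.mean(np.abs(np.diff(amplitudes))) / amplitudes.mean()

# Invented example: ~125 Hz phonation with small cycle-to-cycle variation.
periods = [0.0080, 0.0081, 0.0079, 0.0080, 0.0082]   # seconds per cycle
print(local_jitter(periods))                          # about 1.9 %
```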
Research in speech communication.
Flanagan, J
1995-01-01
Advances in digital speech processing are now supporting application and deployment of a variety of speech technologies for human/machine communication. In fact, new businesses are rapidly forming about these technologies. But these capabilities are of little use unless society can afford them. Happily, explosive advances in microelectronics over the past two decades have assured affordable access to this sophistication as well as to the underlying computing technology. The research challenges in speech processing remain in the traditionally identified areas of recognition, synthesis, and coding. These three areas have typically been addressed individually, often with significant isolation among the efforts. But they are all facets of the same fundamental issue--how to represent and quantify the information in the speech signal. This implies deeper understanding of the physics of speech production, the constraints that the conventions of language impose, and the mechanism for information processing in the auditory system. In ongoing research, therefore, we seek more accurate models of speech generation, better computational formulations of language, and realistic perceptual guides for speech processing--along with ways to coalesce the fundamental issues of recognition, synthesis, and coding. Successful solution will yield the long-sought dictation machine, high-quality synthesis from text, and the ultimate in low bit-rate transmission of speech. It will also open the door to language-translating telephony, where the synthetic foreign translation can be in the voice of the originating talker. PMID:7479806
Influence of Security Mechanisms on the Quality of Service of VoIP
NASA Astrophysics Data System (ADS)
Backs, Peter; Pohlmann, Norbert
While Voice over IP (VoIP) is advancing rapidly in the telecommunications market, the interest to protect the data transmitted by this new service is also rising. However, in contrast to other internet services such as email or HTTP, VoIP is real-time media, and therefore must meet a special requirement referred to as Quality-of-Service to provide a comfortable flow of speech. Speech quality is worsened when transmitted over the network due to delays in transmission or loss of packets. Often, voice quality is at a level that even prevents a comprehensible dialog. Therefore, an administrator who is to set up a VoIP infrastructure might consider avoiding additional decreases in voice quality resulting from security mechanisms, and might leave internet telephony unprotected as a result. The inspiration for this paper is to illustrate that security mechanisms have negligible impact on speech quality and should in fact be encouraged.
Wylezinska, Marzena; Pinkstone, Marie; Hay, Norman; Scott, Andrew D; Birch, Malcolm J; Miquel, Marc E
2015-12-01
The aim of this work was to investigate the effects of commonly used orthodontic appliances on the magnetic resonance (MR) image quality of the craniofacial region, with special interest in the soft palate and velopharyngeal wall using real-time speech imaging sequences and anatomical imaging of the temporomandibular joints (TMJ) and pituitaries. Common orthodontic appliances were studied on a 1.5 T scanner using standard spin and gradient echo sequences (based on the American Society for Testing and Materials standard test method) and sequences previously applied for high-resolution anatomical and dynamic real-time imaging during speech. Images were evaluated for the presence and size of artefacts. Metallic orthodontic appliances had different effects on image quality. The most extensive individual effects were associated with the presence of a stainless steel archwire, particularly if combined with stainless steel brackets and stainless steel molar bands. With those appliances, the diagnostic quality of MR speech and palate images will most likely be severely degraded, or speech imaging and imaging of the pituitaries and TMJ will not be possible. All appliances that were non-metallic, non-metallic with Ni/Cr reinforcement, or made of Ni/Ti alloys were of little concern. The results in this study are only valid at 1.5 T and for the sequences and devices used and cannot necessarily be extrapolated to all sequences and devices. Furthermore, both the geometry and size of some appliances are subject dependent, and consequently, the effects on image quality can vary between subjects. Therefore, the results presented in this article should be treated as a guide when assessing the risks of image quality degradation rather than an absolute evaluation of possible artefacts. Appliances manufactured from stainless steel cause extensive artefacts, which may render images non-diagnostic. The presence and type of orthodontic appliances should always be included in the patient's screening, so the risks of artefacts can be assessed prior to imaging. Although the risks to patients with fixed orthodontic appliances at 1.5 T MR scanners are low, their secure attachment should be confirmed prior to the examination.
Speech and Speech-Related Quality of Life After Late Palate Repair: A Patient's Perspective.
Schönmeyr, Björn; Wendby, Lisa; Sharma, Mitali; Jacobson, Lia; Restrepo, Carolina; Campbell, Alex
2015-07-01
Many patients with cleft palate deformities worldwide receive treatment at a later age than is recommended for normal speech to develop. The outcomes after late palate repairs in terms of speech and quality of life (QOL) still remain largely unstudied. In the current study, questionnaires were used to assess the patients' perception of speech and QOL before and after primary palate repair. All of the patients were operated on at a cleft center in northeast India and had a cleft palate with a normal lip or with a cleft lip that had been previously repaired. A total of 134 patients (7-35 years) were interviewed preoperatively and 46 patients (7-32 years) were assessed in the postoperative survey. The survey showed that scores based on the speech handicap index, concerning speech and speech-related QOL, did not improve postoperatively. In fact, the questionnaires indicated that speech became more unpredictable (P < 0.01) and that nasal regurgitation became worse (P < 0.01) for some patients after surgery. A total of 78% of the patients were still satisfied with the surgery and all of the patients reported that their self-confidence had improved after the operation. Thus, the majority of interviewed patients who underwent late primary palate repair were satisfied with the surgery. At the same time, speech and speech-related QOL did not improve according to the speech handicap index-based survey. Speech predictability may even become worse and nasal regurgitation may increase after late palate repair, according to these results.
The role of hearing ability and speech distortion in the facilitation of articulatory motor cortex.
Nuttall, Helen E; Kennedy-Higgins, Daniel; Devlin, Joseph T; Adank, Patti
2017-01-08
Excitability of articulatory motor cortex is facilitated when listening to speech in challenging conditions. Beyond this, however, we have little knowledge of what listener-specific and speech-specific factors engage articulatory facilitation during speech perception. For example, it is unknown whether speech motor activity is independent or dependent on the form of distortion in the speech signal. It is also unknown if speech motor facilitation is moderated by hearing ability. We investigated these questions in two experiments. We applied transcranial magnetic stimulation (TMS) to the lip area of primary motor cortex (M1) in young, normally hearing participants to test if lip M1 is sensitive to the quality (Experiment 1) or quantity (Experiment 2) of distortion in the speech signal, and if lip M1 facilitation relates to the hearing ability of the listener. Experiment 1 found that lip motor evoked potentials (MEPs) were larger during perception of motor-distorted speech that had been produced using a tongue depressor, and during perception of speech presented in background noise, relative to natural speech in quiet. Experiment 2 did not find evidence of motor system facilitation when speech was presented in noise at signal-to-noise ratios where speech intelligibility was at 50% or 75%, which were significantly less severe noise levels than used in Experiment 1. However, there was a significant interaction between noise condition and hearing ability, which indicated that when speech stimuli were correctly classified at 50%, speech motor facilitation was observed in individuals with better hearing, whereas individuals with relatively worse but still normal hearing showed more activation during perception of clear speech. These findings indicate that the motor system may be sensitive to the quantity, but not quality, of degradation in the speech signal. Data support the notion that motor cortex complements auditory cortex during speech perception, and point to a role for the motor cortex in compensating for differences in hearing ability.
IEP goals for school-age children with speech sound disorders.
Farquharson, Kelly; Tambyraja, Sherine R; Justice, Laura M; Redle, Erin E
2014-01-01
The purpose of the current study was to describe the current state of practice for writing Individualized Education Program (IEP) goals for children with speech sound disorders (SSDs). IEP goals for 146 children receiving services for SSDs within public school systems across two states were coded for their dominant theoretical framework and overall quality. A dichotomous scheme was used for theoretical framework coding: cognitive-linguistic or sensory-motor. Goal quality was determined by examining 7 specific indicators outlined by an empirically tested rating tool. In total, 147 long-term and 490 short-term goals were coded. The results revealed no dominant theoretical framework for long-term goals, whereas short-term goals largely reflected a sensory-motor framework. In terms of quality, the majority of speech production goals were functional and generalizable in nature, but were not able to be easily targeted during common daily tasks or by other members of the IEP team. Short-term goals were consistently rated higher in quality domains when compared to long-term goals. The current state of practice for writing IEP goals for children with SSDs indicates that theoretical framework may be eclectic in nature and likely written to support the individual needs of children with speech sound disorders. Further investigation is warranted to determine the relations between goal quality and child outcomes. Learning outcomes: (1) Identify two predominant theoretical frameworks and discuss how they apply to IEP goal writing. (2) Discuss quality indicators as they relate to IEP goals for children with speech sound disorders. (3) Discuss the relationship between long-term goals' level of quality and related theoretical frameworks. (4) Identify the areas in which business-as-usual IEP goals exhibit strong quality.
Inner Speech and Clarity of Self-Concept in Thought Disorder and Auditory-Verbal Hallucinations
de Sousa, Paulo; Sellwood, William; Spray, Amy; Fernyhough, Charles; Bentall, Richard P.
2016-01-01
Eighty patients and thirty controls were interviewed using one interview that promoted personal disclosure and another about everyday topics. Speech was scored using the Thought, Language and Communication scale (TLC). All participants completed the Self-Concept Clarity Scale (SCCS) and the Varieties of Inner Speech Questionnaire (VISQ). Patients scored lower than comparisons on the SCCS. Low scores were associated with the disorganized dimension of TD. Patients also scored significantly higher on condensed and other people in inner speech, but not on dialogical or evaluative inner speech. The poverty of speech dimension of TD was associated with less dialogical inner speech, other people in inner speech, and less evaluative inner speech. Hallucinations were significantly associated with more other people in inner speech and evaluative inner speech. Clarity of self-concept and qualities of inner speech are differentially associated with dimensions of TD. The findings also support inner speech models of hallucinations. PMID:27898489
Inner Speech and Clarity of Self-Concept in Thought Disorder and Auditory-Verbal Hallucinations.
de Sousa, Paulo; Sellwood, William; Spray, Amy; Fernyhough, Charles; Bentall, Richard P
2016-12-01
Eighty patients and thirty controls were interviewed using one interview that promoted personal disclosure and another about everyday topics. Speech was scored using the Thought, Language and Communication scale (TLC). All participants completed the Self-Concept Clarity Scale (SCCS) and the Varieties of Inner Speech Questionnaire (VISQ). Patients scored lower than comparisons on the SCCS. Low scores were associated with the disorganized dimension of TD. Patients also scored significantly higher on condensed and other people in inner speech, but not on dialogical or evaluative inner speech. The poverty of speech dimension of TD was associated with less dialogical inner speech, other people in inner speech, and less evaluative inner speech. Hallucinations were significantly associated with more other people in inner speech and evaluative inner speech. Clarity of self-concept and qualities of inner speech are differentially associated with dimensions of TD. The findings also support inner speech models of hallucinations.
A recursive linear predictive vocoder
NASA Astrophysics Data System (ADS)
Janssen, W. A.
1983-12-01
A non-real time 10 pole recursive autocorrelation linear predictive coding vocoder was created for use in studying effects of recursive autocorrelation on speech. The vocoder is composed of two interchangeable pitch detectors, a speech analyzer, and a speech synthesizer. The time between updating filter coefficients is allowed to vary from .125 msec to 20 msec. The best quality was found using .125 msec between each update. The greatest change in quality was noted when changing from 20 msec/update to 10 msec/update. Pitch period plots for the center clipping autocorrelation pitch detector and the simplified inverse filtering technique are provided. Plots of speech into and out of the vocoder are given. Formant versus time three-dimensional plots are shown. Effects of noise on pitch detection and formants are shown. Noise affects the voiced/unvoiced decision process, causing voiced speech to be reconstructed as unvoiced.
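The autocorrelation LPC analysis at the heart of such a vocoder is conventionally solved with the Levinson-Durbin recursion. A block (non-recursive-in-time) Python sketch follows; the report's recursive autocorrelation update is not reproduced, and the order and framing choices are illustrative.

```python
import numpy as np

def lpc_levinson_durbin(frame, order=10):
    """Prediction polynomial [1, a1, ..., ap] for one (windowed) frame,
    solved from its autocorrelation by the Levinson-Durbin recursion.
    The frame is assumed to be much longer than the model order."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:][:order + 1]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err   # reflection coeff.
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k           # prediction-error power shrinks each step
    return a, err
```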
Telephony-based voice pathology assessment using automated speech analysis.
Moran, Rosalyn J; Reilly, Richard B; de Chazal, Philip; Lacy, Peter D
2006-03-01
A system for remotely detecting vocal fold pathologies using telephone-quality speech is presented. The system uses a linear classifier, processing measurements of pitch perturbation, amplitude perturbation and harmonic-to-noise ratio derived from digitized speech recordings. Voice recordings from the Disordered Voice Database Model 4337 system were used to develop and validate the system. Results show that while a sustained phonation, recorded in a controlled environment, can be classified as normal or pathologic with an accuracy of 89.1%, telephone-quality speech can be classified as normal or pathologic with an accuracy of 74.2%, using the same scheme. Amplitude perturbation features prove most robust for telephone-quality speech. The recordings were then subcategorized into four groups, comprising normal, neuromuscular pathologic, physical pathologic and mixed (neuromuscular with physical) pathologic. A separate classifier was developed for classifying the normal group from each pathologic subcategory. Results show that neuromuscular disorders could be detected remotely with an accuracy of 87%, physical abnormalities with an accuracy of 78% and mixed pathology voice with an accuracy of 61%. This study highlights the real possibility for remote detection and diagnosis of voice pathology.
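A minimal sketch of the classification stage described above: perturbation and HNR measurements feeding a linear classifier. The feature values below are synthetic placeholders (the study used recordings from the Disordered Voice Database), and the choice of linear discriminant analysis is an assumption; the abstract only says "linear classifier".

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

# Synthetic placeholder features per recording:
# [pitch perturbation (%), amplitude perturbation (%), HNR (dB)].
rng = np.random.default_rng(1)
X_normal = rng.normal([0.5, 2.0, 20.0], [0.2, 0.5, 3.0], size=(50, 3))
X_path = rng.normal([1.5, 5.0, 12.0], [0.6, 1.5, 4.0], size=(50, 3))
X = np.vstack([X_normal, X_path])
y = np.array([0] * 50 + [1] * 50)        # 0 = normal, 1 = pathologic

clf = LinearDiscriminantAnalysis()
print(cross_val_score(clf, X, y, cv=5).mean())   # mean cross-validated accuracy
```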
Perez, Hector R.; Stoeckle, James H.
2016-01-01
Objective: To provide an update on the epidemiology, genetics, pathophysiology, diagnosis, and treatment of developmental stuttering. Quality of evidence: The MEDLINE and Cochrane databases were searched for past and recent studies on the epidemiology, genetics, pathophysiology, diagnosis, and treatment of developmental stuttering. Most recommendations are based on small studies, limited-quality evidence, or consensus. Main message: Stuttering is a speech disorder, common in persons of all ages, that affects normal fluency and time patterning of speech. Stuttering has been associated with differences in brain anatomy, functioning, and dopamine regulation thought to be due to genetic causes. Attention to making a correct diagnosis or referral in children is important because there is growing consensus that early intervention with speech therapy for children who stutter is critical. For adults, stuttering can be associated with substantial psychosocial morbidity including social anxiety and low quality of life. Pharmacologic treatment has received attention in recent years, but clinical evidence is limited. The mainstay of treatment for children and adults remains speech therapy. Conclusion: A growing body of research has attempted to uncover the pathophysiology of stuttering. Referral for speech therapy remains the best option for children and adults. PMID:27303004
Heinrich, Antje; Henshaw, Helen; Ferguson, Melanie A.
2015-01-01
Listeners vary in their ability to understand speech in noisy environments. Hearing sensitivity, as measured by pure-tone audiometry, can only partly explain these results, and cognition has emerged as another key concept. Although cognition relates to speech perception, the exact nature of the relationship remains to be fully understood. This study investigates how different aspects of cognition, particularly working memory and attention, relate to speech intelligibility for various tests. Perceptual accuracy of speech perception represents just one aspect of functioning in a listening environment. Activity and participation limits imposed by hearing loss, in addition to the demands of a listening environment, are also important and may be better captured by self-report questionnaires. Understanding how speech perception relates to self-reported aspects of listening forms the second focus of the study. Forty-four listeners aged between 50 and 74 years with mild sensorineural hearing loss were tested on speech perception tests differing in complexity from low (phoneme discrimination in quiet), to medium (digit triplet perception in speech-shaped noise) to high (sentence perception in modulated noise); cognitive tests of attention, memory, and non-verbal intelligence quotient; and self-report questionnaires of general health-related and hearing-specific quality of life. Hearing sensitivity and cognition related to intelligibility differently depending on the speech test: neither was important for phoneme discrimination, hearing sensitivity alone was important for digit triplet perception, and hearing and cognition together played a role in sentence perception. Self-reported aspects of auditory functioning were correlated with speech intelligibility to different degrees, with digit triplets in noise showing the richest pattern. The results suggest that intelligibility tests can vary in their auditory and cognitive demands and their sensitivity to the challenges that auditory environments pose on functioning. PMID:26136699
Távora-Vieira, Dayse; Marino, Roberta; Acharya, Aanand; Rajan, Gunesh P
2015-03-01
This study aimed to determine the impact of cochlear implantation on speech understanding in noise, subjective perception of hearing, and tinnitus perception of adult patients with unilateral severe to profound hearing loss and to investigate whether duration of deafness and age at implantation would influence the outcomes. In addition, this article describes the auditory training protocol used for unilaterally deaf patients. This is a prospective study of subjects undergoing cochlear implantation for unilateral deafness with or without associated tinnitus. Speech perception in noise was tested using the Bamford-Kowal-Bench speech-in-noise test presented at 65 dB SPL. The Speech, Spatial, and Qualities of Hearing Scale and the Abbreviated Profile of Hearing Aid Benefit were used to evaluate the subjective perception of hearing with a cochlear implant and quality of life. Tinnitus disturbance was measured using the Tinnitus Reaction Questionnaire. Data were collected before cochlear implantation and 3, 6, 12, and 24 months after implantation. Twenty-eight postlingual unilaterally deaf adults with or without tinnitus were implanted. There was a significant improvement in speech perception in noise across time in all spatial configurations. There was an overall significant improvement on the subjective perception of hearing and quality of life. Tinnitus disturbance reduced significantly across time. Age at implantation and duration of deafness did not influence the outcomes significantly. Cochlear implantation provided significant improvement in speech understanding in challenging situations, subjective perception of hearing performance, and quality of life. Cochlear implantation also resulted in reduced tinnitus disturbance. Age at implantation and duration of deafness did not seem to influence the outcomes.
Effects of Compression on Speech Acoustics, Intelligibility, and Sound Quality
Souza, Pamela E.
2002-01-01
The topic of compression has been discussed quite extensively in the last 20 years (e.g., Braida et al., 1982; Dillon, 1996, 2000; Dreschler, 1992; Hickson, 1994; Kuk, 2000, 2002; Kuk and Ludvigsen, 1999; Moore, 1990; Van Tasell, 1993; Venema, 2000; Verschuure et al., 1996; Walker and Dillon, 1982). However, the latest comprehensive update by this journal was published in 1996 (Kuk, 1996). Since that time, use of compression hearing aids has increased dramatically, from half of hearing aids dispensed only 5 years ago to four out of five dispensed today (Strom, 2002b). Most of today's digital and digitally programmable hearing aids are compression devices (Strom, 2002a). It is probable that within a few years, very few patients will be fit with linear hearing aids. Furthermore, compression has increased in complexity, with greater numbers of parameters under the clinician's control. Ideally, these changes will translate to greater flexibility and precision in fitting and selection. However, they also increase the need for information about the effects of compression amplification on speech perception and speech quality. As evidenced by the large number of sessions at professional conferences on fitting compression hearing aids, clinicians continue to have questions about compression technology and when and how it should be used. How does compression work? Who are the best candidates for this technology? How should adjustable parameters be set to provide optimal speech recognition? What effect will compression have on speech quality? These and other questions continue to drive our interest in this technology. This article reviews the effects of compression on the speech signal and the implications for speech intelligibility, quality, and design of clinical procedures. PMID:25425919
An analysis of the masking of speech by competing speech using self-report data.
Agus, Trevor R; Akeroyd, Michael A; Noble, William; Bhullar, Navjot
2009-01-01
Many of the items in the "Speech, Spatial, and Qualities of Hearing" scale questionnaire [S. Gatehouse and W. Noble, Int. J. Audiol. 43, 85-99 (2004)] are concerned with speech understanding in a variety of backgrounds, both speech and nonspeech. To study whether these self-report data reflect informational masking, previously collected data on 414 people were analyzed. The lowest scores (greatest difficulties) were found for the two items in which there were two speech targets, with successively higher scores for competing speech (six items), energetic masking (one item), and no masking (three items). The results suggest significant masking by competing speech in everyday listening situations.
Does quality of life depend on speech recognition performance for adult cochlear implant users?
Capretta, Natalie R; Moberly, Aaron C
2016-03-01
Current postoperative clinical outcome measures for adults receiving cochlear implants (CIs) consist of testing speech recognition, primarily under quiet conditions. However, it is strongly suspected that results on these measures may not adequately reflect patients' quality of life (QOL) using their implants. This study aimed to evaluate whether QOL for CI users depends on speech recognition performance. Twenty-three postlingually deafened adults with CIs were assessed. Participants were tested for speech recognition (Central Institute for the Deaf word and AzBio sentence recognition in quiet) and completed three QOL measures (the Nijmegen Cochlear Implant Questionnaire; either the Hearing Handicap Inventory for Adults or the Hearing Handicap Inventory for the Elderly; and the Speech, Spatial and Qualities of Hearing Scale) to assess a variety of QOL factors. Correlations were sought between speech recognition and QOL scores. Demographics, audiologic history, language, and cognitive skills were also examined as potential predictors of QOL. Only a few QOL scores significantly correlated with postoperative sentence or word recognition in quiet, and correlations were primarily isolated to speech-related subscales on QOL measures. Poorer pre- and postoperative unaided hearing predicted better QOL. Socioeconomic status, duration of deafness, age at implantation, duration of CI use, reading ability, vocabulary size, and cognitive status did not consistently predict QOL scores. For adult, postlingually deafened CI users, clinical speech recognition measures in quiet do not correlate broadly with QOL. Results suggest the need for additional outcome measures of the benefits and limitations of cochlear implantation. Level of evidence: 4. Laryngoscope, 126:699-706, 2016.
Mantokoudis, Georgios; Dubach, Patrick; Pfiffner, Flurin; Kompis, Martin; Caversaccio, Marco; Senn, Pascal
2012-07-16
Telephone communication is a challenge for many hearing-impaired individuals. One important technical reason for this difficulty is the restricted frequency range (0.3-3.4 kHz) of conventional landline telephones. Internet telephony (voice over Internet protocol [VoIP]) is transmitted with a larger frequency range (0.1-8 kHz) and therefore includes more frequencies relevant to speech perception. According to a recently published, laboratory-based study, the theoretical advantage of ideal VoIP conditions over conventional telephone quality has translated into improved speech perception by hearing-impaired individuals. However, the speech perception benefits of nonideal VoIP network conditions, which may occur in daily life, have not been explored. VoIP use cannot be recommended to hearing-impaired individuals before its potential under more realistic conditions has been examined. To compare realistic VoIP network conditions, under which digital data packets may be lost, with ideal conventional telephone quality with respect to their impact on speech perception by hearing-impaired individuals. We assessed speech perception using standardized test material presented under simulated VoIP conditions with increasing digital data packet loss (from 0% to 20%) and compared with simulated ideal conventional telephone quality. We monaurally tested 10 adult users of cochlear implants, 10 adult users of hearing aids, and 10 normal-hearing adults in the free sound field, both in quiet and with background noise. Across all participant groups, mean speech perception scores using VoIP with 0%, 5%, and 10% packet loss were 15.2% (range 0%-53%), 10.6% (4%-46%), and 8.8% (7%-33%) higher, respectively, than with ideal conventional telephone quality. Speech perception did not differ between VoIP with 20% packet loss and conventional telephone quality. The maximum benefits were observed under ideal VoIP conditions without packet loss and were 36% (P = .001) for cochlear implant users, 18% (P = .002) for hearing aid users, and 53% (P = .001) for normal-hearing adults. With a packet loss of 10%, the maximum benefits were 30% (P = .002) for cochlear implant users, 6% (P = .38) for hearing aid users, and 33% (P = .002) for normal-hearing adults. VoIP offers a speech perception benefit over conventional telephone quality, even when mild or moderate packet loss scenarios are created in the laboratory. VoIP, therefore, has the potential to significantly improve telecommunication abilities for the large community of hearing-impaired individuals.
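The packet-loss manipulation at the core of this study is straightforward to prototype. The following sketch is illustrative only, not the test software used above: it assumes 20 ms packets at a 16 kHz sampling rate and replaces lost packets with silence, whereas real VoIP codecs apply packet-loss concealment; the function name and parameters are hypothetical.

```python
import numpy as np

def simulate_packet_loss(signal, sample_rate=16000, packet_ms=20,
                         loss_rate=0.10, seed=0):
    """Zero out randomly chosen packets to mimic VoIP data-packet loss."""
    rng = np.random.default_rng(seed)
    packet_len = int(sample_rate * packet_ms / 1000)
    degraded = signal.copy()
    for i in range(len(signal) // packet_len):
        if rng.random() < loss_rate:            # this packet is dropped
            degraded[i * packet_len:(i + 1) * packet_len] = 0.0
    return degraded

# Example: one second of noise standing in for speech, 10% packet loss
speech = np.random.randn(16000).astype(np.float32)
degraded = simulate_packet_loss(speech, loss_rate=0.10)
```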
Restoring speech perception with cochlear implants by spanning defective electrode contacts.
Frijns, Johan H M; Snel-Bongers, Jorien; Vellinga, Dirk; Schrage, Erik; Vanpoucke, Filiep J; Briaire, Jeroen J
2013-04-01
Even with six defective contacts, spanning can largely restore speech perception with the HiRes 120 speech processing strategy to the level supported by an intact electrode array. Moreover, sound quality is not degraded. Previous studies have demonstrated reduced speech perception scores (SPS) with defective contacts in HiRes 120. This study investigated whether replacing defective contacts by spanning, i.e., current steering on non-adjacent contacts, can restore speech recognition to the level supported by an intact electrode array. Ten adult cochlear implant recipients (HiRes90K, HiFocus1J) with experience with HiRes 120 participated in this study. Three different defective electrode arrays were simulated (six separate defective contacts, three pairs, or two triplets). The participants received three take-home strategies and were asked to evaluate the sound quality in five predefined listening conditions. After 3 weeks, SPS were evaluated with monosyllabic words in quiet and in speech-shaped background noise. The participants rated the sound quality as equal for all take-home strategies. SPS in background noise were equal for all conditions tested. However, SPS in quiet (85% phonemes correct on average with the full array) decreased significantly with increasing spanning distance, with a 3% decrease for each spanned contact.
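Spanning is easy to illustrate schematically. The sketch below is a toy model, not the HiRes 120 implementation: `spanning_pair` and `steering_currents` are hypothetical helpers showing how the nearest working contacts around a defective stretch can be paired and how the stimulation current is split between them to steer the perceived place of stimulation.

```python
def spanning_pair(defective, apical):
    """Return an (apical, basal) contact pair for current steering,
    skipping any defective contacts in between (spanning)."""
    basal = apical + 1
    while basal in defective:
        basal += 1
    return apical, basal

def steering_currents(total_current, alpha):
    """Split the current between the pair; alpha in [0, 1] moves the
    virtual stimulation site from the apical toward the basal contact."""
    return (1.0 - alpha) * total_current, alpha * total_current

# Contacts 5 and 6 are defective, so the pair (4, 7) spans them.
pair = spanning_pair(defective={5, 6}, apical=4)          # -> (4, 7)
i_apical, i_basal = steering_currents(100.0, alpha=0.5)   # -> (50.0, 50.0)
```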
Mantokoudis, Georgios; Koller, Roger; Guignard, Jérémie; Caversaccio, Marco; Kompis, Martin; Senn, Pascal
2017-01-01
Background Telecommunication is limited or even impossible for more than one-third of all cochlear implant (CI) users. Objective We therefore sought to study the impact of voice quality on speech perception with voice over Internet protocol (VoIP) under real and adverse network conditions. Methods Telephone speech perception was assessed in 19 CI users (15-69 years, average 42 years), using the German HSM (Hochmair-Schulz-Moser) sentence test comparing Skype and conventional telephone (public switched telephone network, PSTN) transmission using a personal computer (PC) and a digital enhanced cordless telecommunications (DECT) telephone dual device. Five different Internet transmission quality modes and four accessories (PC speakers, headphones, 3.5 mm jack audio cable, and induction loop) were compared. As a secondary outcome, the subjective perceived voice quality was assessed using the mean opinion score (MOS). Results Speech telephone perception was significantly better (median 91.6%, P<.001) with Skype compared with PSTN (median 42.5%) under optimal conditions. Skype calls under adverse network conditions (data packet loss > 15%) were not superior to conventional telephony. In addition, there were no significant differences between the tested accessories (P>.05) using a PC. Coupling a Skype DECT phone device with an audio cable to the CI, however, resulted in higher speech perception (median 65%) and subjective MOS scores (3.2) than using PSTN (median 7.5%, P<.001). Conclusions Skype calls significantly improve speech perception for CI users compared with conventional telephony under real network conditions. Listening accessories do not further improve the listening experience. Current Skype DECT telephone devices do not fully offer technical advantages in voice quality. PMID:28438727
Joos, Kathleen; De Ridder, Dirk; Boey, Ronny A.; Vanneste, Sven
2014-01-01
Introduction: Stuttering is defined as speech characterized by verbal dysfluencies, but it should not be seen as an isolated speech disorder; rather, it reflects a generalized sensorimotor timing deficit due to impaired communication between speech-related brain areas. We therefore focused on resting-state brain activity and functional connectivity. Method: We included 11 patients with developmental stuttering and 11 age-matched controls. To objectify stuttering severity and its impact on quality of life (QoL), we used the Dutch validated Test for Stuttering Severity-Readers (TSS-R) and the Overall Assessment of the Speaker's Experience of Stuttering (OASES), respectively. Furthermore, we used standardized low resolution brain electromagnetic tomography (sLORETA) analyses to look at resting-state activity and functional connectivity differences and their correlations with the TSS-R and OASES. Results: No significant results were obtained for neural activity; however, significant alterations in resting-state functional connectivity were demonstrated between persons who stutter (PWS) and fluently speaking controls, predominantly interhemispheric, i.e., decreased functional connectivity for high-frequency oscillations (beta and gamma) between motor speech areas (BA44 and 45) and the contralateral premotor (BA6) and motor (BA4) areas. Moreover, a positive correlation was found between functional connectivity at low-frequency oscillations (theta and alpha) and stuttering severity, while a mixed pattern of increased and decreased functional connectivity at low and high frequency oscillations correlated with QoL. Discussion: PWS are characterized by decreased high-frequency interhemispheric functional connectivity between motor speech, premotor, and motor areas in the resting state, while higher functional connectivity in the low-frequency bands indicates more severe speech disturbances, suggesting that increased interhemispheric and right-sided functional connectivity is maladaptive. PMID:25352797
Haderlein, Tino; Döllinger, Michael; Matoušek, Václav; Nöth, Elmar
2016-10-01
Automatic voice assessment is often performed on sustained vowels. Analysis of read-out texts, in contrast, allows both voice and speech to be assessed. Automatic speech recognition and prosodic analysis were used to find regression formulae between automatic and perceptual assessment of four voice and four speech criteria. The regression was trained with 21 men and 62 women (average age 49.2 years) and tested with another set of 24 men and 49 women (average age 48.3 years), all suffering from chronic hoarseness. They read the text 'Der Nordwind und die Sonne' ('The North Wind and the Sun'). Five voice and speech therapists evaluated the data on 5-point Likert scales. Ten prosodic and recognition accuracy measures (features) were identified which together describe all the examined criteria. Inter-rater correlation within the expert group ranged from r = 0.63 for the criterion 'match of breath and sense units' to r = 0.87 for overall voice quality. Human-machine correlation ranged from r = 0.40 for the match of breath and sense units to r = 0.82 for intelligibility. The perceptual ratings of the different criteria were highly correlated with each other; likewise, the feature sets modeling the criteria were very similar. The automatic method is suitable for assessing chronic hoarseness in general and for subgroups of functional and organic dysphonia. In its current version, it is almost as reliable as a randomly picked rater from a group of voice and speech therapists.
Impact of Aberrant Acoustic Properties on the Perception of Sound Quality in Electrolarynx Speech
ERIC Educational Resources Information Center
Meltzner, Geoffrey S.; Hillman, Robert E.
2005-01-01
A large percentage of patients who have undergone laryngectomy to treat advanced laryngeal cancer rely on an electrolarynx (EL) to communicate verbally. Although serviceable, EL speech is plagued by shortcomings in both sound quality and intelligibility. This study sought to better quantify the relative contributions of previously identified…
Automatic intelligibility classification of sentence-level pathological speech
Kim, Jangwon; Kumar, Naveen; Tsiartas, Andreas; Li, Ming; Narayanan, Shrikanth S.
2014-01-01
Pathological speech usually refers to the condition of speech distortion resulting from atypicalities in voice and/or in the articulatory mechanisms owing to disease, illness or other physical or biological insult to the production system. Although automatic evaluation of speech intelligibility and quality could come in handy in these scenarios to assist experts in diagnosis and treatment design, the many sources and types of variability often make it a very challenging computational processing problem. In this work we propose novel sentence-level features to capture abnormal variation in the prosodic, voice quality and pronunciation aspects in pathological speech. In addition, we propose a post-classification posterior smoothing scheme which refines the posterior of a test sample based on the posteriors of other test samples. Finally, we perform feature-level fusions and subsystem decision fusion for arriving at a final intelligibility decision. The performances are tested on two pathological speech datasets, the NKI CCRT Speech Corpus (advanced head and neck cancer) and the TORGO database (cerebral palsy or amyotrophic lateral sclerosis), by evaluating classification accuracy without overlapping subjects’ data among training and test partitions. Results show that the feature sets of each of the voice quality subsystem, prosodic subsystem, and pronunciation subsystem, offer significant discriminating power for binary intelligibility classification. We observe that the proposed posterior smoothing in the acoustic space can further reduce classification errors. The smoothed posterior score fusion of subsystems shows the best classification performance (73.5% for unweighted, and 72.8% for weighted, average recalls of the binary classes). PMID:25414544
Radio speech communication problems reported in a survey of military pilots.
Lahtinen, Taija M M; Huttunen, Kerttu H; Kuronen, Pentti O; Sorri, Martti J; Leino, Tuomo K
2010-12-01
Despite technological advances in conveying information, speech communication is still a key safety factor in aviation. Effective radio communication is necessary, for example, in building and maintaining good team situation awareness. However, little has been reported concerning the prevalence and nature of radio communication problems in everyday working environments in military aviation. We surveyed Finnish Defense Forces pilots regarding the prevalence of radio speech communication problems. Of the 225 pilots contacted, 75% replied to our survey. Altogether, 138 of the respondents were fixed-wing pilots and 31 were helicopter pilots. Problems in radio communication occurred, on average, during 14% of flight time. The most prevalent problems were multiple speakers on the same radio frequency band causing overlapping speech, missing acknowledgments, high background noise especially during helicopter operations, and technical problems. Of the respondents, 18% (31 pilots) reported having encountered at least one potentially dangerous event caused by problems in radio communication during their military aviation career. If the employer were to offer extra hearing protection, such as custom-made ear plugs, 93% of the pilots indicated that they would use it. Communication can be a flight safety factor, especially during intense air combat exercises and other information-loaded flights. In these situations, communication should be clear and focused on the most essential information. Training and technical improvements are therefore necessary for better communication. High-quality radio speech communication also improves operational effectiveness in military aviation.
Speech coding at 4800 bps for mobile satellite communications
NASA Technical Reports Server (NTRS)
Gersho, Allen; Chan, Wai-Yip; Davidson, Grant; Chen, Juin-Hwey; Yong, Mei
1988-01-01
A speech compression project has recently been completed to develop a speech coding algorithm suitable for operation in a mobile satellite environment, aimed at providing telephone-quality natural speech at 4.8 kbps. The work has resulted in two alternative techniques which achieve reasonably good communications quality at 4.8 kbps while tolerating vehicle noise and rather severe channel impairments. The algorithms are embodied in a compact self-contained prototype consisting of two AT&T 32-bit floating-point DSP32 digital signal processors (DSPs). A Motorola 68HC11 microcomputer chip serves as the board controller and interface handler. On a wirewrapped card, the prototype's circuit footprint amounts to only 200 sq cm, and it consumes about 9 W of power.
Ultrasound visual feedback in articulation therapy following partial glossectomy.
Blyth, Katrina M; Mccabe, Patricia; Madill, Catherine; Ballard, Kirrie J
2016-01-01
Disordered speech is common following treatment for tongue cancer; however, there is insufficient high-quality evidence to guide clinical decision making about treatment. This study investigated the use of ultrasound tongue imaging as a visual feedback tool to guide tongue placement during articulation therapy with two participants following partial glossectomy. A Phase I multiple baseline design across behaviors was used to investigate the therapeutic effect of ultrasound visual feedback during speech rehabilitation. Percent consonants correct and speech intelligibility at sentence level were used to measure acquisition, generalization, and maintenance of speech skills for treated and untreated related phonemes, while unrelated phonemes were tested to demonstrate experimental control. Swallowing and oromotor measures were also taken to monitor change. Sentence intelligibility was not a sensitive measure of speech change, but both participants demonstrated significant change in percent consonants correct for treated phonemes. One participant also demonstrated generalization to non-treated phonemes. Control phonemes, along with swallow and oromotor measures, remained stable throughout the study. This study establishes the therapeutic benefit of ultrasound visual feedback in speech rehabilitation following partial glossectomy. Readers will be able to explain why and how tongue cancer surgery impacts articulation precision, and to explain the acquisition, generalization, and maintenance effects in the study.
Verbal communication impacts quality of life in patients with amyotrophic lateral sclerosis.
Felgoise, Stephanie H; Zaccheo, Vincenzo; Duff, Jason; Simmons, Zachary
2016-01-01
Global quality of life (QoL) in patients with ALS has been found to be independent of overall physical function. However, the relationship between verbal communication ability and QoL has not been explored. This was a retrospective study using data from a study validating the ALS-Specific QoL Questionnaire (ALSSQoL). Speech function was assessed using the first question on the ALS Functional Rating Scale (ALSFRS), ranging from 4 (normal speech) to 0 (loss of useful speech). There were 338 participants for whom data were available for speech function and for all ALSSQoL subscales. Analysis of variance revealed that QoL varied among individuals with different functional abilities for speech (F(4,333) = 5.13, p = 0.001). Specifically, poorer QoL was related to initial impairments in verbal communication ability (p = 0.005). QoL also was poorer in those with no speech ability compared to those with normal speech (p = 0.008). In conclusion, the ability to communicate verbally, unlike overall physical function, is directly related to overall QoL in patients with ALS. The initial period of speech impairment appears to have a particularly strong impact on QoL and may be an important time for intervention.
Shih, Ludy C; Piel, Jordan; Warren, Amanda; Kraics, Lauren; Silver, Althea; Vanderhorst, Veronique; Simon, David K; Tarsy, Daniel
2012-06-01
Parkinson's disease related speech and voice impairment have a significant impact on quality of life measures. LSVT® LOUD voice and speech therapy (Lee Silverman Voice Treatment) has demonstrated scientific efficacy and clinical effectiveness, but musically based voice and speech therapy has been underexplored as a potentially useful method of rehabilitation. We undertook a pilot, open-label study of a group-based singing intervention, consisting of twelve 90-min weekly sessions led by a voice and speech therapist/singing instructor. The primary outcome measure of vocal loudness, as measured by sound pressure level (SPL) at 50 cm during connected speech, was not significantly different one week after the intervention or at 13 weeks after the intervention. A number of secondary measures reflecting pitch range, phonation time, and maximum loudness also were unchanged. Voice-related quality of life (VRQOL) and voice handicap index (VHI) also were unchanged. This study suggests that a group singing therapy intervention at this intensity and frequency does not result in significant improvement in objective and subjectively rated measures of voice and speech impairment.
Böhme, G; Clasen, B
1989-09-01
We carried out a transnasal insufflation test according to Blom and Singer on 27 laryngectomy patients, as well as a speech communication test with the help of reverse speech audiometry, i.e., the post-laryngectomy telephone test according to Zenner and Pfrang. The combined evaluation of both tests provided basic information on the quality of the esophageal voice and the functional capacity of the speech organs. Both tests can be carried out quickly and easily and allow a differentiated statement on the applicability of an esophageal voice, electronic speech aids, and voice prostheses. Three groups could be identified from our results: 1. The insufflation test and reverse speech test gave concordant good or very good results; the esophageal voice was well understood. 2. Complete failure in the insufflation and telephone tests calls for further examinations to exclude spasm, stricture, diverticula, and scarred membranous stenosis as well as tumor relapse in the region of the pharyngo-esophageal segments. 3. In the case of normal insufflation but considerably reduced speech communication in the telephone test, organic causes must be sought in the area of the nozzle, along with cranial nerve deficits and socially determined causes.
Long, Ross E.; Wilson-Genderson, Maureen; Grayson, Barry H.; Flores, Roberto; Broder, Hillary L.
2016-01-01
Objective To report the associations of oro-nasal fistulae with the patient-centered outcomes oral health–related quality of life and self-reported speech outcomes in school-aged children. Design Prospective, nonrandomized multicenter design. Setting Six ACPA-accredited cleft centers. Participants Patients with cleft palate at the age of mixed dentition. Interventions None. Main Outcome Measures Prevalence of fistula and location of fistula (Pittsburgh Classification System). Patients were placed into one of three groups based on the following criteria: alveolar cleft present, no previous repair (Group 1); alveolar cleft present, previously repaired (Group 2); no congenital alveolar cleft (Group 3). Presence of fistula and subgroup classification were correlated with oral health–related quality of life (Child Oral Health Impact Profile [COHIP]) and perceived speech outcomes. Results The fistula rate was 5.52% (62 of 1198 patients). There was a significant difference in fistula rate between the three groups: Group 1 (11.15%), Group 2 (4.44%), Group 3 (1.90%). Patients with fistula had significantly lower COHIP scores (F(1,1188) = 4.79; P = .03) and worse self-reported speech scores (F(1,1197) = 4.27; P = .04). Group 1 patients with fistula had the lowest COHIP scores (F(5,1188) = 4.78, P = .02) and the lowest speech scores (F(5,1188) = 3.41, P = .003). Conclusions The presence of palatal fistulae was associated with lower oral health–related quality of life and perceived speech among youth with cleft. The poorest outcomes were reported among those with the highest fistula rates, including an unrepaired alveolar cleft. PMID:26437081
Chalupper, Josef
2017-01-01
The benefits of combining a cochlear implant (CI) and a hearing aid (HA) in opposite ears for speech perception were examined in 15 adult unilateral CI recipients who regularly use a contralateral HA. A within-subjects design was used to assess speech intelligibility, listening effort ratings, and a sound quality questionnaire for the conditions CI alone, CI and HA together (CIHA), and HA alone where applicable. The primary outcome of bimodal benefit, defined as the difference between CIHA and CI, was statistically significant for speech intelligibility in quiet as well as for intelligibility in noise across the tested spatial conditions. A reduction in listening effort, over and above the intelligibility gain, was found at the highest tested signal-to-noise ratio. Moreover, the bimodal listening situation was rated as sounding more voluminous, less tinny, and less unpleasant than CI alone. Listening effort and sound quality emerged as feasible and relevant measures for demonstrating bimodal benefit across a clinically representative range of bimodal users. These extended dimensions of speech perception can shed more light on the array of benefits provided by complementing a CI with a contralateral HA. PMID:28874096
Auditory-Perceptual and Acoustic Methods in Measuring Dysphonia Severity of Korean Speech.
Maryn, Youri; Kim, Hyung-Tae; Kim, Jaeock
2016-09-01
The purpose of this study was to explore the criterion-related concurrent validity of two standardized auditory-perceptual rating protocols and the Acoustic Voice Quality Index (AVQI) for measuring dysphonia severity in Korean speech. Sixty native Korean subjects with various voice disorders were asked to sustain the vowel [a:] and to read aloud the Korean text "Walk." A 3-second midvowel portion of the sustained vowel and two sentences (with 25 syllables) were edited, concatenated, and analyzed according to methods described elsewhere. From 56 participants, both continuous speech and sustained vowel recordings had sufficiently high signal-to-noise ratios (35.5 dB and 37 dB on average, respectively) and were therefore subjected to further dysphonia severity analysis with (1) "G" or Grade from the GRBAS protocol, (2) "OS" or Overall Severity from the Consensus Auditory-Perceptual Evaluation of Voice protocol, and (3) AVQI. First, high correlations were found between G and OS (rS = 0.955 for sustained vowels; rS = 0.965 for continuous speech). Second, the AVQI showed a strong correlation with G (rS = 0.911) as well as OS (rP = 0.924). These findings are in agreement with similar studies dealing with continuous speech in other languages. The present study highlights the criterion-related concurrent validity of these methods in Korean speech. Furthermore, it supports the cross-linguistic robustness of the AVQI as a valid and objective marker of overall dysphonia severity.
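The rS and rP values above presumably denote Spearman rank and Pearson correlations; the distinction matters because G is an ordinal scale. A minimal sketch of this kind of analysis with fabricated ratings (the real study used 56 recordings):

```python
import numpy as np
from scipy.stats import spearmanr, pearsonr

# Fabricated severity ratings for seven voices: GRBAS Grade (ordinal 0-3),
# CAPE-V Overall Severity (0-100 visual analog scale), and AVQI scores.
g    = np.array([0, 1, 1, 2, 2, 3, 3])
osev = np.array([12, 25, 30, 48, 55, 70, 82])
avqi = np.array([1.8, 3.1, 3.4, 5.0, 5.6, 7.2, 8.0])

rs_g_os, _    = spearmanr(g, osev)    # rank correlation for the ordinal scale
rs_avqi_g, _  = spearmanr(avqi, g)
rp_avqi_os, _ = pearsonr(avqi, osev)  # linear correlation, continuous scales
print(rs_g_os, rs_avqi_g, rp_avqi_os)
```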
The Speech, Spatial and Qualities of Hearing Scale (SSQ)
Gatehouse, Stuart; Noble, William
2017-01-01
The Speech, Spatial and Qualities of Hearing Scale (SSQ) is designed to measure a range of hearing disabilities across several domains. Particular attention is given to hearing speech in a variety of competing contexts, and to the directional, distance and movement components of spatial hearing. In addition, the abilities both to segregate sounds and to attend to simultaneous speech streams are assessed, reflecting the reality of hearing in the everyday world. Qualities of hearing experience include ease of listening, and the naturalness, clarity and identifiability of different speakers, different musical pieces and instruments, and different everyday sounds. Application of the SSQ to 153 new clinic clients prior to hearing aid fitting showed that the greatest difficulty was experienced with simultaneous speech streams, ease of listening, listening in groups and in noise, and judging distance and movement. SSQ ratings were compared with an independent measure of handicap. After differences in hearing level were controlled for, it was found that identification, attention and effort problems, as well as spatial hearing problems, feature prominently in the disability–handicap relationship, along with certain features of speech hearing. The results implicate aspects of temporal and spatial dynamics of hearing disability in the experience of handicap. The SSQ shows promise as an instrument for evaluating interventions of various kinds, particularly (but not exclusively) those that implicate binaural function. PMID:15035561
Diagnostic articulation tables
NASA Astrophysics Data System (ADS)
Mikhailov, V. G.
2002-09-01
In recent years, considerable progress has been made in the development of instrumental methods for evaluating general speech quality and intelligibility on the basis of modeling the auditory perception of speech and measuring the signal-to-noise ratio. Despite certain advantages (fast measurement procedures and low labor costs), these methods are not universal and are, in essence, secondary, because they rely on calibration against subjective-statistical measurements. At the same time, some specific problems of speech quality evaluation, such as diagnosing the factors responsible for deviations of speech quality from the standard (e.g., accent features of a speaker or individual voice distortions), can be solved by psycholinguistic methods. This paper considers different kinds of diagnostic articulation tables: tables of minimal pairs of monosyllabic words (DRT) based on the Jakobson distinctive features, tables consisting of multisyllabic quartets of Russian words (the choice method), and tables of incomplete monosyllables of the VC/CV type (the supplementary note method). Comparative estimates of the tables are presented along with recommendations concerning their application.
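DRT-style tables are conventionally scored per distinctive feature with a guessing correction: with T two-alternative trials and R correct responses, the chance-corrected score is P = 100(2R - T)/T. A short sketch under that standard assumption (feature names and responses are made up):

```python
from collections import defaultdict

# (feature tested, response correct?) for a handful of made-up DRT trials
trials = [("voicing", True), ("voicing", True), ("voicing", False),
          ("nasality", True), ("nasality", True), ("nasality", False)]

def drt_scores(trials):
    """Chance-corrected percent correct per distinctive feature,
    P = 100 * (2R - T) / T for a two-alternative forced choice."""
    tally = defaultdict(lambda: [0, 0])        # feature -> [correct, total]
    for feature, ok in trials:
        tally[feature][0] += ok
        tally[feature][1] += 1
    return {f: 100.0 * (2 * r - t) / t for f, (r, t) in tally.items()}

print(drt_scores(trials))   # {'voicing': 33.3..., 'nasality': 33.3...}
```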
Language learning, socioeconomic status, and child-directed speech.
Schwab, Jessica F; Lew-Williams, Casey
2016-07-01
Young children's language experiences and language outcomes are highly variable. Research in recent decades has focused on understanding the extent to which family socioeconomic status (SES) relates to parents' language input to their children and, subsequently, children's language learning. Here, we first review research demonstrating differences in the quantity and quality of language that children hear across low-, mid-, and high-SES groups, but also, and perhaps more importantly, research showing that differences in input and learning also exist within SES groups. Second, in order to better understand the defining features of 'high-quality' input, we highlight findings from laboratory studies examining specific characteristics of the sounds, words, sentences, and social contexts of child-directed speech (CDS) that influence children's learning. Finally, after narrowing in on these particular features of CDS, we broaden our discussion by considering family and community factors that may constrain parents' ability to participate in high-quality interactions with their young children. A unification of research on SES and CDS will facilitate a more complete understanding of the specific means by which input shapes learning, as well as generate ideas for crafting policies and programs designed to promote children's language outcomes. WIREs Cogn Sci 2016, 7:264-275. doi: 10.1002/wcs.1393
Bhuskute, Aditi; Skirko, Jonathan R; Roth, Christina; Bayoumi, Ahmed; Durbin-Johnson, Blythe; Tollefson, Travis T
2017-09-01
Patients with cleft palate and other causes of velopharyngeal insufficiency (VPI) suffer adverse effects on social interactions and communication. Measurement of these patient-reported outcomes is needed to help guide surgical and nonsurgical care. The aims were to further validate the VPI Effects on Life Outcomes (VELO) instrument, measure the change in quality of life (QOL) after speech surgery, and test the association of change in speech with change in QOL. Prospective descriptive cohort including children and young adults undergoing speech surgery for VPI in a tertiary academic center. Participants completed the validated VELO instrument before and after surgical treatment. The main outcome measures were preoperative and postoperative VELO scores and the perceptual speech assessment of speech intelligibility. The VELO scores are divided into subscale domains. Changes in VELO after surgery were analyzed using linear regression models. VELO scores were analyzed as a function of speech intelligibility, adjusting for age and cleft type. The correlation between speech intelligibility rating and VELO scores was estimated using the polyserial correlation. Twenty-nine patients (13 males and 16 females) were included. Mean (SD) age was 7.9 (4.1) years (range, 4-20 years). Pharyngeal flap was used in 14 (48%) cases, Furlow palatoplasty in 12 (41%), and sphincter pharyngoplasty in 1 (3%). The mean (SD) preoperative speech intelligibility rating was 1.71 (1.08), which decreased postoperatively to 0.79 (0.93) in the 24 patients who completed the protocol (P < .01). The VELO scores improved after surgery (P < .001), as did most subscale scores. Caregiver impact did not change after surgery (P = .36). Speech intelligibility was correlated with preoperative and postoperative total VELO scores (P < .01), with the preoperative subscale domains situational difficulty (VELO-SiD, P = .005) and perception by others (VELO-PO, P = .05), and with the postoperative subscale domains VELO-SiD (P = .03) and VELO-PO (P = .003). Neither the total nor the subscale VELO score change after surgery was correlated with change in speech intelligibility. Speech surgery improves VPI-specific quality of life. We confirmed validation in a population of untreated patients with VPI and included pharyngeal flap surgery, which had not previously been included in validation studies. The VELO instrument provides patient-specific outcomes, which allows a broader understanding of the social, emotional, and physical effects of VPI. Level of evidence: 2.
NWR (National Weather Service) voice synthesis project, phase 1
NASA Astrophysics Data System (ADS)
Sampson, G. W.
1986-01-01
The purpose of the NOAA Weather Radio (NWR) Voice Synthesis Project is to demonstrate current voice synthesis technology. Phase 1 of this project is presented, providing complete automation of an hourly surface aviation observation for broadcast over NWR. After examining the products currently available on the market, it was concluded that synthetic voice technology could not yet provide the high-quality speech required for broadcast over NWR. The system presented therefore uses phrase-concatenation technology to obtain a very high quality, versatile voice synthesis system.
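Phrase concatenation is conceptually simple: splice pre-recorded phrases with short pauses. The sketch below is a schematic illustration, not the NWR system; the phrase inventory here is random noise standing in for studio recordings, and all names are made up.

```python
import numpy as np

fs = 16000  # assumed sampling rate
# Hypothetical phrase inventory: pre-recorded waveforms keyed by phrase text.
phrases = {
    "wind": np.random.randn(fs) * 0.1,
    "ten knots": np.random.randn(fs) * 0.1,
    "temperature": np.random.randn(fs) * 0.1,
    "seventy degrees": np.random.randn(fs) * 0.1,
}

def concatenate_report(keys, pause_s=0.15):
    """Build an announcement by splicing recorded phrases with short pauses."""
    pause = np.zeros(int(pause_s * fs))
    parts = []
    for k in keys:
        parts.extend([phrases[k], pause])
    return np.concatenate(parts[:-1])   # drop the trailing pause

report = concatenate_report(["wind", "ten knots", "temperature", "seventy degrees"])
```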
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aimthikul, Y.
This thesis reviews the essential aspects of speech synthesis and distinguishes between the two prevailing techniques: compressed digital speech and phonemic synthesis. It then presents the hardware details of the five speech modules evaluated. FORTRAN programs were written to facilitate message creation and retrieval with four of the modules driven by a PDP-11 minicomputer. The fifth module was driven directly by a computer terminal. The compressed digital speech modules (T.I. 990/306, T.S.I. Series 3D and N.S. Digitalker) each contain a limited vocabulary produced by the manufacturers, while both the phonemic synthesizers made by Votrax permit an almost unlimited set of sounds and words. A text-to-phoneme rules program was adapted for the PDP-11 (running under the RSX-11M operating system) to drive the Votrax Speech Pac module. However, the Votrax Type'N Talk unit has its own built-in translator. Comparison of these modules revealed that the compressed digital speech modules were superior in pronouncing words on an individual basis but lacked the inflection capability that permitted the phonemic synthesizers to generate more coherent phrases. These findings were necessarily highly subjective and dependent on the specific words and phrases studied. In addition, the rapid introduction of new modules by manufacturers will necessitate new comparisons. However, the results of this research verified that all of the modules studied possess reasonable speech quality suitable for man-machine applications. Furthermore, the development tools are now in place to permit the addition of computer speech output in such applications.
Children with Speech, Language and Communication Needs: Their Perceptions of Their Quality of Life
ERIC Educational Resources Information Center
Markham, Chris; van Laar, Darren; Gibbard, Deborah; Dean, Taraneh
2009-01-01
Background: This study is part of a programme of research aiming to develop a quantitative measure of quality of life for children with communication needs. It builds on the preliminary findings of Markham and Dean (2006), which described some of the perception's parents and carers of children with speech language and communication needs had…
Vibrato in Singing Voice: The Link between Source-Filter and Sinusoidal Models
NASA Astrophysics Data System (ADS)
Arroabarren, Ixone; Carlosena, Alfonso
2004-12-01
The application of inverse filtering techniques for high-quality singing voice analysis/synthesis is discussed. In the context of source-filter models, inverse filtering provides a noninvasive method to extract the voice source, and thus to study voice quality. Although this approach is widely used in speech synthesis, this is not the case in singing voice. Several studies have proved that inverse filtering techniques fail in the case of singing voice, the reasons being unclear. In order to shed light on this problem, we will consider here an additional feature of singing voice, not present in speech: the vibrato. Vibrato has been traditionally studied by sinusoidal modeling. As an alternative, we will introduce here a novel noninteractive source filter model that incorporates the mechanisms of vibrato generation. This model will also allow the comparison of the results produced by inverse filtering techniques and by sinusoidal modeling, as they apply to singing voice and not to speech. In this way, the limitations of these conventional techniques, described in previous literature, will be explained. Both synthetic signals and singer recordings are used to validate and compare the techniques presented in the paper.
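In the sinusoidal-model view, vibrato is a slow periodic modulation of the fundamental frequency. A minimal sketch with typical singing values of roughly 5-6 Hz rate and ±50 cents extent (assumed values, not taken from the paper):

```python
import numpy as np

def vibrato_f0(t, f0=220.0, rate=5.5, extent_cents=50.0):
    """Instantaneous F0 with sinusoidal vibrato: modulation `rate` in Hz
    and peak deviation `extent_cents` in cents around the mean F0."""
    return f0 * 2.0 ** (extent_cents / 1200.0 * np.sin(2 * np.pi * rate * t))

fs = 16000
t = np.arange(2 * fs) / fs
phase = 2 * np.pi * np.cumsum(vibrato_f0(t)) / fs   # integrate F0 to get phase
source = np.sin(phase)  # vibrato "source" tone; a vocal-tract filter would follow
```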
Patel, Ramya S; Mohr, Tiffany; Hartman, Christine; Stach, Carol; Sikora, Andrew G; Zevallos, Jose P; Sandulache, Vlad C
2018-05-01
Veterans have an increased risk of laryngeal cancer, yet their oncologic and functional outcomes remain understudied. We sought to determine the longitudinal impact of tracheoesophageal puncture and voice prosthesis on quality-of-life measures in veterans following total laryngectomy (TL). We performed a cross-sectional analysis of TL patients (n = 68) treated at the Michael E. DeBakey Veterans Affairs Medical Center using the Voice Handicap Index (VHI), MD Anderson Dysphagia Index (MDADI), and University of Washington Quality of Life Index (UW-QOL). Use of tracheoesophageal (TE) speech was associated with significantly better VHI, MDADI, and UW-QOL scores compared with other forms of communication. The association of TE speech with VHI, MDADI, and UW-QOL scores persisted even when the analysis was limited to patients with >5-year follow-up, and it was maintained on multivariate analysis accounting for a history of radiation and laryngectomy for recurrent laryngeal cancer. Using tracheoesophageal speech after total laryngectomy is associated with durable improvements in quality of life and functional outcomes in veterans. Tracheoesophageal voice restoration should be attempted whenever technically feasible in patients who meet the complex psychosocial and physical requirements to appropriately utilize TE speech.
Online collaboration environments in telemedicine applications of speech therapy.
Pierrakeas, C; Georgopoulos, V; Malandraki, G
2005-01-01
The use of telemedicine in speech and language pathology provides patients in rural and remote areas with access to quality rehabilitation services that are sufficient, accessible, and user-friendly, opening new possibilities for comprehensive, long-term, and cost-effective diagnosis and therapy. This paper discusses the use of online collaboration environments for various telemedicine applications of speech therapy, including online group speech therapy scenarios, a multidisciplinary clinical consulting team, and online mentoring and continuing education.
ERIC Educational Resources Information Center
Vanormelingen, Liesbeth; Gillis, Steven
2016-01-01
This article investigates the amount of input and the quality of mother-child interactions in mothers who differ in socio-economic status (SES): mid-to-high SES (mhSES) and low SES. The amount of input was measured as the number of utterances per hour, the total duration of speech per hour and the number of turns per hour. The quality of the…
Thomas, Roha M.; Kaipa, Ramesh
2015-01-01
Objective Previous surveys in the United States of America (USA), the United Kingdom (UK), and Canada have indicated that most of the speech-language pathologists (SLPs) tend to use non-speech oral-motor exercises (NSOMEs) on a regular basis to treat speech disorders. At present, there is considerable debate regarding the clinical effectiveness of NSOMEs. The current study aimed to investigate the pattern and extent of usage of NSOMEs among Indian SLPs. Method An online survey intended to elicit information regarding the use of NSOMEs was sent to 505 members of the Indian Speech and Hearing Association. The questionnaire consisted of three sections. The first section solicited demographic information, the second and third sections solicited information from participants who did and did not prefer to use NSOMEs, respectively. Descriptive statistics were employed to analyse the responses that were clinically relevant. Results A total of 127 participants responded to the survey. Ninety-one percent of the participants who responded to the survey indicated that they used NSOMEs. Conclusion The results suggested that the percentage of SLPs preferring to use NSOMEs is similar to the findings of surveys conducted in the USA, the UK, and Canada. The Indian SLPs continue to use NSOMEs based on a multitude of beliefs. It is important for SLPs to incorporate the principles of evidence-based practice while using NSOMEs to provide high quality clinical care. PMID:26304211
Pennig, Sibylle; Quehl, Julia; Wittkowski, Martin
2014-01-01
Acoustic modifications of loudspeaker announcements were investigated in a simulated aircraft cabin to improve passengers' speech intelligibility and quality of communication in this specific setting. Four experiments with 278 participants in total were conducted in an acoustic laboratory using a standardised speech test and subjective rating scales. In experiments 1 and 2 the sound pressure level (SPL) of the announcements was varied (ranging from 70 to 85 dB(A)). Experiments 3 and 4 focused on frequency modification (octave bands) of the announcements. All studies used a background noise with the same SPL (74 dB(A)), but recorded at different seat positions in the aircraft cabin (front, rear). The results quantify speech intelligibility improvements with increasing signal-to-noise ratio and amplification of particular octave bands, especially the 2 kHz and the 4 kHz band. Thus, loudspeaker power in an aircraft cabin can be reduced by using appropriate filter settings in the loudspeaker system.
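The octave-band emphasis reported above can be approximated with standard filters. A minimal sketch (filter order, gains, and signals are assumed, not taken from the study): each call adds an amplified copy of one octave band, which approximately boosts that band by `gain_db` while leaving the rest of the spectrum unchanged.

```python
import numpy as np
from scipy.signal import butter, sosfilt

def boost_octave_band(x, fs, center_hz, gain_db):
    """Add a band-passed, amplified copy of the octave band around
    center_hz (band edges at center/sqrt(2) and center*sqrt(2))."""
    lo, hi = center_hz / np.sqrt(2), center_hz * np.sqrt(2)
    sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
    band = sosfilt(sos, x)
    return x + (10.0 ** (gain_db / 20.0) - 1.0) * band

fs = 16000
announcement = np.random.randn(fs)  # noise standing in for an announcement
emphasized = boost_octave_band(announcement, fs, 2000.0, 6.0)  # 2 kHz band
emphasized = boost_octave_band(emphasized, fs, 4000.0, 6.0)    # 4 kHz band
```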
Diaferia, Giovana; Badke, Luciana; Santos-Silva, Rogerio; Bommarito, Silvana; Tufik, Sergio; Bittencourt, Lia
2013-07-01
Patients with obstructive sleep apnea (OSA) exhibit reduced quality of life (QoL) due to daytime symptoms that restrict their social activities. The available data on QoL after treatment with continuous positive airway pressure (CPAP) are inconclusive, and few studies have assessed QoL after treatment with speech therapy or other methods that increase the tonus of the upper airway muscles, or with a combination of these therapies. The aim of our study was to assess the effect of speech therapy, alone or combined with CPAP, on QoL in patients with OSA using three different questionnaires. Men with OSA were randomly allocated to four treatment groups: placebo (n = 24, sham speech therapy), speech therapy (n = 27), CPAP (n = 27), and combination (n = 22, CPAP plus speech therapy). All patients were treated for 3 months. Participants were assessed before and after treatment and after a 3-week washout period using QoL questionnaires (Functional Outcomes of Sleep Questionnaire [FOSQ], World Health Organization Quality of Life [WHOQoL-Bref], and Medical Outcomes Study 36-Item Short-Form Health Survey [SF-36]). Additional testing measures included an excessive sleepiness scale (Epworth sleepiness scale [ESS]), polysomnography (PSG), and speech therapy assessment. A total of 100 men aged 48.1±11.2 (mean±standard deviation) years had a body mass index (BMI) of 27.4±4.9 kg/m(2), an ESS score of 12.7±3.0, and an apnea-hypopnea index (AHI) of 30.9±20.6. After treatment, the speech therapy and combination groups showed improvement in the physical domain score of the WHOQoL-Bref and in the functional capacity domain score of the SF-36. Our results suggest that speech therapy, alone as well as in association with CPAP, might be an alternative treatment for improving QoL in patients with OSA.
van der Molen, Lisette; van Rossum, Maya A; Jacobi, Irene; van Son, Rob J J H; Smeele, Ludi E; Rasch, Coen R N; Hilgers, Frans J M
2012-09-01
Perceptual judgments and patients' perception of voice and speech were evaluated after concurrent chemoradiotherapy (CCRT) for advanced head and neck cancer. Prospective clinical trial. A standard Dutch text and a diadochokinetic task were recorded. Expert listeners rated voice and speech quality (based on Grade, Roughness, Breathiness, Asthenia, and Strain) and articulation (overall, [p], [t], [k]), and comparative mean opinion scores of voice and speech at the three assessment points were calculated. A structured study-specific questionnaire evaluated patients' perception pretreatment (N=55), at 10 weeks (N=49), and at 1 year posttreatment (N=37). At 10 weeks, perceptual voice quality is significantly affected. The parameters overall voice quality (mean, -0.24; P=0.008), strain (mean, -0.12; P=0.012), nasality (mean, -0.08; P=0.009), roughness (mean, -0.22; P=0.001), and pitch (mean, -0.03; P=0.041) improved over time but not beyond baseline levels, except for asthenia at 1-year posttreatment (the voice is less asthenic than at baseline; mean, +0.20; P=0.03). Perceptual analyses of articulation showed no significant differences. Patients judge their voice quality as good (score, 18/20) at all assessment points, but at 1-year posttreatment, most of them (70%) judge their "voice not as it used to be." In the 1-year versus 10-week posttreatment comparison, the larynx-hypopharynx tumor group was more strained, whereas nonlarynx tumor voices were judged less strained (mean, -0.33 and +0.07, respectively; P=0.031). Patients' perceived changes in voice and speech quality at 10-week post- versus pretreatment correlate weakly with expert judgments. Overall, perceptual CCRT effects on voice and speech seem to peak at 10 weeks posttreatment but level off at 1-year posttreatment. However, at that assessment point, most patients still perceive their voice as different from baseline.
A comparative intelligibility study of single-microphone noise reduction algorithms.
Hu, Yi; Loizou, Philipos C
2007-09-01
The evaluation of the intelligibility of noise reduction algorithms is reported. IEEE sentences and consonants were corrupted by four types of noise, including babble, car, street, and train, at two signal-to-noise ratio levels (0 and 5 dB), and then processed by eight speech enhancement methods encompassing four classes of algorithms: spectral subtractive, subspace, statistical model based, and Wiener-type algorithms. The enhanced speech was presented to normal-hearing listeners for identification. With the exception of a single noise condition, no algorithm produced significant improvements in speech intelligibility. Information transmission analysis of the consonant confusion matrices indicated that no algorithm significantly improved the place feature score, which is critically important for speech recognition. The algorithms that were found in previous studies to perform best in terms of overall quality were not the same algorithms that performed best in terms of speech intelligibility. The subspace algorithm, for instance, was previously found to perform the worst in terms of overall quality, but performed well in the present study in terms of preserving speech intelligibility. Overall, the analysis of consonant confusion matrices suggests that for noise reduction algorithms to improve speech intelligibility, they need to improve the place and manner feature scores.
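Of the four algorithm classes, spectral subtraction is the simplest to sketch. The following is a generic magnitude-subtraction implementation for illustration, not one of the eight methods actually evaluated; the frame length, oversubtraction factor, and spectral floor are assumed values.

```python
import numpy as np

def spectral_subtraction(noisy, frame=512, hop=256, noise_frames=10,
                         oversub=2.0, floor=0.02):
    """Basic magnitude spectral subtraction with overlap-add. The first
    `noise_frames` frames are assumed speech-free and give the noise
    estimate; `oversub` and `floor` trade residual noise against
    musical-noise artifacts."""
    win = np.hanning(frame)              # analysis window, ~COLA at 50% overlap
    n_frames = (len(noisy) - frame) // hop + 1
    spectra = np.array([np.fft.rfft(win * noisy[i*hop:i*hop+frame])
                        for i in range(n_frames)])
    noise_mag = np.abs(spectra[:noise_frames]).mean(axis=0)
    mag = np.abs(spectra)
    clean_mag = np.maximum(mag - oversub * noise_mag, floor * mag)
    clean = clean_mag * np.exp(1j * np.angle(spectra))   # keep the noisy phase
    out = np.zeros(len(noisy))
    for i, spec in enumerate(clean):
        out[i*hop:i*hop+frame] += np.fft.irfft(spec, n=frame)
    return out
```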
Deep neural network and noise classification-based speech enhancement
NASA Astrophysics Data System (ADS)
Shi, Wenhua; Zhang, Xiongwei; Zou, Xia; Han, Wei
2017-07-01
In this paper, a speech enhancement method using noise classification and a deep neural network (DNN) is proposed. A Gaussian mixture model (GMM) is employed to determine the noise type in speech-absent frames, and a DNN is used to model the relationship between the noisy observation and clean speech. Once the noise type is determined, the corresponding DNN model is applied to enhance the noisy speech. The GMM is trained on mel-frequency cepstrum coefficients (MFCCs), with parameters estimated by an iterative expectation-maximization (EM) algorithm, and the noise type is updated by spectrum entropy-based voice activity detection (VAD). Experimental results demonstrate that the proposed method achieves better objective speech quality and smaller distortion under both stationary and non-stationary noise conditions.
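The classification-gated pipeline can be sketched compactly. The sketch below is a minimal illustration, assuming precomputed MFCC frames for speech-absent segments and paired noisy/clean log-spectra for training; the scikit-learn stand-ins and model sizes are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch: noise-classification-gated DNN speech enhancement.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.neural_network import MLPRegressor

def train_models(noise_mfccs, noisy_logspec, clean_logspec):
    """noise_mfccs: dict noise_type -> (frames, n_mfcc) from speech-absent frames.
    noisy_logspec/clean_logspec: dict noise_type -> (frames, n_bins) pairs."""
    gmms, dnns = {}, {}
    for ntype, feats in noise_mfccs.items():
        g = GaussianMixture(n_components=8, covariance_type='diag')
        g.fit(feats)                                   # EM training on MFCCs
        gmms[ntype] = g
        d = MLPRegressor(hidden_layer_sizes=(256, 256), max_iter=200)
        d.fit(noisy_logspec[ntype], clean_logspec[ntype])  # noisy -> clean map
        dnns[ntype] = d
    return gmms, dnns

def enhance(noisy_logspec, noise_frame_mfccs, gmms, dnns):
    # pick the noise type whose GMM gives the highest average log-likelihood
    ntype = max(gmms, key=lambda k: gmms[k].score(noise_frame_mfccs))
    return dnns[ntype].predict(noisy_logspec)          # estimated clean log-spectra
```

In practice the VAD described in the abstract would supply the speech-absent frames used for classification; that stage is omitted here.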
Human neuromagnetic steady-state responses to amplitude-modulated tones, speech, and music.
Lamminmäki, Satu; Parkkonen, Lauri; Hari, Riitta
2014-01-01
Auditory steady-state responses that can be elicited by various periodic sounds inform about subcortical and early cortical auditory processing. Steady-state responses to amplitude-modulated pure tones have been used to scrutinize binaural interaction by frequency-tagging the two ears' inputs at different frequencies. Unlike pure tones, speech and music are physically very complex, as they include many frequency components, pauses, and large temporal variations. To examine the utility of magnetoencephalographic (MEG) steady-state fields (SSFs) in the study of early cortical processing of complex natural sounds, the authors tested the extent to which amplitude-modulated speech and music can elicit reliable SSFs. MEG responses were recorded to 90-s-long binaural tones, speech, and music, amplitude-modulated at 41.1 Hz at four different depths (25, 50, 75, and 100%). The subjects were 11 healthy, normal-hearing adults. MEG signals were averaged in phase with the modulation frequency, and the sources of the resulting SSFs were modeled by current dipoles. After the MEG recording, the intelligibility of the speech, the musical quality of the music stimuli, the naturalness of the music and speech stimuli, and the perceived deterioration caused by the modulation were evaluated on visual analog scales. The perceived quality of the stimuli decreased as a function of increasing modulation depth, more strongly for music than speech; yet, all subjects considered the speech intelligible even at 100% modulation. SSFs were strongest for tones and weakest for speech stimuli; the amplitudes increased with increasing modulation depth for all stimuli. SSFs to tones were reliably detectable at all modulation depths (in all subjects in the right hemisphere, in 9 subjects in the left hemisphere) and to music stimuli at 50 to 100% depths, whereas speech usually elicited clear SSFs only at 100% depth. The hemispheric balance of SSFs was toward the right hemisphere for tones and speech, whereas SSFs to music showed no lateralization. In addition, the right lateralization of SSFs to the speech stimuli decreased with decreasing modulation depth. The results showed that SSFs can be reliably measured in response to amplitude-modulated natural sounds, with slightly different hemispheric lateralization for different carrier sounds. With speech stimuli, modulation at 100% depth is required, whereas for music the 75% or even 50% modulation depth provides a reasonable compromise between the signal-to-noise ratio of SSFs and sound quality or perceptual requirements. SSF recordings thus seem feasible for assessing the early cortical processing of natural sounds.
Emotion Analysis of Telephone Complaints from Customer Based on Affective Computing.
Gong, Shuangping; Dai, Yonghui; Ji, Jun; Wang, Jinzhao; Sun, Hai
2015-01-01
Customer complaints have become important feedback for modern enterprises seeking to improve their product and service quality as well as customers' loyalty. As one of the most commonly used channels for customer complaints, telephone communication carries rich emotional information in speech, which provides valuable resources for perceiving customer satisfaction and studying complaint-handling skills. This paper studies the characteristics of telephone complaint speech and proposes an analysis method based on affective computing technology, which can recognize dynamic changes in customer emotions from conversations between service staff and the customer. The recognition process includes speaker recognition, emotional feature parameter extraction, and dynamic emotion recognition. Experimental results show that this method is effective and can reach high recognition rates for happy and angry states. It has been successfully applied to operation quality and service administration in a telecom and Internet service company. PMID:26633967
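As a rough companion to the pipeline described above, the sketch below shows only the feature-extraction and static emotion-classification stages, assuming labelled telephone clips; librosa and scikit-learn are stand-ins, and the paper's speaker-recognition and dynamic-emotion stages are omitted.

```python
# Minimal sketch: MFCC statistics + SVM for happy/angry classification.
import numpy as np
import librosa
from sklearn.svm import SVC

def emotion_features(path):
    y, sr = librosa.load(path, sr=8000)              # telephone-band audio
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    # summarize the frame sequence with per-coefficient means and std devs
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

def train_classifier(clip_paths, labels):
    """clip_paths/labels: hypothetical lists of clips tagged 'happy'/'angry'."""
    X = np.stack([emotion_features(p) for p in clip_paths])
    return SVC(kernel='rbf').fit(X, labels)
```

A dynamic analysis, as in the paper, would apply such a classifier over a sliding window to track emotion changes across the call.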
NASA Astrophysics Data System (ADS)
Riera-Palou, Felip; den Brinker, Albertus C.
2007-12-01
This paper introduces a new audio and speech broadband coding technique based on the combination of a pulse excitation coder and a standardized parametric coder, namely the MPEG-4 high-quality parametric coder. After presenting a series of enhancements to regular pulse excitation (RPE) to make it suitable for the modeling of broadband signals, it is shown how pulse and parametric coding complement each other and how they can be merged to yield a layered, bit-stream-scalable coder able to operate at different points in the quality/bit-rate plane. The performance of the proposed coder is evaluated in a listening test. The major result is that the extra functionality of bit-stream scalability does not come at the price of reduced performance, since the coder is competitive with standardized coders (MP3, AAC, SSC).
Power Spectral Density Error Analysis of Spectral Subtraction Type of Speech Enhancement Methods
NASA Astrophysics Data System (ADS)
Händel, Peter
2006-12-01
A theoretical framework for the analysis of speech enhancement algorithms is introduced for the performance assessment of spectral subtraction type methods. The quality of the enhanced speech is related to physical quantities of the speech and noise (such as stationarity time and spectral flatness), as well as to design variables of the noise suppressor. The derived theoretical results are compared with the outcomes of subjective listening tests, as well as with successful design strategies developed by independent research groups.
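Since the analysis targets spectral subtraction type methods, a minimal reference implementation of magnitude-domain spectral subtraction may help fix ideas. The over-subtraction factor alpha and spectral floor beta below are illustrative design variables of the kind the framework relates to speech quality; nothing here reproduces the paper's derivations.

```python
# Hedged sketch of power spectral subtraction with overlap-add resynthesis.
import numpy as np

def spectral_subtraction(noisy, noise_psd, n_fft=512, hop=256,
                         alpha=2.0, beta=0.01):
    """noisy: 1-D signal; noise_psd: (n_fft//2+1,) noise power estimate."""
    window = np.hanning(n_fft)
    out = np.zeros(len(noisy))
    for start in range(0, len(noisy) - n_fft, hop):
        frame = noisy[start:start + n_fft] * window
        spec = np.fft.rfft(frame)
        power = np.abs(spec) ** 2
        # subtract the (over-)estimated noise power, floor with beta
        clean_power = np.maximum(power - alpha * noise_psd, beta * power)
        clean = np.sqrt(clean_power) * np.exp(1j * np.angle(spec))
        out[start:start + n_fft] += np.fft.irfft(clean) * window
    return out
```

The trade-off the paper formalizes is visible here directly: larger alpha suppresses more noise but distorts speech, while the floor beta limits musical-noise artifacts.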
Psychophysics of complex auditory and speech stimuli
NASA Astrophysics Data System (ADS)
Pastore, Richard E.
1993-10-01
A major focus of the primary project is the use of different procedures to provide converging evidence on the nature of perceptual spaces for speech categories. Completed research examined initial voiced consonants, with results providing strong evidence that different stimulus properties may cue a phoneme category in different vowel contexts. Thus, /b/ is cued by a rising second formant (F2) with the vowel /a/, requires both F2 and F3 to be rising with /i/, and is independent of the release burst for these vowels. Furthermore, cues for phonetic contrasts are not necessarily symmetric, and the strong dependence of prior speech research on classification procedures may have led to errors. Thus, the opposite (falling F2 and F3) transitions lead to somewhat ambiguous percepts (i.e., not /b/) which may be labeled consistently (as /d/ or /g/) but require a release burst to achieve high category quality and similarity to category exemplars. Ongoing research is examining cues in other vowel contexts and using procedures to evaluate the nature of the interaction between cues for categories of both speech and music.
ERIC Educational Resources Information Center
Kenny, Belinda J.; Lincoln, Michelle; Blyth, Katrina; Balandin, Susan
2009-01-01
Background: Speech pathologists are confronted by ethical issues when they need to make decisions about client care, address team conflict, and fulfil the range of duties and responsibilities required of health professionals. However, there has been little research into the specific nature of ethical dilemmas experienced by speech pathologists and…
Language Awareness and Perception of Connected Speech in a Second Language
ERIC Educational Resources Information Center
Kennedy, Sara; Blanchet, Josée
2014-01-01
To be effective second or additional language (L2) listeners, learners should be aware of typical processes in connected L2 speech (e.g. linking). This longitudinal study explored how learners' developing ability to perceive connected L2 speech was related to the quality of their language awareness. Thirty-two learners of L2 French at a university…
Integrated Spacesuit Audio System Enhances Speech Quality and Reduces Noise
NASA Technical Reports Server (NTRS)
Huang, Yiteng Arden; Chen, Jingdong; Chen, Shaoyan Sharyl
2009-01-01
A new approach has been proposed for increasing astronaut comfort and improving speech capture. Currently, the special design of a spacesuit creates an extreme acoustic environment, making it difficult to capture clear speech without compromising comfort. The proposed Integrated Spacesuit Audio (ISA) system incorporates the microphones into the helmet and uses software to extract voice signals from background noise.
Mathai, Jijo Pottackal; Appu, Sabarish
2015-01-01
Auditory neuropathy spectrum disorder (ANSD) is a form of sensorineural hearing loss that causes severe deficits in speech perception. The perceptual problems of individuals with ANSD are attributed to their temporal processing impairment rather than to reduced audibility, which makes their rehabilitation with hearing aids difficult. Although hearing aids can restore audibility, the compression circuits in a hearing aid might distort the temporal modulations of speech, causing poor aided performance. Therefore, hearing aid settings that preserve the temporal modulations of speech might be an effective way to improve speech perception in ANSD. The purpose of the study was to investigate the perception of hearing aid-processed speech in individuals with late-onset ANSD. A repeated measures design was used to study the effect of various compression time settings on speech perception and perceived quality. Seventeen individuals with late-onset ANSD within the age range of 20-35 yr participated in the study. The word recognition scores (WRSs) and quality judgments of phonemically balanced words, processed using four different compression settings of a hearing aid (slow, medium, fast, and linear), were evaluated. The modulation spectra of the hearing aid-processed stimuli were estimated to probe the effect of amplification on the temporal envelope of speech. Repeated measures analysis of variance and post hoc Bonferroni pairwise comparisons were used to analyze word recognition performance and quality judgments. The comparison between unprocessed and all four hearing aid-processed stimuli showed significantly better perception for the unprocessed stimuli. Although perception of words processed using the slow compression time setting was significantly higher than for the fast setting, the difference was only 4%, and there were no other significant differences in perception between the hearing aid-processed stimuli. Analysis of the temporal envelope of the hearing aid-processed stimuli revealed minimal changes across the four hearing aid settings. In terms of quality, the highest number of individuals preferred stimuli processed using the slow compression time setting, followed by those who preferred the medium setting; none of the individuals preferred the fast compression time setting. Analysis of quality judgments showed that the slow, medium, and linear settings received significantly higher preference scores than the fast compression setting. Individuals with ANSD showed no marked difference in perception of speech processed using the four different hearing aid settings. However, significantly higher preference, in terms of quality, was found for stimuli processed using the slow, medium, and linear settings over the fast one. Therefore, whenever hearing aids are recommended for ANSD, those having slow compression time settings or linear amplification may be chosen over fast (syllabic) compression. In addition, WRSs obtained using hearing aid-processed stimuli were remarkably poorer than for unprocessed stimuli, which suggests that processing of speech through hearing aids might cause a large reduction in performance in individuals with ANSD. However, further evaluation is needed using individually programmed hearing aids rather than hearing aid-processed stimuli. American Academy of Audiology.
The Effect of Background Traffic Packet Size to VoIP Speech Quality
NASA Astrophysics Data System (ADS)
Triyason, Tuul; Kanthamanon, Prasert; Warasup, Kittipong; Yamsaengsung, Siam; Supattatham, Montri
VoIP is gaining acceptance in the corporate world, especially in small and medium-sized businesses that want to save costs to gain an advantage over their competitors. Good voice quality is one of the challenging tasks in a deployment plan, because VoIP voice quality is affected by packet loss and jitter. In this paper, we study the effect of background traffic packet size on voice quality. The background traffic was generated by the Bricks software, and speech quality was assessed by mean opinion score (MOS). The results show an interesting relationship between voice quality and the number and size of TCP packets: for the same amount of data, smaller packets degrade voice quality more than larger packets.
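One standard way to connect measured packet loss to MOS is the ITU-T G.107 E-model; the sketch below uses its R-to-MOS mapping and the random-loss effective-impairment formula with common G.711 planning values (Ie=0, Bpl=4.3). This is a companion tool for interpretation, not the assessment method used in the paper.

```python
# E-model (ITU-T G.107) sketch: packet loss percentage -> estimated MOS.
def r_to_mos(r):
    if r < 0:
        return 1.0
    if r > 100:
        return 4.5
    return 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6

def voip_mos(packet_loss_pct, ie=0.0, bpl=4.3, r0=93.2):
    """Random-loss model: Ie-eff = Ie + (95 - Ie) * Ppl / (Ppl + Bpl)."""
    ie_eff = ie + (95 - ie) * packet_loss_pct / (packet_loss_pct + bpl)
    return r_to_mos(r0 - ie_eff)

print(voip_mos(0.0), voip_mos(2.0))  # roughly 4.41 with no loss, ~3.3 at 2% loss
```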
Elefant, Cochavit; Baker, Felicity A; Lotan, Meir; Lagesen, Simen Krogstie; Skeie, Geir Olve
2012-01-01
Parkinson's disease (PD) is a progressive neurodegenerative disorder where patients exhibit impairments in speech production. Few studies have investigated the influence of music interventions on vocal abilities of individuals with PD. To evaluate the influence of a group voice and singing intervention on speech, singing, and depressive symptoms in individuals with PD. Ten patients diagnosed with PD participated in this one-group, repeated measures design study. Participants received the sixty-minute intervention, in a small group setting once a week for 20 consecutive weeks. Speech and singing quality were acoustically analyzed using a KayPentax Multi-Dimensional Voice Program, voice ability using the Voice Handicap Index (VHI), and depressive symptoms using the Montgomery and Asberg Depression rating scale (MADRS). Measures were taken at baseline (Time 1), after 10 weeks of weekly sessions (Time 2), and after 20 weeks of weekly sessions (Time 3). Significant changes were observed for five of the six singing quality outcomes at Time 2 and 3, as well as voice range and the VHI physical subscale at Time 3. No significant changes were found for speaking quality or depressive symptom outcomes; however, there was an absence of decline on speaking quality outcomes over the intervention period. Significant improvements in singing quality and voice range, coupled with the absence of decline in speaking quality support group singing as a promising intervention for persons with PD. A two-group randomized control study is needed to determine whether the intervention contributes to maintenance of speaking quality in persons with PD.
NASA Astrophysics Data System (ADS)
Athaudage, Chandranath R. N.; Bradley, Alan B.; Lech, Margaret
2003-12-01
A dynamic programming-based optimization strategy for a temporal decomposition (TD) model of speech and its application to low-rate speech coding in storage and broadcasting is presented. In previous work with the spectral stability-based event localizing (SBEL) TD algorithm, the event localization was performed based on a spectral stability criterion. Although this approach gave reasonably good results, there was no assurance on the optimality of the event locations. In the present work, we have optimized the event localizing task using a dynamic programming-based optimization strategy. Simulation results show that an improved TD model accuracy can be achieved. A methodology of incorporating the optimized TD algorithm within the standard MELP speech coder for the efficient compression of speech spectral information is also presented. The performance evaluation results revealed that the proposed speech coding scheme achieves 50%-60% compression of speech spectral information with negligible degradation in the decoded speech quality.
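The event-localization step can be phrased as a textbook dynamic program. The sketch below places K segment boundaries in a parameter track to minimize total within-segment error; the within-segment variance cost is a generic stand-in for the TD model accuracy criterion optimized in the paper.

```python
# Generic DP sketch for optimal placement of k segment boundaries.
import numpy as np

def segment_cost(x, i, j):
    seg = x[i:j]                               # frames i..j-1 as one event region
    return ((seg - seg.mean(axis=0)) ** 2).sum()

def dp_segment(x, k):
    """x: (n_frames, n_params) parameter track; returns interior boundaries."""
    n = len(x)
    cost = np.full((k + 1, n + 1), np.inf)
    back = np.zeros((k + 1, n + 1), dtype=int)
    cost[0, 0] = 0.0
    for seg in range(1, k + 1):
        for j in range(seg, n + 1):
            for i in range(seg - 1, j):
                c = cost[seg - 1, i] + segment_cost(x, i, j)
                if c < cost[seg, j]:
                    cost[seg, j], back[seg, j] = c, i
    bounds, j = [], n                          # backtrack the boundary chain
    for seg in range(k, 0, -1):
        bounds.append(back[seg, j])
        j = back[seg, j]
    return bounds[::-1][1:]
```

Unlike a greedy stability criterion, the DP guarantees the globally optimal boundary set for the chosen cost, which mirrors the optimality argument made in the abstract.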
Dysphagia Therapy in Stroke: A Survey of Speech and Language Therapists
ERIC Educational Resources Information Center
Archer, S. K.; Wellwood, I.; Smith, C. H.; Newham, D. J.
2013-01-01
Background: Dysphagia is common after stroke, leading to adverse outcome. There is a paucity of high-quality evidence for dysphagia therapy, thus making it difficult to determine the best approaches to treatment. Clinical decisions are often based on usual practice, however no formal method of monitoring practice patterns exists. Aims: To…
Prediction, Performance, and Promise: Perspective on Time-Shortened Degree Programs.
ERIC Educational Resources Information Center
Smart, John M., Ed.; Howard, Toni A., Ed.
Among the papers and presentations are: the keynote speech (E. Alden Dunham); the quality baccalaureate myth (Richard Giardina); the high school/college interface and time-shortening (panel presentation); restructuring the baccalaureate: a follow-up study (Robert Bersi); a point of view (Richard Meisler); more options: less time? (DeVere E.…
TeleCITE: Telehealth--A Cochlear Implant Therapy Exchange
ERIC Educational Resources Information Center
Stith, Joanna; Stredler-Brown, Arlene; Greenway, Pat; Kahn, Gary
2012-01-01
What might bring the efforts of a physician, a speech-language pathologist, a teacher of the deaf and hard of hearing, and a nurse together? The answer is the innovative use of telepractice to deliver high quality, family-centered early intervention to infants and toddlers with hearing loss. TeleCITE: Telehealth--A Cochlear Implant Therapy…
Picou, Erin M; Marcrum, Steven C; Ricketts, Todd A
2015-03-01
While nonlinear frequency compression (NFC) can potentially improve audibility for listeners with considerable high-frequency hearing loss, the effects of implementing it for listeners with moderate high-frequency hearing loss are unclear. The purpose of this study was to investigate the effects of activating NFC for listeners who are not traditionally considered candidates for this technology. Participants wore study hearing aids with NFC activated for a 3-4 week trial period. After the trial period, they were tested with NFC and with conventional processing on measures of consonant discrimination threshold in quiet, consonant recognition in quiet, sentence recognition in noise, and acceptability of the sound quality of speech and music. Seventeen adult listeners with symmetrical, mild to moderate sensorineural hearing loss participated. Better-ear, high-frequency pure-tone averages (4, 6, and 8 kHz) were 60 dB HL or better. Activating NFC resulted in lower (better) thresholds for discrimination of /s/, whose spectral center was 9 kHz. There were no other significant effects of NFC compared with conventional processing. These data suggest that the benefits, and detriments, of activating NFC may be limited for this population.
Single-trial analysis of the neural correlates of speech quality perception.
Porbadnigk, Anne K; Treder, Matthias S; Blankertz, Benjamin; Antons, Jan-Niklas; Schleicher, Robert; Möller, Sebastian; Curio, Gabriel; Müller, Klaus-Robert
2013-10-01
Assessing speech quality perception is a challenge typically addressed in behavioral and opinion-seeking experiments. Only recently, neuroimaging methods were introduced, which were used to study the neural processing of quality at group level. However, our electroencephalography (EEG) studies show that the neural correlates of quality perception are highly individual. Therefore, it became necessary to establish dedicated machine learning methods for decoding subject-specific effects. The effectiveness of our methods is shown by the data of an EEG study that investigates how the quality of spoken vowels is processed neurally. Participants were asked to indicate whether they had perceived a degradation of quality (signal-correlated noise) in vowels, presented in an oddball paradigm. We find that the P3 amplitude is attenuated with increasing noise. Single-trial analysis allows one to show that this is partly due to an increasing jitter of the P3 component. A novel classification approach helps to detect trials with presumably non-conscious processing at the threshold of perception. We show that this approach uncovers a non-trivial confounder between neural hits and neural misses. The combined use of EEG signals and machine learning methods results in a significant 'neural' gain in sensitivity (in processing quality loss) when compared to standard behavioral evaluation; averaged over 11 subjects, this amounts to a relative improvement in sensitivity of 35%.
Communication acoustics in Bell Labs
NASA Astrophysics Data System (ADS)
Flanagan, J. L.
2004-05-01
Communication acoustics has been a central theme in Bell Labs research since its inception. Telecommunication serves human information exchange, and humans favor spoken language as a principal mode. The atmospheric medium typically provides the link between articulation and hearing. Creation, control and detection of sound, and the human's facility for generation and perception, are basic ingredients of telecommunication. Electronics technology of the 1920s ushered in great advances in communication at a distance, a strong economic impetus being to overcome bandwidth limitations of wireline and cable. Early research established criteria for speech transmission with high quality and intelligibility. These insights supported exploration of means for efficient transmission: obtaining the greatest amount of speech information over a given bandwidth. Transoceanic communication was initiated by undersea cables for telegraphy. But these long cables exhibited very limited bandwidth (on the order of a few hundred Hz). The challenge of sending voice across the oceans spawned perhaps the best-known speech compression technique in history, the Vocoder, which parametrized the signal for transmission in about 300 Hz bandwidth, one-tenth that required for the typical waveform channel. Quality and intelligibility were grave issues (and they still are). At the same time, parametric representation offered possibilities for encryption and privacy inside a traditional voice bandwidth. Confidential conversations between Roosevelt and Churchill during World War II were carried over high-frequency radio by an encrypted vocoder system known as Sigsaly. Major engineering advances in the late 1940s and early 1950s moved telecommunications into a new regime: digital technology. These key advances were at least three: (i) new understanding of time-discrete (sampled) representation of signals, (ii) digital computation (especially binary based), and (iii) evolving capabilities in microelectronics that ultimately provided circuits of enormous complexity with low cost and power. Digital transmission (as exemplified by pulse code modulation, PCM, and its many derivatives) became a telecommunication mainstay, along with switches to control and route information in digital form. Concomitantly, storage means for digital information advanced, providing another impetus for speech compression. More and more, humans saw the need to exchange speech information with machines, as well as with other humans. Human-machine speech communication came to full stride in the early 1990s, and has now expanded to multimodal domains that begin to support enhanced naturalness, using contemporaneous sight, sound and touch signaling. Packet transmission is supplanting circuit switching, and voice and video are commonly being carried by Internet protocol.
TEACHER'S GUIDE TO HIGH SCHOOL SPEECH.
ERIC Educational Resources Information Center
JENKINSON, EDWARD B., ED.
THIS GUIDE TO HIGH SCHOOL SPEECH FOCUSES ON SPEECH AS ORAL COMPOSITION, STRESSING THE IMPORTANCE OF CLEAR THINKING AND COMMUNICATION. THE PROPOSED 1-SEMESTER BASIC COURSE IN SPEECH ATTEMPTS TO IMPROVE THE STUDENT'S ABILITY TO COMPOSE AND DELIVER SPEECHES, TO THINK AND LISTEN CRITICALLY, AND TO UNDERSTAND THE SOCIAL FUNCTION OF SPEECH. IN ADDITION…
Tan, Eric J; Thomas, Neil; Rossell, Susan L
2014-04-01
Speech disturbances in schizophrenia impact on the individual's communicative ability. Although they are considered a core feature of schizophrenia, comparatively little work has been done to examine their impact on the life experiences of patients. This study aimed to examine the relationship between schizophrenia speech disturbances, including those traditionally known as formal thought disorder (TD), and quality of life (QoL). It assessed effects on functioning (objective QoL) and satisfaction (subjective QoL) concurrently, while controlling for the influence of neurocognition and depression. Fifty-four patients with schizophrenia/schizoaffective disorder were administered the MATRICS Consensus Cognitive Battery (MCCB), the PANSS, MADRS (with separate ratings for negative TD [verbal underproductivity] and positive TD [verbal disorganisation and pressured speech]) and Lehman's QOLI assessing both objective and subjective QoL. Ratings of positive and negative TD, depression, and general neurocognition were entered into hierarchical regressions to explore their relationship with both life functioning and satisfaction. Verbal underproductivity was a significant predictor of objective QoL, while pressured speech had a trend association with subjective QoL. This suggests a differential relationship between speech disturbances and QoL. Verbal underproductivity seems to affect daily functioning and relations with others, while pressured speech is predictive of satisfaction with life. The impact of verbal underproductivity on QoL suggests it to be an important target for rehabilitation in schizophrenia. Copyright © 2014 Elsevier Inc. All rights reserved.
EEG oscillations entrain their phase to high-level features of speech sound.
Zoefel, Benedikt; VanRullen, Rufin
2016-01-01
Phase entrainment of neural oscillations, the brain's adjustment to rhythmic stimulation, is a central component in recent theories of speech comprehension: the alignment between brain oscillations and speech sound improves speech intelligibility. However, phase entrainment to everyday speech sound could also be explained by oscillations passively following the low-level periodicities (e.g., in sound amplitude and spectral content) of auditory stimulation, and not by an adjustment to the speech rhythm per se. Recently, using novel speech/noise mixture stimuli, we have shown that behavioral performance can entrain to speech sound even when high-level features (including phonetic information) are not accompanied by fluctuations in sound amplitude and spectral content. In the present study, we report that neural phase entrainment might underlie our behavioral findings. We observed phase-locking between electroencephalogram (EEG) and speech sound in response not only to original (unprocessed) speech but also to our constructed "high-level" speech/noise mixture stimuli. Phase entrainment to original speech and speech/noise sound did not differ in the degree of entrainment, but rather in the actual phase difference between EEG signal and sound. Phase entrainment was not abolished when speech/noise stimuli were presented in reverse (which disrupts semantic processing), indicating that acoustic (rather than linguistic) high-level features play a major role in the observed neural entrainment. Our results provide further evidence for phase entrainment as a potential mechanism underlying speech processing and segmentation, and for the involvement of high-level processes in the adjustment to the rhythm of speech. Copyright © 2015 Elsevier Inc. All rights reserved.
Near-toll quality digital speech transmission in the mobile satellite service
NASA Technical Reports Server (NTRS)
Townes, S. A.; Divsalar, D.
1986-01-01
This paper discusses system considerations for near-toll quality digital speech transmission in a 5 kHz mobile satellite system channel. Tradeoffs are shown for power performance versus delay for a 4800 bps speech compression system in conjunction with a 16-state, rate-2/3 trellis-coded 8PSK modulation system. The suggested system has an additional 150 ms of delay beyond the propagation delay and, assuming ideal synchronization, requires an Eb/N0 of about 7 dB for a Ricean channel with a line-of-sight to diffuse component ratio of 10. An additional loss of 2 to 3 dB is expected for synchronization in a fading environment.
Lexical and phonological variability in preschool children with speech sound disorder.
Macrae, Toby; Tyler, Ann A; Lewis, Kerry E
2014-02-01
The authors of this study examined relationships between measures of word and speech error variability and between these and other speech and language measures in preschool children with speech sound disorder (SSD). In this correlational study, 18 preschool children with SSD, age-appropriate receptive vocabulary, and normal oral motor functioning and hearing were assessed across 2 sessions. Experimental measures included word and speech error variability, receptive vocabulary, nonword repetition (NWR), and expressive language. Pearson product–moment correlation coefficients were calculated among the experimental measures. The correlation between word and speech error variability was slight and nonsignificant. The correlation between word variability and receptive vocabulary was moderate and negative, although nonsignificant. High word variability was associated with small receptive vocabularies. The correlations between speech error variability and NWR and between speech error variability and the mean length of children's utterances were moderate and negative, although both were nonsignificant. High speech error variability was associated with poor NWR and language scores. High word variability may reflect unstable lexical representations, whereas high speech error variability may reflect indistinct phonological representations. Preschool children with SSD who show abnormally high levels of different types of speech variability may require slightly different approaches to intervention.
McCormack, Jane; Baker, Elise; Masso, Sarah; Crowe, Kathryn; McLeod, Sharynne; Wren, Yvonne; Roulstone, Sue
2017-06-01
Implementation fidelity refers to the degree to which an intervention or programme adheres to its original design. This paper examines implementation fidelity in the Sound Start Study, a clustered randomised controlled trial of computer-assisted support for children with speech sound disorders (SSD). Sixty-three children with SSD in 19 early childhood centres received computer-assisted support (Phoneme Factory Sound Sorter [PFSS] - Australian version). Educators facilitated the delivery of PFSS targeting phonological error patterns identified by a speech-language pathologist. Implementation data were gathered via (1) the computer software, which recorded when and how much intervention was completed over 9 weeks; (2) educators' records of practice sessions; and (3) scoring of fidelity (intervention procedure, competence and quality of delivery) from videos of intervention sessions. Less than one-third of children received the prescribed number of days of intervention, while approximately one-half participated in the prescribed number of intervention plays. Computer data differed from educators' data for total number of days and plays in which children participated; the degree of match was lower as data became more specific. Fidelity to intervention procedures, competency and quality of delivery was high. Implementation fidelity may impact intervention outcomes and so needs to be measured in intervention research; however, the way in which it is measured may impact on data.
NASA Astrophysics Data System (ADS)
Dutta, Rashmi
INTRODUCTION: Speech science is, in fact, a sub-discipline of nonlinear dynamical systems [2,104]. There are two different types of dynamical system. A continuous dynamical system may be defined, for the continuous-time case, by the equation ẋ = F(x), where x is a vector of length d defining a point in a d-dimensional space, F is some function (linear or nonlinear) operating on x, and ẋ is the time derivative of x. This system is deterministic, in that it is possible to completely specify its evolution, or flow of trajectories, in the d-dimensional space given the initial starting conditions. A discrete dynamical system can be defined as a map (by the process of iteration): x_{n+1} = G(x_n), where x_n is again a d-length vector at time step n and G is an operator function. Given an initial state x_0, it is possible to calculate the value of x_n for any n > 0. Speech has evolved as a primary form of communication between humans; speech and hearing are our most used means of communication [104, 114]. Analysis of human speech has been a goal of research during the last few decades [105, 108]. With the rapid development of information technology (IT), human-machine communication using natural speech has received wide attention from both academic and business communities. One highly quantitative approach to characterizing the communications potential of speech is in terms of information theory, as introduced by Shannon [C.E. Shannon, "A Mathematical Theory of Communication," Bell System Technical Journal, Vol. 27, pp. 623-656, October 1948]. According to information theory, speech can be represented in terms of its message content, or information. An alternative way of characterizing speech is in terms of the signal carrying the message information, i.e., the acoustic waveform. Although information-theoretic ideas have played a major role in sophisticated communications systems, it is the speech representation based on the waveform, or some parametric model, that has been most useful in practical applications. Developing a system that can understand natural language has been a continuing goal of speech researchers. Fully automatic, high-quality machine translation systems are extremely difficult to build. The difficulty arises from the following reasons: in any natural-language text, only part of the information to be conveyed is explicitly expressed; it is the human mind which fills in and supplements the details using contextual knowledge.
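For concreteness, the discrete system x_{n+1} = G(x_n) defined above can be iterated directly; the logistic map used for G below is a standard nonlinear example, an illustration rather than a system taken from the text.

```python
# Iterating a discrete dynamical system x_{n+1} = G(x_n).
def iterate(G, x0, n):
    xs = [x0]
    for _ in range(n):
        xs.append(G(xs[-1]))
    return xs

G = lambda x: 3.9 * x * (1 - x)     # nonlinear, deterministic, chaotic regime
print(iterate(G, 0.5, 5))           # the evolution is fixed entirely by x0
```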
McAllister, Anita; Brandt, Signe Kofoed
2012-09-01
A well-controlled recording in a studio is fundamental to most voice rehabilitation. However, this laboratory-like recording method has been questioned because voice use in a natural environment may be quite different. In children's natural environment, high background noise levels are common and are an important factor contributing to voice problems. The primary noise source in day-care centers is the children themselves. The aim of the present study was to compare perceptual evaluations of voice quality and acoustic measures from a controlled recording with recordings of spontaneous speech in children's natural environment in a day-care setting. Eleven 5-year-old children were recorded three times during a day at the day care. The controlled speech material consisted of repeated sentences, and matching sentences were selected from the spontaneous speech; all sentences were repeated three times. Recordings were randomized and analyzed acoustically and perceptually. Statistical analyses showed that fundamental frequency was significantly higher in spontaneous speech (P<0.01), as was hyperfunction (P<0.001). The only characteristic the controlled sentences shared with spontaneous speech was degree of hoarseness (Spearman's rho=0.564). When data for boys and girls were analyzed separately, a correlation was found for the parameter breathiness (rho=0.551) for boys, and for girls the correlation for hoarseness remained (rho=0.752). Regarding acoustic data, none of the measures correlated across recording conditions for the whole group. Copyright © 2012 The Voice Foundation. Published by Mosby, Inc. All rights reserved.
Practice of laryngectomy rehabilitation interventions: a perspective from Hong Kong.
Chan, Jimmy Y W
2013-06-01
To review the current practice of rehabilitation for laryngectomees in Hong Kong. Factors affecting the quality of life of laryngectomees include their performance in speech restoration, the presence of complications of treatment, and the availability of psycho-social support. In Hong Kong, more than 90% of laryngectomees have speech restoration by various means, the commonest being tracheo-oesophageal puncture and electrolaryngeal speech. However, they face special problems in communication using the current alaryngeal speech modalities, as it is difficult to produce variation in tones, which is important for expressing different meanings in Cantonese. The responsibility of surgeons to follow up patients after surgery and the practice of managing common complications after laryngectomy are also discussed. The New Voice Club of Hong Kong promotes self-help and mutual help between laryngectomees, with the aim of helping new members to regain normal speech and to re-integrate into society. A quality-of-life study in Hong Kong shows that although the mean global health score is satisfactory, the social functioning domain is the most severely affected after surgery. Cantonese-speaking laryngectomees in Hong Kong face unique challenges in speech restoration and re-integration into society after surgery. Surgeons should take the leading role in the multidisciplinary management of these patients.
Smith, Sandra Nelson; Lucas, Laura
2016-01-01
Objectives: A systematic review of the literature and meta-analysis was conducted to assess the nature and quality of the evidence for the use of hearing instruments in adults with a unilateral severe to profound sensorineural hearing loss. Design: The PubMed, EMBASE, MEDLINE, Cochrane, CINAHL, and DARE databases were searched with no restrictions on language. The search included articles from the start of each database until February 11, 2015. Studies were included that (a) assessed the impact of any form of hearing instrument, including devices that reroute signals between the ears or restore aspects of hearing to a deaf ear, in adults with a sensorineural severe to profound loss in one ear and normal or near-normal hearing in the other ear; (b) compared different devices or compared a device with placebo or the unaided condition; (c) measured outcomes in terms of speech perception, spatial listening, or quality of life; (d) were prospective controlled or observational studies. Studies that met prospectively defined criteria were subjected to random effects meta-analyses. Results: Twenty-seven studies reported in 30 articles were included. The evidence was graded as low-to-moderate quality having been obtained primarily from observational before-after comparisons. The meta-analysis identified statistically significant benefits to speech perception in noise for devices that rerouted the speech signals of interest from the worse ear to the better ear using either air or bone conduction (mean benefit, 2.5 dB). However, these devices also degraded speech understanding significantly and to a similar extent (mean deficit, 3.1 dB) when noise was rerouted to the better ear. Data on the effects of cochlear implantation on speech perception could not be pooled as the prospectively defined criteria for meta-analysis were not met. Inconsistency in the assessment of outcomes relating to sound localization also precluded the synthesis of evidence across studies. Evidence for the relative efficacy of different devices was sparse but a statistically significant advantage was observed for rerouting speech signals using abutment-mounted bone conduction devices when compared with outcomes after preoperative trials of air conduction devices when speech and noise were colocated (mean benefit, 1.5 dB). Patients reported significant improvements in hearing-related quality of life with both rerouting devices and following cochlear implantation. Only two studies measured health-related quality of life and findings were inconclusive. Conclusions: Devices that reroute sounds from an ear with a severe to profound hearing loss to an ear with minimal hearing loss may improve speech perception in noise when signals of interest are located toward the impaired ear. However, the same device may also degrade speech perception as all signals are rerouted indiscriminately, including noise. Although the restoration of functional hearing in both ears through cochlear implantation could be expected to provide benefits to speech perception, the inability to synthesize evidence across existing studies means that such a conclusion cannot yet be made. For the same reason, it remains unclear whether cochlear implantation can improve the ability to localize sounds despite restoring bilateral input. 
Prospective controlled studies that measure outcomes consistently and control for selection and observation biases are required to improve the quality of the evidence for the provision of hearing instruments to patients with unilateral deafness and to support any future recommendations for the clinical management of these patients. PMID:27232073
Steganalysis of recorded speech
NASA Astrophysics Data System (ADS)
Johnson, Micah K.; Lyu, Siwei; Farid, Hany
2005-03-01
Digital audio provides a suitable cover for high-throughput steganography. At 16 bits per sample and sampled at a rate of 44,100 Hz, digital audio has the bit-rate to support large messages. In addition, audio is often transient and unpredictable, facilitating the hiding of messages. Using an approach similar to our universal image steganalysis, we show that hidden messages alter the underlying statistics of audio signals. Our statistical model begins by building a linear basis that captures certain statistical properties of audio signals. A low-dimensional statistical feature vector is extracted from this basis representation and used by a non-linear support vector machine for classification. We show the efficacy of this approach on LSB embedding and Hide4PGP. While no explicit assumptions about the content of the audio are made, our technique has been developed and tested on high-quality recorded speech.
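A minimal sketch of the LSB embedding that such a detector targets, assuming 16-bit PCM samples, may clarify what the steganalysis has to detect; the classifier itself (linear basis projection plus support vector machine) is not reproduced here.

```python
# LSB steganography in 16-bit audio: hide message bits in sample LSBs.
import numpy as np

def lsb_embed(samples, bits):
    """samples: int16 array; bits: sequence of 0/1, len(bits) <= len(samples)."""
    out = samples.copy()
    for i, b in enumerate(bits):
        out[i] = (out[i] & ~1) | b          # clear the LSB, then set it to b
    return out

def lsb_extract(samples, n_bits):
    return [int(s) & 1 for s in samples[:n_bits]]

cover = (np.random.randn(1000) * 8000).astype(np.int16)
msg = [1, 0, 1, 1, 0, 0, 1, 0]
stego = lsb_embed(cover, msg)
assert lsb_extract(stego, len(msg)) == msg   # message survives embedding
```

The perceptual change is negligible (at most one quantization step per sample), which is exactly why detection must rely on subtle statistical deviations rather than audible artifacts.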
NASA Astrophysics Data System (ADS)
Brand, Thomas
Speech intelligibility (SI) is important in different fields of research, engineering, and diagnostics for quantifying very different phenomena, such as the quality of recordings, communication and playback devices, the reverberation of auditoria, the characteristics of hearing impairment, the benefit of hearing aids, or combinations of these.
ERIC Educational Resources Information Center
HAYES, ALFRED S.
THE MANY POSSIBLE VARIATIONS OF LANGUAGE LABORATORY SYSTEMS WERE DESCRIBED, AND RELATIVE ADVANTAGES AND LIMITATIONS OF EACH WERE DISCUSSED. DETAILED GUIDANCE ON PURCHASING LANGUAGE LABORATORY EQUIPMENT WAS PROVIDED THROUGH (1) DEFINITION OF HIGH-QUALITY SPEECH REPRODUCTION, (2) DISCUSSION OF TECHNICAL FACTORS WHICH AFFECT ITS ACHIEVEMENT, AND (3)…
A systematic review of treatment intensity in speech disorders.
Kaipa, Ramesh; Peterson, Abigail Marie
2016-12-01
Treatment intensity (sometimes referred to as "practice amount") has been well investigated in the learning of non-speech tasks, but its role in treating speech disorders has received comparatively little analysis. This study reviewed the literature regarding treatment intensity in speech disorders. A systematic search was conducted in four databases using appropriate search terms, and seven articles from a total of 580 met the inclusion criteria. The speech disorders investigated included speech sound disorders, dysarthria, acquired apraxia of speech, and childhood apraxia of speech. All seven studies were evaluated for their methodological quality, research phase, and evidence level. The evidence level of the reviewed studies ranged from moderate to strong. With regard to research phase, only one study was considered phase III research, which corresponds to the controlled trial phase; the remaining studies were considered phase II research, the phase in which the magnitude of the therapeutic effect is assessed. Results suggested that higher treatment intensity was favourable over lower treatment intensity of specific treatment technique(s) for treating childhood apraxia of speech and speech sound (phonological) disorders. Future research should incorporate randomised controlled designs to establish the optimal treatment intensity specific to each of the speech disorders.
Engaged listeners: shared neural processing of powerful political speeches
Häcker, Frank E. K.; Honey, Christopher J.; Hasson, Uri
2015-01-01
Powerful speeches can captivate audiences, whereas weaker speeches fail to engage their listeners. What is happening in the brains of a captivated audience? Here, we assess audience-wide functional brain dynamics during listening to speeches of varying rhetorical quality. The speeches were given by German politicians and evaluated as rhetorically powerful or weak. Listening to each of the speeches induced similar neural response time courses, as measured by inter-subject correlation analysis, in widespread brain regions involved in spoken language processing. Crucially, alignment of the time course across listeners was stronger for rhetorically powerful speeches, especially for bilateral regions of the superior temporal gyri and medial prefrontal cortex. Thus, during powerful speeches, listeners as a group are more coupled to each other, suggesting that powerful speeches are more potent in taking control of the listeners’ brain responses. Weaker speeches were processed more heterogeneously, although they still prompted substantially correlated responses. These patterns of coupled neural responses bear resemblance to metaphors of resonance, which are often invoked in discussions of speech impact, and contribute to the literature on auditory attention under natural circumstances. Overall, this approach opens up possibilities for research on the neural mechanisms mediating the reception of entertaining or persuasive messages. PMID:25653012
Some articulatory details of emotional speech
NASA Astrophysics Data System (ADS)
Lee, Sungbok; Yildirim, Serdar; Bulut, Murtaza; Kazemzadeh, Abe; Narayanan, Shrikanth
2005-09-01
Differences in speech articulation among four emotion types (neutral, anger, sadness, and happiness) are investigated by analyzing tongue tip, jaw, and lip movement data collected from one male and one female speaker of American English. The data were collected using an electromagnetic articulography (EMA) system while the subjects produced simulated emotional speech. Pitch, root-mean-square (rms) energy, and the first three formants were estimated for vowel segments. For both speakers, angry speech exhibited the largest rms energy and the largest articulatory activity in terms of displacement range and movement speed. Happy speech is characterized by the largest pitch variability. It has higher rms energy than neutral speech, but its articulatory activity is comparable to, or less than, that of neutral speech. That is, happy speech is more prominent in voicing activity than in articulation. Sad speech exhibits the longest sentence duration and lower rms energy; however, its articulatory activity is no less than that of neutral speech. Interestingly, for the male speaker, articulation for vowels in sad speech is consistently more peripheral (i.e., more forwarded displacements) when compared with other emotions, but this does not hold for the female speaker. These and other results will be discussed in detail with the associated acoustics and perceived emotional qualities. [Work supported by NIH.]
Speech Patterns and Racial Wage Inequality
ERIC Educational Resources Information Center
Grogger, Jeffrey
2011-01-01
Speech patterns differ substantially between whites and many African Americans. I collect and analyze speech data to understand the role that speech may play in explaining racial wage differences. Among blacks, speech patterns are highly correlated with measures of skill such as schooling and AFQT scores. They are also highly correlated with the…
Nixon, C; Anderson, T; Morris, L; McCavitt, A; McKinley, R; Yeager, D; McDaniel, M
1998-11-01
The intelligibility of female and male speech is equivalent under most ordinary living conditions. However, due to small differences between their acoustic speech signals, called speech spectra, one can be more or less intelligible than the other in certain situations such as high levels of noise. Anecdotal information, supported by some empirical observations, suggests that some of the high intensity noise spectra of military aircraft cockpits may degrade the intelligibility of female speech more than that of male speech. In an applied research study, the intelligibility of female and male speech was measured in several high level aircraft cockpit noise conditions experienced in military aviation. In Part I, (Nixon CW, et al. Aviat Space Environ Med 1998; 69:675-83) female speech intelligibility measured in the spectra and levels of aircraft cockpit noises and with noise-canceling microphones was lower than that of the male speech in all conditions. However, the differences were small and only those at some of the highest noise levels were significant. Although speech intelligibility of both genders was acceptable during normal cruise noises, improvements are required in most of the highest levels of noise created during maximum aircraft operating conditions. These results are discussed in a Part I technical report. This Part II report examines the intelligibility in the same aircraft cockpit noises of vocoded female and male speech and the accuracy with which female and male speech in some of the cockpit noises were understood by automatic speech recognition systems. The intelligibility of vocoded female speech was generally the same as that of vocoded male speech. No significant differences were measured between the recognition accuracy of male and female speech by the automatic speech recognition systems. The intelligibility of female and male speech was equivalent for these conditions.
Higgins, Paul; Searchfield, Grant; Coad, Gavin
2012-06-01
The aim of this study was to determine which level-dependent hearing aid digital signal-processing strategy (DSP) participants preferred when listening to music and/or performing a speech-in-noise task. Two receiver-in-the-ear hearing aids were compared: one using 32-channel adaptive dynamic range optimization (ADRO) and the other wide dynamic range compression (WDRC) incorporating dual fast (4 channel) and slow (15 channel) processing. The manufacturers' first-fit settings based on participants' audiograms were used in both cases. Results were obtained from 18 participants on a quick speech-in-noise (QuickSIN; Killion, Niquette, Gudmundsen, Revit, & Banerjee, 2004) task and for 3 music listening conditions (classical, jazz, and rock). Participants preferred the quality of music and performed better at the QuickSIN task using the hearing aids with ADRO processing. A potential reason for the better performance of the ADRO hearing aids was less fluctuation in output with change in sound dynamics. ADRO processing has advantages for both music quality and speech recognition in noise over the multichannel WDRC processing that was used in the study. Further evaluations of which DSP aspects contribute to listener preference are required.
Chen, Yung-Yue
2018-05-08
Mobile devices are often used in our daily lives for speech and communication, and their speech quality is frequently degraded by the environmental noise surrounding the user. Unfortunately, an effective background noise reduction solution cannot easily be developed for this speech enhancement problem. For these reasons, a methodology is systematically proposed to eliminate the effects of background noise on the speech communication of mobile devices. This methodology integrates a dual microphone array with a background noise elimination algorithm comprising a whitening process, a speech modelling method, and an H₂ estimator. Thanks to the adoption of the dual microphone array, a low-cost design can be obtained for the speech enhancement of mobile devices. Practical tests have shown that the proposed method is immune to random background noise, and noiseless speech can be obtained after executing this denoising process.
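The paper's whitening/speech-model/H₂ pipeline is not reproduced here; as a rough illustration of dual-microphone noise reduction, the sketch below uses a standard normalized-LMS adaptive noise canceller instead, with the second microphone serving as a noise reference.

```python
# Standard two-microphone adaptive noise canceller (NLMS), shown as a
# simpler stand-in for the paper's algorithm.
import numpy as np

def nlms_cancel(primary, reference, taps=64, mu=0.5, eps=1e-8):
    """primary: speech + noise mic; reference: noise-dominated second mic."""
    w = np.zeros(taps)
    out = np.zeros(len(primary))
    for n in range(taps, len(primary)):
        x = reference[n - taps:n][::-1]       # most recent reference samples
        y = w @ x                             # estimate of the noise in primary
        e = primary[n] - y                    # error = enhanced speech sample
        w += mu * e * x / (x @ x + eps)       # normalized LMS weight update
        out[n] = e
    return out
```

This works when the reference microphone picks up mostly noise that is correlated with the noise in the primary channel, which is the geometric premise of dual-microphone designs.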
Freedom of racist speech: Ego and expressive threats.
White, Mark H; Crandall, Christian S
2017-09-01
Do claims of "free speech" provide cover for prejudice? We investigate whether this defense of racist or hate speech serves as a justification for prejudice. In a series of 8 studies (N = 1,624), we found that explicit racial prejudice is a reliable predictor of the "free speech defense" of racist expression. Participants endorsed free speech values for singing racist songs or posting racist comments on social media; people high in prejudice endorsed free speech more than people low in prejudice (meta-analytic r = .43). This endorsement was not principled: high levels of prejudice did not predict endorsement of free speech values when identical speech was directed at coworkers or the police. Participants low in explicit racial prejudice actively avoided endorsing free speech values in racialized conditions compared to nonracial conditions, but participants high in racial prejudice increased their endorsement of free speech values in racialized conditions. Three experiments failed to find evidence that defense of racist speech by the highly prejudiced was based in self-relevant or self-protective motives. Two experiments found evidence that the free speech argument protected participants' own freedom to express their attitudes; the defense of others' racist speech seems motivated more by threats to autonomy than threats to self-regard. These studies serve as an elaboration of the Justification-Suppression Model (Crandall & Eshleman, 2003) of prejudice expression. The justification of racist speech by endorsing fundamental political values can serve to buffer racial and hate speech from normative disapproval. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Articulatory speech synthesis and speech production modelling
NASA Astrophysics Data System (ADS)
Huang, Jun
This dissertation addresses the problem of speech synthesis and speech production modelling based on the fundamental principles of human speech production. Unlike the conventional source-filter model, which assumes the independence of the excitation and the acoustic filter, we treat the entire vocal apparatus as one system consisting of a fluid dynamic aspect and a mechanical part. We model the vocal tract by a three-dimensional moving geometry. We also model the sound propagation inside the vocal apparatus as a three-dimensional nonplane-wave propagation inside a viscous fluid described by the Navier-Stokes equations. In our work, we first propose a combined minimum energy and minimum jerk criterion to estimate the dynamic vocal tract movements during speech production. Both theoretical error bound analysis and experimental results show that this method can achieve a very close match at the target points while avoiding abrupt changes in the articulatory trajectory. Second, a mechanical vocal fold model is used to compute the excitation signal of the vocal tract. The advantage of this model is that it is closely coupled with the vocal tract system based on fundamental aerodynamics. As a result, we can obtain an excitation signal with much more detail than the conventional parametric vocal fold excitation model. Furthermore, strong evidence of source-tract interaction is observed. Finally, we propose a computational model of the fricative and stop types of sounds based on the physical principles of speech production. The advantage of this model is that it uses an exogenous process to model the additional nonsteady and nonlinear effects due to the flow mode, which are ignored by the conventional source-filter speech production model. A recursive algorithm is used to estimate the model parameters. Experimental results show that this model is able to synthesize good-quality fricative and stop sounds. Based on our dissertation work, we carefully argue that the articulatory speech production model has the potential to flexibly synthesize natural-quality speech sounds and to provide a compact computational model of speech production that can benefit a wide range of areas in speech signal processing.
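Of the components above, the jerk-minimizing part of the articulatory trajectory criterion has a well-known closed form between two targets with zero endpoint velocity and acceleration; the sketch below shows only that standard polynomial, not the dissertation's combined minimum-energy-plus-jerk estimator.

```python
# Classic minimum-jerk trajectory between two articulatory targets.
import numpy as np

def min_jerk(x0, xf, T, n=100):
    """x0, xf: start/end positions; T: movement duration in seconds."""
    t = np.linspace(0.0, T, n)
    tau = t / T
    # fifth-order polynomial with zero velocity/acceleration at both ends
    s = 10 * tau**3 - 15 * tau**4 + 6 * tau**5
    return x0 + (xf - x0) * s

traj = min_jerk(x0=0.0, xf=1.2, T=0.25)  # e.g. a hypothetical tongue-tip excursion
```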
NASA Astrophysics Data System (ADS)
Nishiura, Takanobu; Nakamura, Satoshi
2002-11-01
It is very important to capture distant-talking speech with high quality for a hands-free speech interface. A microphone array is an ideal candidate for this purpose. However, this approach requires localizing the target talker. Conventional talker localization algorithms in multiple-sound-source environments not only have difficulty localizing the multiple sound sources accurately, but also have difficulty localizing the target talker among known multiple sound source positions. To cope with these problems, we propose a new talker localization algorithm consisting of two algorithms. The first is a DOA (direction of arrival) estimation algorithm for multiple sound source localization based on the CSP (cross-power spectrum phase) coefficient addition method. The second is a statistical sound source identification algorithm based on a GMM (Gaussian mixture model) for localizing the target talker among the localized sound sources. In this paper, we focus in particular on talker localization performance based on the combination of these two algorithms with a microphone array. We conducted evaluation experiments in real noisy reverberant environments. As a result, we confirmed that multiple sound signals can be accurately identified as "speech" or "non-speech" by the proposed algorithm. [Work supported by ATR and MEXT of Japan.]
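A minimal sketch of the CSP computation (also known as GCC-PHAT) for a single microphone pair is shown below; the proposed method additionally sums CSP coefficients across pairs to localize multiple sources, which is not reproduced here:

```python
import numpy as np

def csp_tdoa(x1, x2, fs):
    """Estimate the time difference of arrival between two microphone
    signals via the CSP (cross-power spectrum phase) coefficient:
    the inverse FFT of the phase-normalized cross-spectrum."""
    n = len(x1) + len(x2)
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X1 * np.conj(X2)
    # Whitening by the magnitude keeps only phase information (PHAT).
    csp = np.fft.irfft(cross / (np.abs(cross) + 1e-12), n=n)
    # Re-center so that lag 0 sits in the middle of the array.
    max_shift = n // 2
    csp = np.concatenate((csp[-max_shift:], csp[:max_shift + 1]))
    return (np.argmax(np.abs(csp)) - max_shift) / fs  # delay in seconds

# With a known microphone spacing, the delay maps to a DOA angle via
# theta = arcsin(delay * c / d), with c the speed of sound.
```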
Speech in the Junior High School. Michigan Speech Association Curriculum Guide Series, No. 4.
ERIC Educational Resources Information Center
Herman, Deldee; Ratliffe, Sharon
Designed to provide the student with experience in oral communication, this curriculum guide presents a one-semester speech course for junior high school students with "normal" rather than defective speech. The eight units cover speech in social interaction; group discussion and business meetings; demonstrations and reports; creative dramatics;…
Speech vs. singing: infants choose happier sounds
Corbeil, Marieve; Trehub, Sandra E.; Peretz, Isabelle
2013-01-01
Infants prefer speech to non-vocal sounds and to non-human vocalizations, and they prefer happy-sounding speech to neutral speech. They also exhibit an interest in singing, but there is little knowledge of their relative interest in speech and singing. The present study explored infants' attention to unfamiliar audio samples of speech and singing. In Experiment 1, infants 4–13 months of age were exposed to happy-sounding infant-directed speech vs. hummed lullabies by the same woman. They listened significantly longer to the speech, which had considerably greater acoustic variability and expressiveness, than to the lullabies. In Experiment 2, infants of comparable age who heard the lyrics of a Turkish children's song spoken vs. sung in a joyful/happy manner did not exhibit differential listening. Infants in Experiment 3 heard the happily sung lyrics of the Turkish children's song vs. a version that was spoken in an adult-directed or affectively neutral manner. They listened significantly longer to the sung version. Overall, happy voice quality rather than vocal mode (speech or singing) was the principal contributor to infant attention, regardless of age. PMID:23805119
4800 B/S speech compression techniques for mobile satellite systems
NASA Technical Reports Server (NTRS)
Townes, S. A.; Barnwell, T. P., III; Rose, R. C.; Gersho, A.; Davidson, G.
1986-01-01
This paper discusses three 4800-bps digital speech compression techniques currently being investigated for application in the mobile satellite service. These three techniques (vector adaptive predictive coding, vector excitation coding, and the self-excited vocoder) are the most promising among a number of techniques being developed to provide near-toll-quality speech compression while keeping the bit rate low enough for a power- and bandwidth-limited satellite service.
Wolfe, Jace; Neumann, Sara; Schafer, Erin; Marsh, Megan; Wood, Mark; Baker, R Stanley
2017-02-01
A number of published studies have demonstrated the benefits of electric-acoustic stimulation (EAS) over conventional electric stimulation for adults with functional low-frequency acoustic hearing and severe-to-profound high-frequency hearing loss. These benefits potentially include better speech recognition in quiet and in noise, better localization, improvements in sound quality, better music appreciation and aptitude, and better pitch recognition. There is, however, a paucity of published reports describing the potential benefits and limitations of EAS for children with functional low-frequency acoustic hearing and severe-to-profound high-frequency hearing loss. The objective of this study was to explore the potential benefits of EAS for children. A repeated-measures design was used to evaluate performance differences obtained with EAS versus acoustic-only and electric-only stimulation. Seven users of Cochlear Nucleus Hybrid, Nucleus 24 Freedom, CI512, and CI422 implants were included in the study. Sentence recognition (assayed using the pediatric version of the AzBio sentence recognition test) was evaluated in quiet and at three fixed signal-to-noise ratios (SNRs) (0, +5, and +10 dB). Functional hearing performance was also evaluated with the use of questionnaires, including the comparative version of the Speech, Spatial, and Qualities of Hearing scale, the Listening Inventory for Education Revised, and the Children's Home Inventory for Listening Difficulties. Speech recognition in noise was typically better with EAS than with acoustic-only or electric-only stimulation, particularly at the less favorable SNRs. Additionally, in real-world situations, children generally preferred EAS to electric-only stimulation, and the participants' classroom teachers observed better hearing performance in the classroom with the use of EAS. Use of EAS provided better speech recognition in quiet and in noise than acoustic-only or electric-only stimulation, and children responded favorably to the use of EAS implemented in an integrated sound processor for real-world use.
Precision of working memory for speech sounds.
Joseph, Sabine; Iverson, Paul; Manohar, Sanjay; Fox, Zoe; Scott, Sophie K; Husain, Masud
2015-01-01
Memory for speech sounds is a key component of models of verbal working memory (WM). But how good is verbal WM? Most investigations assess this using binary report measures to derive a fixed number of items that can be stored. However, recent findings in visual WM have challenged such "quantized" views by employing measures of recall precision with an analogue response scale. WM for speech sounds might rely on both continuous and categorical storage mechanisms. Using a novel speech matching paradigm, we measured WM recall precision for phonemes. Vowel qualities were sampled from a formant space continuum. A probe vowel had to be adjusted to match the vowel quality of a target on a continuous, analogue response scale. Crucially, this provided an index of the variability of a memory representation around its true value and thus allowed us to estimate how memories were distorted from the original sounds. Memory load affected the quality of speech sound recall in two ways. First, there was a gradual decline in recall precision with increasing number of items, consistent with the view that WM representations of speech sounds become noisier with an increase in the number of items held in memory, just as for vision. Based on multidimensional scaling (MDS), the level of noise appeared to be reflected in distortions of the formant space. Second, as memory load increased, there was evidence of greater clustering of participants' responses around particular vowels. A mixture model captured both continuous and categorical responses, demonstrating a shift from continuous to categorical memory with increasing WM load. This suggests that direct acoustic storage can be used for single items, but when more items must be stored, categorical representations must be used.
Determining the importance of fundamental hearing aid attributes.
Meister, Hartmut; Lausberg, Isabel; Kiessling, Juergen; Walger, Martin; von Wedel, Hasso
2002-07-01
To determine the importance of fundamental hearing aid attributes and to elicit measures of satisfaction and dissatisfaction. A prospective study based on a survey using a decompositional approach of preference measurement (conjoint analysis). Ear, nose, and throat university hospitals in Cologne and Giessen; various branches of hearing aid dispensers. A random sample of 175 experienced hearing aid users aged 20 to 91 years (mean age, 61 yr) recruited at two different sites. Relative importance of different hearing aid attributes, satisfaction and dissatisfaction with hearing aid attributes. Of the six fundamental hearing aid attributes assessed by the hearing aid users, the two features concerning speech perception attained the highest relative importance (25% speech in quiet, 27% speech in noise). The remaining four attributes (sound quality, handling, feedback, localization) had significantly lower values in a narrow range of 10 to 12%. Comparison of different subgroups of hearing aid wearers based on sociodemographic and user-specific data revealed a large interindividual scatter of the preferences for the attributes. A similar examination with 25 clinicians revealed overestimation of the importance of the attributes commonly associated with problems. Moreover, examination of satisfaction showed that speech in noise was the most frequent source of dissatisfaction (30% of all statements), whereas the subjects were satisfied with speech in quiet. The results emphasize the high importance of attributes related to speech perception. Speech discrimination in noise was the most important but also the most frequent source of negative statements. This attribute will be the outstanding parameter of future developments. Appropriate handling becomes an important factor for elderly subjects. However, because of the large interindividual scatter of data, the preferences of different hearing aid users were hardly predictable, giving evidence of multifactorial influences.
Hornsby, Benjamin W. Y.; Johnson, Earl E.; Picou, Erin
2011-01-01
Objectives The purpose of this study was to examine the effects of degree and configuration of hearing loss on the use of, and benefit from, information in amplified high- and low-frequency speech presented in background noise. Design Sixty-two adults with a wide range of high- and low-frequency sensorineural hearing loss (5–115+ dB HL) participated. To examine the contribution of speech information in different frequency regions, speech understanding in noise was assessed in multiple low- and high-pass filter conditions, as well as a band-pass (713–3534 Hz) and wideband (143–8976 Hz) condition. To increase audibility over a wide frequency range, speech and noise were amplified based on each individual’s hearing loss. A stepwise multiple linear regression approach was used to examine the contribution of several factors to 1) absolute performance in each filter condition and 2) the change in performance with the addition of amplified high- and low-frequency speech components. Results Results from the regression analysis showed that degree of hearing loss was the strongest predictor of absolute performance for low- and high-pass filtered speech materials. In addition, configuration of hearing loss affected both absolute performance for severely low-pass filtered speech and benefit from extending high-frequency (3534–8976 Hz) bandwidth. Specifically, individuals with steeply sloping high-frequency losses made better use of low-pass filtered speech information than individuals with similar low-frequency thresholds but less high-frequency loss. In contrast, given similar high-frequency thresholds, individuals with flat hearing losses received more benefit from extending high-frequency bandwidth than individuals with more sloping losses. Conclusions Consistent with previous work, benefit from speech information in a given frequency region generally decreases as degree of hearing loss in that frequency region increases. However, given a similar degree of loss, the configuration of hearing loss also affects the ability to use speech information in different frequency regions. Except for individuals with steeply sloping high-frequency losses, providing high-frequency amplification (3534–8976 Hz) had either a beneficial effect on, or did not significantly degrade, speech understanding. These findings highlight the importance of extended high-frequency amplification for listeners with a wide range of high-frequency hearing losses, when seeking to maximize intelligibility. PMID:21336138
Speech enhancement on smartphone voice recording
NASA Astrophysics Data System (ADS)
Tris Atmaja, Bagus; Nur Farid, Mifta; Arifianto, Dhany
2016-11-01
Speech enhancement is a challenging task in audio signal processing: the goal is to enhance the quality of a target speech signal while suppressing other noises. Speech enhancement algorithms have developed rapidly, from spectral subtraction and Wiener filtering through the spectral-amplitude MMSE estimator to non-negative matrix factorization (NMF). The smartphone, a revolutionary device, is now used in all aspects of life, including journalism, both personally and professionally. Although many smartphones have two microphones (main and rear), only the main microphone is widely used for voice recording, which makes single-channel methods such as NMF well suited to this enhancement task. This paper evaluates speech enhancement of smartphone voice recordings using the algorithms mentioned above. We also extend the NMF algorithm to Kullback-Leibler NMF with supervised separation. The last algorithm shows improved results compared with the others, as evaluated by spectrogram inspection and PESQ scores.
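For illustration, here is a minimal sketch of the supervised-separation step of KL-NMF: speech and noise dictionaries W_speech and W_noise are assumed to have been trained beforehand on clean material (training not shown), and only the activations are inferred on the noisy spectrogram using the standard multiplicative updates for the KL divergence:

```python
import numpy as np

def kl_nmf_activations(V, W, n_iter=100, eps=1e-10):
    """Infer activations H for a magnitude spectrogram V (freq x time),
    given a fixed dictionary W (freq x components), by the standard
    KL-divergence multiplicative update H <- H * (W^T (V/WH)) / (W^T 1)."""
    H = np.abs(np.random.rand(W.shape[1], V.shape[1]))
    ones = np.ones_like(V)
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.T @ ones + eps)
    return H

# Supervised separation (hypothetical names, dictionaries trained offline):
# W = np.hstack([W_speech, W_noise])
# H = kl_nmf_activations(V_mix, W)
# k = W_speech.shape[1]
# S = W_speech @ H[:k]; N = W_noise @ H[k:]
# speech_est = V_mix * S / (S + N + 1e-10)   # Wiener-style masking
```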
High-quality lossy compression: current and future trends
NASA Astrophysics Data System (ADS)
McLaughlin, Steven W.
1995-01-01
This paper is concerned with current and future trends in the lossy compression of real sources such as imagery, video, speech and music. We put all lossy compression schemes into a common framework where each can be characterized in terms of three well-defined advantages: cell-shape, region-shape and memory advantages. We concentrate on image compression and discuss how new entropy-constrained trellis-based compressors achieve cell-shape, region-shape and memory gain, resulting in high fidelity and high compression.
[Virtual audiovisual talking heads: articulatory data and models--applications].
Badin, P; Elisei, F; Bailly, G; Savariaux, C; Serrurier, A; Tarabalka, Y
2007-01-01
In the framework of experimental phonetics, our approach to the study of speech production is based on the measurement, analysis and modelling of orofacial articulators such as the jaw, the face and lips, the tongue and the velum. We therefore present in this article experimental techniques that allow the shape and movement of the speech articulators to be characterised (static and dynamic MRI, computed tomodensitometry, electromagnetic articulography, video recording). We then describe the linear models of the various organs that can be built from speaker-specific articulatory data. We show that these models, which exhibit good geometrical resolution, can be controlled from articulatory data with good temporal resolution and can thus support the reconstruction of high-quality animations of the articulators. These models, which we have integrated in a virtual talking head, can produce augmented audiovisual speech. In this framework, we have assessed the natural tongue-reading capabilities of human subjects by means of audiovisual perception tests. We conclude by suggesting a number of other applications of talking heads.
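The abstract does not specify how its linear models are built, but a common construction for such models is principal component analysis of measured articulator contours; the sketch below assumes hypothetical contour data and is only an illustration of that general approach:

```python
import numpy as np

def linear_articulatory_model(X, n_components=3):
    """Build a linear articulator model by PCA.

    X is (n_frames, n_points): flattened 2-D contour coordinates of one
    articulator per measured frame (assumed input). Returns the mean
    shape and the leading principal components, whose weights act as
    control parameters of the articulator."""
    mean = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def synthesize_shape(mean, components, params):
    """Reconstruct an articulator shape from a few control parameters."""
    return mean + params @ components

# Hypothetical usage with tongue contours measured from MRI frames:
# mean, comps = linear_articulatory_model(tongue_contours)
# shape = synthesize_shape(mean, comps, np.array([1.5, -0.3, 0.0]))
```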
Huber, Rainer; Bisitz, Thomas; Gerkmann, Timo; Kiessling, Jürgen; Meister, Hartmut; Kollmeier, Birger
2018-06-01
The perceived quality of nine different single-microphone noise reduction (SMNR) algorithms was evaluated and compared in subjective listening tests with normal-hearing and hearing-impaired (HI) listeners. Speech samples mixed with traffic noise or with party noise were processed by the SMNR algorithms. Subjects rated the amount of speech distortion, the intrusiveness of the background noise, listening effort and overall quality, using a simplified MUSHRA (ITU-R, 2003) assessment method. 18 normal-hearing and 18 moderately HI subjects participated in the study. Significant differences between the rating behaviours of the two subject groups were observed: while normal-hearing subjects clearly differentiated between SMNR algorithms, HI subjects rated all processed signals very similarly. Moreover, HI subjects rated the speech distortions of the unprocessed, noisier signals as more severe than the distortions of the processed signals, in contrast to normal-hearing subjects. It seems harder for HI listeners to distinguish between additive noise and speech distortions, and/or they may have a different understanding of the term "speech distortion" than normal-hearing listeners have. The findings confirm that the evaluation of SMNR schemes for hearing aids should always involve HI listeners.
Dimension-Based Statistical Learning of Vowels
Liu, Ran; Holt, Lori L.
2015-01-01
Speech perception depends on long-term representations that reflect regularities of the native language. However, listeners rapidly adapt when speech acoustics deviate from these regularities due to talker idiosyncrasies such as foreign accents and dialects. To better understand these dual aspects of speech perception, we probe native English listeners’ baseline perceptual weighting of two acoustic dimensions (spectral quality and vowel duration) towards vowel categorization and examine how they subsequently adapt to an “artificial accent” that deviates from English norms in the correlation between the two dimensions. At baseline, listeners rely relatively more on spectral quality than vowel duration to signal vowel category, but duration nonetheless contributes. Upon encountering an “artificial accent” in which the spectral-duration correlation is perturbed relative to English language norms, listeners rapidly down-weight reliance on duration. Listeners exhibit this type of short-term statistical learning even in the context of nonwords, confirming that lexical information is not necessary to this form of adaptive plasticity in speech perception. Moreover, learning generalizes to both novel lexical contexts and acoustically-distinct altered voices. These findings are discussed in the context of a mechanistic proposal for how supervised learning may contribute to this type of adaptive plasticity in speech perception. PMID:26280268
Stuttering: Clinical and research update.
Perez, Hector R; Stoeckle, James H
2016-06-01
To provide an update on the epidemiology, genetics, pathophysiology, diagnosis, and treatment of developmental stuttering. The MEDLINE and Cochrane databases were searched for past and recent studies on the epidemiology, genetics, pathophysiology, diagnosis, and treatment of developmental stuttering. Most recommendations are based on small studies, limited-quality evidence, or consensus. Stuttering is a speech disorder, common in persons of all ages, that affects normal fluency and time patterning of speech. Stuttering has been associated with differences in brain anatomy, functioning, and dopamine regulation thought to be due to genetic causes. Attention to making a correct diagnosis or referral in children is important because there is growing consensus that early intervention with speech therapy for children who stutter is critical. For adults, stuttering can be associated with substantial psychosocial morbidity including social anxiety and low quality of life. Pharmacologic treatment has received attention in recent years, but clinical evidence is limited. The mainstay of treatment for children and adults remains speech therapy. A growing body of research has attempted to uncover the pathophysiology of stuttering. Referral for speech therapy remains the best option for children and adults.
Visual and Auditory Components in the Perception of Asynchronous Audiovisual Speech
Alcalá-Quintana, Rocío
2015-01-01
Research on asynchronous audiovisual speech perception manipulates experimental conditions to observe their effects on synchrony judgments. Probabilistic models establish a link between the sensory and decisional processes underlying such judgments and the observed data, via interpretable parameters that allow testing hypotheses and making inferences about how experimental manipulations affect such processes. Two models of this type have recently been proposed, one based on independent channels and the other using a Bayesian approach. Both models are fitted here to a common data set, with a subsequent analysis of the interpretation they provide about how experimental manipulations affected the processes underlying perceived synchrony. The data consist of synchrony judgments as a function of audiovisual offset in a speech stimulus, under four within-subjects manipulations of the quality of the visual component. The Bayesian model could not accommodate asymmetric data, was rejected by goodness-of-fit statistics for 8 of 16 observers, and was found to be nonidentifiable, which renders its parameter estimates uninterpretable. The independent-channels model captured asymmetric data, was rejected for only 1 of 16 observers, and identified how the sensory and decisional processes mediating asynchronous audiovisual speech perception are affected by manipulations that only alter the quality of the visual component of the speech signal. PMID:27551361
Dziegielewski, Peter T.; Teknos, Theodoros N.; Durmus, Kasim; Old, Matthew; Agrawal, Amit; Kakarala, Kiran; Marcinow, Anna; Ozer, Enver
2014-01-01
Objective To determine swallowing, speech and quality of life (QOL) outcomes following transoral robotic surgery (TORS) for oropharyngeal squamous cell carcinoma (OPSCC). Design Prospective cohort study. Setting Tertiary care academic comprehensive cancer center. Patients 81 patients with previously untreated OPSCC. Intervention Primary surgical resection via TORS and neck dissection as indicated. Main Outcome Measures Patients were asked to complete the Head and Neck Cancer Inventory (HNCI) pre-operatively and at 3 weeks as well as 3, 6 and 12 months post-operatively. Swallowing ability was assessed by independence from a gastrostomy tube (G-Tube). Clinicopathological and follow-up data were also collected. Results Mean follow-up time was 22.7 months. HNCI response rates at 3 weeks and 3, 6, and 12 months were 79%, 60%, 63%, and 67%, respectively. There were overall declines in the speech, eating, aesthetic, social and overall QOL domains in the early post-operative period. However, at 1 year post-TORS, scores for the aesthetic, social and overall QOL domains remained high. Radiation therapy was negatively correlated with multiple QOL domains (p<0.05), while age > 55 years correlated with lower speech and aesthetic scores (p<0.05). HPV status did not correlate with any QOL domain. G-Tube rates at 6 and 12 months were 24% and 9%, respectively. The extent of TORS (> 1 oropharyngeal site resected) and age > 55 years predicted the need for a G-Tube at any point after TORS (p<0.05). Conclusions Patients with OPSCC treated with TORS maintain a high QOL at 1 year after surgery. Adjuvant treatment and advanced age tend to decrease QOL. PMID:23576186
Rinkel, Rico N; Verdonck-de Leeuw, Irma M; Doornaert, Patricia; Buter, Jan; de Bree, Remco; Langendijk, Johannes A; Aaronson, Neil K; Leemans, C René
2016-07-01
The objective of this study is to assess swallowing and speech outcomes after chemoradiation therapy for head and neck cancer, based on the patient-reported outcome measures Swallowing Quality of Life Questionnaire (SWAL-QOL) and Speech Handicap Index (SHI), both provided with cut-off scores. This is a cross-sectional study conducted at the Department of Otolaryngology/Head and Neck Surgery of a University Medical Center. Sixty patients, 6 months to 5 years after chemoradiation for head and neck squamous cell carcinoma, were included. The SWAL-QOL and SHI, both validated in Dutch and provided with cut-off scores, were administered. Associations were tested between the outcome measures and independent variables (age, gender, tumor stage and site, radiotherapy technique, time since treatment, comorbidity and food intake). Fifty-two patients returned the SWAL-QOL and 47 the SHI (response rates 87 and 78%, respectively). Swallowing and speech problems were present in 79 and 55%, respectively. Normal food intake was reported by 45%, while 35% had a soft diet and 20% tube feeding. Patients with a soft diet and tube feeding reported more swallowing problems than patients with normal oral intake. Tumor subsite was significantly associated with swallowing outcome (fewer problems in larynx/hypopharynx compared with oral cavity/oropharynx). Radiation technique was significantly associated with psychosocial speech problems (fewer problems in patients treated with IMRT). Swallowing and, to a lesser extent, speech problems in daily life are frequently present after chemoradiation therapy for head and neck cancer. Future prospective studies will give more insight into the course of speech and swallowing problems after chemoradiation and into the efficacy of new radiation techniques and of swallowing and speech rehabilitation programs.
Engaged listeners: shared neural processing of powerful political speeches.
Schmälzle, Ralf; Häcker, Frank E K; Honey, Christopher J; Hasson, Uri
2015-08-01
Powerful speeches can captivate audiences, whereas weaker speeches fail to engage their listeners. What is happening in the brains of a captivated audience? Here, we assess audience-wide functional brain dynamics during listening to speeches of varying rhetorical quality. The speeches were given by German politicians and evaluated as rhetorically powerful or weak. Listening to each of the speeches induced similar neural response time courses, as measured by inter-subject correlation analysis, in widespread brain regions involved in spoken language processing. Crucially, alignment of the time course across listeners was stronger for rhetorically powerful speeches, especially for bilateral regions of the superior temporal gyri and medial prefrontal cortex. Thus, during powerful speeches, listeners as a group are more coupled to each other, suggesting that powerful speeches are more potent in taking control of the listeners' brain responses. Weaker speeches were processed more heterogeneously, although they still prompted substantially correlated responses. These patterns of coupled neural responses bear resemblance to metaphors of resonance, which are often invoked in discussions of speech impact, and contribute to the literature on auditory attention under natural circumstances. Overall, this approach opens up possibilities for research on the neural mechanisms mediating the reception of entertaining or persuasive messages.
Role of the Speech-Language Pathologist (SLP) in the Head and Neck Cancer Team.
Hansen, Kelly; Chenoweth, Marybeth; Thompson, Heather; Strouss, Alexandra
2018-01-01
While treatments for head and neck cancer are aimed at curing patients from disease, they can have significant short- and long-term negative impacts on speech and swallowing functions. Research demonstrates that early and frequent involvement of Speech-Language Pathologists (SLPs) is beneficial to these functions and overall quality of life for head and neck cancer patients. Strategies and tools to optimize communication and safe swallowing are presented in this chapter.
Speech transport for packet telephony and voice over IP
NASA Astrophysics Data System (ADS)
Baker, Maurice R.
1999-11-01
Recent advances in packet switching, internetworking, and digital signal processing technologies have converged to allow realizable practical implementations of packet telephony systems. This paper provides a tutorial on transmission engineering for packet telephony covering the topics of speech coding/decoding, speech packetization, packet data network transport, and impairments which may negatively impact end-to-end system quality. Particular emphasis is placed upon Voice over Internet Protocol given the current popularity and ubiquity of IP transport.
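As a back-of-the-envelope illustration of the end-to-end impairments such a tutorial covers, the sketch below totals a one-way delay budget; all figures are assumptions chosen for illustration, not values from the paper:

```python
# One-way ("mouth-to-ear") delay budget for a packet telephony link.
# Every value below is an illustrative assumption.
codec_frame_ms    = 20.0   # speech frame length produced by the codec
frames_per_packet = 2      # packetization: buffer two frames per packet
lookahead_ms      = 5.0    # encoder look-ahead (algorithmic delay)
network_ms        = 40.0   # propagation plus queuing delay (assumed)
jitter_buffer_ms  = 60.0   # playout buffer absorbing delay variation

packetization_ms = frames_per_packet * codec_frame_ms
total_ms = lookahead_ms + packetization_ms + network_ms + jitter_buffer_ms
print(f"one-way delay: {total_ms:.0f} ms")  # ITU-T G.114 targets <= 150 ms
```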
[Vocal recognition in dental and oral radiology].
La Fianza, A; Giorgetti, S; Marelli, P; Campani, R
1993-10-01
Speech reporting benefits from systems that can recognize sentences in a natural language in real time. The use of this method in the everyday practice of radiology departments shows its possible application fields. We used speech recognition to report orthopantomographic exams, in order to evaluate the advantages the method offers for the management and quality of reporting exams that are difficult to fit into other closed, computer-based reporting systems. Both speech recognition and the conventional reporting method (tape recording and typewriting) were used to report 760 orthopantomographs. For each exam and with both techniques, we evaluated the average time needed to make the report, the legibility (Flesch) index as adapted for the Italian language, and a clinical index (the subjective opinion of 4 odontostomatologists). Moreover, errors in speech reporting (crude, human and overall errors) were also evaluated. The advantages of speech reporting consisted in the shorter time needed for the report to become available (2.24 vs 2.99 minutes) (p < 0.0005), in the improved Flesch index (30.62 vs 28.9) and in the clinical index. The data obtained from speech reporting in odontostomatologic radiology were useful not only to reduce the mean reporting time of orthopantomographic exams but also to improve report quality by reducing both grammar and transmission mistakes. However, the basic condition for such results is the speaker's skill in dictating a good report.
[Verbal and gestural communication in interpersonal interaction with Alzheimer's disease patients].
Schiaratura, Loris Tamara; Di Pastena, Angela; Askevis-Leherpeux, Françoise; Clément, Sylvain
2015-03-01
Communication can be defined as a verbal and non-verbal exchange of thoughts and emotions. While verbal communication deficits in Alzheimer's disease are well documented, very little is known about gestural communication, especially in interpersonal situations. This study examines the production of gestures and its relation to verbal aspects of communication. Three patients suffering from moderately severe Alzheimer's disease were compared with three healthy adults. Each was given a series of pictures and asked to explain which one she preferred and why. The interpersonal interaction was video recorded. Analyses concerned verbal production (quantity and quality) and gestures. Gestures were either non-representational (i.e., gestures of small amplitude punctuating speech or accentuating some parts of the utterance) or representational (i.e., referring to the object of the speech). Representational gestures were coded as iconic (depicting concrete aspects), metaphoric (depicting abstract meaning) or deictic (pointing toward an object). In comparison with the healthy participants, the patients showed a decrease in the quantity and quality of speech. Nevertheless, their production of gestures was preserved. This pattern is in line with the conception that gestures and speech depend on different communication systems and appears inconsistent with the assumption of a parallel dissolution of gesture and speech. Moreover, analyzing the articulation between the verbal and gestural dimensions suggests that representational gestures may compensate for speech deficits. This underlines the important role of gestures in maintaining interpersonal communication.
Jeon, Jin Yong; Hong, Joo Young; Jang, Hyung Suk; Kim, Jae Hyeon
2015-12-01
It is necessary to consider not only annoyance of interior noises but also speech privacy to achieve acoustic comfort in a passenger car of a high-speed train because speech from other passengers can be annoying. This study aimed to explore an optimal acoustic environment to satisfy speech privacy and reduce annoyance in a passenger car. Two experiments were conducted using speech sources and compartment noise of a high speed train with varying speech-to-noise ratios (SNRA) and background noise levels (BNL). Speech intelligibility was tested in experiment I, and in experiment II, perceived speech privacy, annoyance, and acoustic comfort of combined sounds with speech and background noise were assessed. The results show that speech privacy and annoyance were significantly influenced by the SNRA. In particular, the acoustic comfort was evaluated as acceptable when the SNRA was less than -6 dB for both speech privacy and noise annoyance. In addition, annoyance increased significantly as the BNL exceeded 63 dBA, whereas the effect of the background-noise level on the speech privacy was not significant. These findings suggest that an optimal level of interior noise in a passenger car might exist between 59 and 63 dBA, taking normal speech levels into account.
Effects of human fatigue on speech signals
NASA Astrophysics Data System (ADS)
Stamoulis, Catherine
2004-05-01
Cognitive performance may be significantly affected by fatigue. In the case of critical personnel, such as pilots, monitoring human fatigue is essential to ensure safety and success of a given operation. One of the modalities that may be used for this purpose is speech, which is sensitive to respiratory changes and increased muscle tension of vocal cords, induced by fatigue. Age, gender, vocal tract length, physical and emotional state may significantly alter speech intensity, duration, rhythm, and spectral characteristics. In addition to changes in speech rhythm, fatigue may also affect the quality of speech, such as articulation. In a noisy environment, detecting fatigue-related changes in speech signals, particularly subtle changes at the onset of fatigue, may be difficult. Therefore, in a performance-monitoring system, speech parameters which are significantly affected by fatigue need to be identified and extracted from input signals. For this purpose, a series of experiments was performed under slowly varying cognitive load conditions and at different times of the day. The results of the data analysis are presented here.
Acoustic characteristics of voice after severe traumatic brain injury.
McHenry, M
2000-07-01
To describe the acoustic characteristics of voice in individuals with motor speech disorders after traumatic brain injury (TBI). Prospective study of 100 individuals with TBI based on consecutive referrals for motor speech evaluations. Subjects were audio tape-recorded while producing sustained vowels and single word and sentence intelligibility tests. Laryngeal airway resistance was estimated, and voice quality was rated perceptually. None of the subjects evidenced vocal parameters within normal limits. The most frequently occurring abnormal parameter across subjects was amplitude perturbation, followed by voice turbulence index. Twenty-three percent of subjects evidenced deviation in all five parameters measured. The perceptual ratings of breathiness were significantly correlated with both the amplitude perturbation quotient and the noise-to-harmonics ratio. Vocal quality deviation is common in motor speech disorders after TBI and may impact intelligibility.
A Smartphone Application for Customized Frequency Table Selection in Cochlear Implants.
Jethanamest, Daniel; Azadpour, Mahan; Zeman, Annette M; Sagi, Elad; Svirsky, Mario A
2017-09-01
A novel smartphone-based software application can facilitate self-selection of frequency allocation tables (FAT) in postlingually deaf cochlear implant (CI) users. CIs use FATs to represent the tonotopic organization of a normal cochlea. Current CI fitting methods typically use a standard FAT for all patients regardless of individual differences in cochlear size and electrode location. In postlingually deaf patients, different amounts of mismatch can result between the frequency-place function they experienced when they had normal hearing and the frequency-place function that results from the standard FAT. For some CI users, an alternative FAT may enhance sound quality or speech perception. Currently, no widely available tools exist to aid real-time selection of different FATs. This study aims to develop a new smartphone tool for this purpose and to evaluate speech perception and sound quality measures in a pilot study of CI subjects using this application. A smartphone application for a widely available mobile platform (iOS) was developed to serve as a preprocessor of auditory input to a clinical CI speech processor and enable interactive real-time selection of FATs. The application's output was validated by measuring electrodograms for various inputs. A pilot study was conducted in six CI subjects. Speech perception was evaluated using word recognition tests. All subjects successfully used the portable application with their clinical speech processors to experience different FATs while listening to running speech. The users were all able to select one table that they judged provided the best sound quality. All subjects chose a FAT different from the standard FAT in their everyday clinical processor. Using the smartphone application, the mean consonant-nucleus-consonant score with the default FAT selection was 28.5% (SD 16.8) and 29.5% (SD 16.4) when using a self-selected FAT. A portable smartphone application enables CI users to self-select frequency allocation tables in real time. Even though the self-selected FATs that were deemed to have better sound quality were only tested acutely (i.e., without long-term experience with them), speech perception scores were not inferior to those obtained with the clinical FATs. This software application may be a valuable tool for improving future methods of CI fitting.
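For illustration only, a frequency allocation table can be thought of as a list of band edges mapping analysis frequencies to electrode channels; the sketch below uses made-up edge values, not the clinical defaults of any manufacturer or of the application described here:

```python
# Hypothetical FAT: 23 band edges (Hz) defining 22 analysis bands for a
# 22-channel electrode array. Edge values are illustrative assumptions.
example_fat = [188, 313, 438, 563, 688, 813, 938, 1063, 1188, 1313,
               1563, 1813, 2063, 2313, 2688, 3063, 3563, 4063, 4688,
               5313, 6063, 6938, 7938]

def channel_for(freq_hz, fat):
    """Return the 1-based analysis channel containing freq_hz, or None
    if the frequency falls outside the table."""
    for ch, (lo, hi) in enumerate(zip(fat[:-1], fat[1:]), start=1):
        if lo <= freq_hz < hi:
            return ch
    return None

# Shifting all edges re-maps the frequency-place function, which is the
# kind of manipulation a user of such an application would audition live.
print(channel_for(1000, example_fat))  # -> 8 with these assumed edges
```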
An algorithm that improves speech intelligibility in noise for normal-hearing listeners.
Kim, Gibak; Lu, Yang; Hu, Yi; Loizou, Philipos C
2009-09-01
Traditional noise-suppression algorithms have been shown to improve speech quality, but not speech intelligibility. Motivated by prior intelligibility studies of speech synthesized using the ideal binary mask, an algorithm is proposed that decomposes the input signal into time-frequency (T-F) units and makes binary decisions, based on a Bayesian classifier, as to whether each T-F unit is dominated by the target or the masker. Speech corrupted at low signal-to-noise ratio (SNR) levels (-5 and 0 dB) using different types of maskers is synthesized by this algorithm and presented to normal-hearing listeners for identification. Results indicated substantial improvements in intelligibility (over 60 percentage points in -5 dB babble) over that attained by human listeners with unprocessed stimuli. The findings from this study suggest that algorithms that can reliably estimate the SNR in each T-F unit can improve speech intelligibility.
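The trained Bayesian classifier itself cannot be reconstructed from the abstract, but the underlying ideal-binary-mask idea can be sketched with an oracle mask, the upper-bound reference that such classifiers aim to approximate:

```python
import numpy as np
from scipy.signal import stft, istft

def oracle_binary_mask(clean, noise, fs, lc_db=0.0):
    """Oracle ideal binary mask (IBM): keep the time-frequency units
    whose local SNR exceeds the local criterion lc_db, zero the rest.
    The paper replaces this oracle decision with a Bayesian classifier
    that sees only the noisy mixture; the oracle version here is the
    upper-bound reference, since it requires the clean and noise
    signals separately."""
    _, _, S = stft(clean, fs, nperseg=512)
    _, _, N = stft(noise, fs, nperseg=512)
    local_snr = 20 * np.log10((np.abs(S) + 1e-12) / (np.abs(N) + 1e-12))
    mask = (local_snr > lc_db).astype(float)
    _, _, M = stft(clean + noise, fs, nperseg=512)   # noisy mixture
    _, synth = istft(M * mask, fs, nperseg=512)      # masked resynthesis
    return synth
```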
Coppens-Hofman, Marjolein C.; Terband, Hayo; Snik, Ad F.M.; Maassen, Ben A.M.
2017-01-01
Purpose Adults with intellectual disabilities (ID) often show reduced speech intelligibility, which affects their social interaction skills. This study aims to establish the main predictors of this reduced intelligibility in order to ultimately optimise management. Method Spontaneous speech and picture naming tasks were recorded in 36 adults with mild or moderate ID. Twenty-five naïve listeners rated the intelligibility of the spontaneous speech samples. Performance on the picture-naming task was analysed by means of a phonological error analysis based on expert transcriptions. Results The transcription analyses showed that the phonemic and syllabic inventories of the speakers were complete. However, multiple errors at the phonemic and syllabic level were found. The frequencies of specific types of errors were related to intelligibility and quality ratings. Conclusions The development of the phonemic and syllabic repertoire appears to be completed in adults with mild-to-moderate ID. The charted speech difficulties can be interpreted to indicate speech motor control and planning difficulties. These findings may aid the development of diagnostic tests and speech therapies aimed at improving speech intelligibility in this specific group. PMID:28118637
Relationship between perceived politeness and spectral characteristics of voice
NASA Astrophysics Data System (ADS)
Ito, Mika
2005-04-01
This study investigates the role of voice quality in perceiving politeness under conditions of varying relative social status among Japanese male speakers. The work focuses on four important methodological issues: experimental control of sociolinguistic aspects, eliciting natural spontaneous speech, obtaining recording quality suitable for voice quality analysis, and assessment of glottal characteristics through the use of non-invasive direct measurements of the speech spectrum. To obtain natural, unscripted utterances, the speech data were collected with a Map Task. This methodology allowed us to study the effect of manipulating relative social status among participants in the same community. We then computed the relative amplitudes of harmonics and formant peaks in spectra obtained from the Map Task recordings. Finally, an experiment was conducted to observe the alignment between acoustic measures and the perceived politeness of the voice samples. The results suggest that listeners' perceptions of politeness are determined by spectral characteristics of speakers, in particular, spectral tilts obtained by computing the difference in amplitude between the first harmonic and the third formant.
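A simplified sketch of the kind of spectral-tilt measure described (amplitude of the first harmonic minus the amplitude at the third formant, in dB) might look as follows; it assumes f0 and the third-formant frequency have been estimated elsewhere (e.g., by pitch tracking and LPC formant analysis) and omits the corrections used in careful voice-quality work:

```python
import numpy as np

def h1_a3(frame, fs, f0, f3):
    """Crude H1 - A3 spectral tilt for one voiced frame, in dB.
    f0: estimated fundamental frequency (Hz); f3: estimated third-formant
    frequency (Hz). Both are assumed inputs, not computed here."""
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), 1.0 / fs)

    def peak_db(target_hz, half_bw=50.0):
        # Strongest spectral peak within +/- half_bw of the target.
        band = (freqs > target_hz - half_bw) & (freqs < target_hz + half_bw)
        return 20 * np.log10(spec[band].max() + 1e-12)

    return peak_db(f0) - peak_db(f3)
```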
Low-income fathers' speech to toddlers during book reading versus toy play.
Salo, Virginia C; Rowe, Meredith L; Leech, Kathryn A; Cabrera, Natasha J
2016-11-01
Fathers' child-directed speech across two contexts was examined. Father-child dyads from sixty-nine low-income families were videotaped interacting during book reading and toy play when children were 2;0. Fathers used more diverse vocabulary and asked more questions during book reading while their mean length of utterance was longer during toy play. Variation in these specific characteristics of fathers' speech that differed across contexts was also positively associated with child vocabulary skill measured on the MacArthur-Bates Communicative Development Inventory. Results are discussed in terms of how different contexts elicit specific qualities of child-directed speech that may promote language use and development.
Performance of a low data rate speech codec for land-mobile satellite communications
NASA Technical Reports Server (NTRS)
Gersho, Allen; Jedrey, Thomas C.
1990-01-01
In an effort to foster the development of new technologies for the emerging land mobile satellite communications services, JPL funded two development contracts in 1984: one to the Univ. of Calif., Santa Barbara and the other to the Georgia Inst. of Technology, to develop algorithms and real time hardware for near toll quality speech compression at 4800 bits per second. Both universities have developed and delivered speech codecs to JPL, and the UCSB codec was extensively tested by JPL in a variety of experimental setups. The basic UCSB speech codec algorithms and the test results of the various experiments performed with this codec are presented.
Masked speech perception across the adult lifespan: Impact of age and hearing impairment.
Goossens, Tine; Vercammen, Charlotte; Wouters, Jan; van Wieringen, Astrid
2017-02-01
As people grow older, speech perception difficulties become highly prevalent, especially in noisy listening situations. Moreover, it is assumed that speech intelligibility is more affected by background noises that induce a higher cognitive load, i.e., noises that result in informational rather than merely energetic masking. There is ample evidence showing that speech perception problems in aging persons are partly due to hearing impairment and partly due to age-related declines in cognition and suprathreshold auditory processing. In order to develop effective rehabilitation strategies, it is indispensable to know how these different degrading factors act upon speech perception. This implies disentangling the effects of hearing impairment versus age and examining the interplay between both factors in background noises typical of everyday settings. To that end, we investigated open-set sentence identification in six participant groups: a young (20-30 years), middle-aged (50-60 years), and older cohort (70-80 years), each including persons who had normal audiometric thresholds up to at least 4 kHz, on the one hand, and persons who were diagnosed with elevated audiometric thresholds, on the other hand. All participants were screened for (mild) cognitive impairment. We applied stationary and amplitude-modulated speech-weighted noise, which are two types of energetic maskers, and unintelligible speech, which causes informational masking in addition to energetic masking. By means of these different background noises, we could examine speech perception performance in listening situations with a low and a high cognitive load, respectively. Our results indicate that, even when audiometric thresholds are within normal limits up to 4 kHz, irrespective of threshold elevations at higher frequencies, and there is no indication of even mild cognitive impairment, masked speech perception declines by middle age and decreases further into older age. The impact of hearing impairment is as detrimental for young and middle-aged adults as it is for older adults. When the background noise becomes cognitively more demanding, there is a larger decline in speech perception due to age or hearing impairment. Hearing impairment seems to be the main factor underlying speech perception problems in background noises that cause energetic masking. However, in the event of informational masking, which induces a higher cognitive load, age appears to explain a significant part of the communicative impairment as well. We suggest that the degrading effect of age is mediated by deficiencies in temporal processing and central executive functions. This study may contribute to the improvement of auditory rehabilitation programs aiming to prevent aging persons from missing out on conversations, which, in turn, will improve their quality of life.
Rusz, Jan; Bonnet, Cecilia; Klempíř, Jiří; Tykalová, Tereza; Baborová, Eva; Novotný, Michal; Rulseh, Aaron; Růžička, Evžen
2015-01-01
Although speech disorder is frequently an early and prominent clinical feature of Parkinson's disease (PD) as well as of atypical parkinsonian syndromes (APS) such as progressive supranuclear palsy (PSP) and multiple system atrophy (MSA), there is a lack of objective and quantitative evidence to verify whether any specific speech characteristics allow differentiation between PD, PSP and MSA. Speech samples were acquired from 77 subjects, including 15 with PD, 12 with PSP, 13 with MSA and 37 healthy controls. The differential diagnosis of dysarthria subtypes was based on the quantitative acoustic analysis of 16 speech dimensions. Dysarthria was uniformly present in all parkinsonian patients but was more severe in PSP and MSA than in PD. Whilst PD speakers manifested pure hypokinetic dysarthria, ataxic components were more affected in MSA, whereas PSP subjects demonstrated severe deficits in both the hypokinetic and spastic elements of dysarthria. Dysarthria in PSP was dominated by increased dysfluency, slower speaking rate, inappropriate silences, deficits in vowel articulation and harsh voice quality, whereas in MSA it was dominated by pitch fluctuations, excess intensity variations, prolonged phonemes, vocal tremor and strained-strangled voice quality. Objective speech measurements were able to discriminate between APS and PD with 95% accuracy and between PSP and MSA with 75% accuracy. Dysarthria severity in APS was related to overall disease severity (r = 0.54, p = 0.006). Dysarthria with various combinations of hypokinetic, spastic and ataxic components reflects differing pathophysiology in PD, PSP and MSA. Thus, motor speech examination may provide useful information in the evaluation of these diseases with similar manifestations.
Somanath, Keerthan; Mau, Ted
2016-11-01
(1) To develop an automated algorithm to analyze the electroglottographic (EGG) signal in continuous dysphonic speech, and (2) to identify EGG waveform parameters that correlate with the auditory-perceptual quality of strain in the speech of patients with adductor spasmodic dysphonia (ADSD). Software development with application in a prospective controlled study. EGG was recorded from 12 normal speakers and 12 subjects with ADSD reading excerpts from the Rainbow Passage. Data were processed by a new algorithm developed with the specific goal of analyzing continuous dysphonic speech. The contact quotient, pulse width, a new parameter (peak skew), and various contact closing slope quotient and contact opening slope quotient measures were extracted. EGG parameters were compared between normal and ADSD speech. Within the ADSD group, intra-subject comparisons were also made between perceptually strained and unstrained syllables. The opening slope quotient SO7525 distinguished strained syllables from unstrained syllables in continuous speech within individual subjects with ADSD. The standard deviations, but not the means, of the contact quotient, EGGW50, peak skew, and SO7525 differed between normal and ADSD speakers. The strain-stress pattern in continuous speech can be visualized as color gradients based on the variation of EGG parameter values. EGG parameters may provide a within-subject measure of vocal strain and serve as a marker for treatment response. The addition of EGG to multidimensional assessment may lead to improved characterization of the voice disturbance in ADSD.
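The paper's algorithm for running dysphonic speech is more involved, but the basic criterion-level contact quotient for a single EGG cycle can be sketched as follows (the 35% criterion is a common convention, assumed here rather than taken from the paper):

```python
import numpy as np

def contact_quotient(egg_cycle, level=0.35):
    """Criterion-level contact quotient for one EGG cycle: the fraction
    of the period during which vocal-fold contact exceeds a threshold
    placed at `level` of the cycle's peak-to-peak amplitude."""
    lo, hi = egg_cycle.min(), egg_cycle.max()
    threshold = lo + level * (hi - lo)
    return np.mean(egg_cycle > threshold)  # fraction of samples "closed"
```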
Keep Your Windows Open and Mirrors Polished: On Quality Education in a Changing America
ERIC Educational Resources Information Center
Katz, Lucinda Lee
2011-01-01
Lucinda Lee Katz, head of Marin Country Day School (California), received the 2009 NAIS Diversity Leadership Award. This article presents an edited excerpt of her acceptance speech. In this speech, she outlines what is necessary to move school communities ahead in one's diversity work.
Noise Hampers Children’s Expressive Word Learning
Riley, Kristine Grohne; McGregor, Karla K.
2013-01-01
Purpose To determine the effects of noise and speech style on word learning in typically developing school-age children. Method Thirty-one participants ages 9;0 (years; months) to 10;11 attempted to learn 2 sets of 8 novel words and their referents. They heard all of the words 13 times each within meaningful narrative discourse. Signal-to-noise ratio (noise vs. quiet) and speech style (plain vs. clear) were manipulated such that half of the children heard the new words in broadband white noise and half heard them in quiet; within those conditions, each child heard one set of words produced in a plain speech style and another set in a clear speech style. Results Children who were trained in quiet learned to produce the word forms more accurately than those who were trained in noise. Clear speech resulted in more accurate word form productions than plain speech, whether the children had learned in noise or quiet. Learning from clear speech in noise and plain speech in quiet produced comparable results. Conclusion Noise limits expressive vocabulary growth in children, reducing the quality of word form representation in the lexicon. Clear speech input can aid expressive vocabulary growth in children, even in noisy environments. PMID:22411494
Velopharyngeal Dysfunction Evaluation and Treatment.
Meier, Jeremy D; Muntz, Harlan R
2016-11-01
Velopharyngeal dysfunction (VPD) can significantly impair a child's quality of life and may have lasting consequences if inadequately treated. This article reviews the work-up and management options for patients with VPD. An accurate perceptual speech analysis, nasometry, and nasal endoscopy are helpful to appropriately evaluate patients with VPD. Treatment options include nonsurgical management with speech therapy or a speech bulb and surgical approaches including double-opposing Z-plasty, sphincter pharyngoplasty, pharyngeal flap, or posterior wall augmentation.
The Speech multi features fusion perceptual hash algorithm based on tensor decomposition
NASA Astrophysics Data System (ADS)
Huang, Y. B.; Fan, M. H.; Zhang, Q. Y.
2018-03-01
With constant progress in modern speech communication technologies, speech data are prone to be attacked by noise or maliciously tampered with. In order to give a speech perceptual hash algorithm strong robustness and high efficiency, this paper puts forward a speech perceptual hash algorithm based on tensor decomposition and multiple features. The algorithm analyses the perceptual features of speech: each speech component is obtained by wavelet packet decomposition, and the LPCC, LSP and ISP features of each component are extracted to constitute a speech feature tensor. Speech authentication is done by generating hash values through mid-value (median) quantification of the feature matrix. Experimental results show that the proposed algorithm is robust to content-preserving operations compared with similar algorithms, and that it is able to resist the attack of common background noise. The algorithm is also computationally efficient and is able to meet the real-time requirements of speech communication, completing speech authentication quickly.
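A minimal sketch of the median-quantization hashing and matching steps is given below; the wavelet-packet decomposition, LPCC/LSP/ISP feature extraction and tensor decomposition that produce the feature matrix are assumed to happen upstream and are not shown:

```python
import numpy as np

def perceptual_hash(features):
    """Median-quantization hash: one bit per coefficient, set when the
    coefficient exceeds its row's median. `features` is an assumed
    (coefficients x frames) matrix produced by upstream feature
    extraction; median quantization makes the bits robust to mild,
    content-preserving processing."""
    med = np.median(features, axis=1, keepdims=True)
    return (features > med).astype(np.uint8)

def bit_error_rate(h1, h2):
    """Authentication by normalized Hamming distance: a small BER
    suggests content-preserving processing (e.g., background noise),
    a large BER suggests tampering."""
    return float(np.mean(h1 != h2))
```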
Speech therapy for children with dysarthria acquired before three years of age.
Pennington, Lindsay; Parker, Naomi K; Kelly, Helen; Miller, Nick
2016-07-18
Children with motor impairments often have the motor speech disorder dysarthria, a condition which affects the tone, strength and co-ordination of any or all of the muscles used for speech. Resulting speech difficulties can range from mild, with slightly slurred articulation and breathy voice, to profound, with an inability to produce any recognisable words. Children with dysarthria are often prescribed communication aids to supplement their natural forms of communication. However, there is variation in practice regarding the provision of therapy focusing on voice and speech production. Descriptive studies have suggested that therapy may improve speech, but its effectiveness has not been evaluated. To assess whether any speech and language therapy intervention aimed at improving the speech of children with dysarthria is more effective in increasing children's speech intelligibility or communicative participation than no intervention at all, and to compare the efficacy of individual types of speech and language therapy in improving the speech intelligibility or communicative participation of children with dysarthria. We searched the Cochrane Central Register of Controlled Trials (CENTRAL; 2015, Issue 7), MEDLINE, EMBASE, CINAHL, LLBA, ERIC, PsychInfo, Web of Science, Scopus, the UK National Research Register and Dissertation Abstracts up to July 2015, handsearched relevant journals published between 1980 and July 2015, and searched proceedings of relevant conferences between 1996 and 2015. We placed no restrictions on the language or setting of the studies. A previous version of this review considered studies published up to April 2009. In this update we searched for studies published from April 2009 to July 2015. We considered randomised controlled trials and studies using quasi-experimental designs in which children were allocated to groups using non-random methods. One author (LP) conducted searches of all databases, journals and conference reports. All searches included a reliability check in which a second review author independently checked a random sample comprising 15% of all identified reports. We planned that two review authors would independently assess the quality of and extract data from eligible studies. No randomised controlled trials or group studies were identified. This review found no evidence from randomised trials of the effectiveness of speech and language therapy interventions to improve the speech of children with early acquired dysarthria. Rigorous, fully powered randomised controlled trials are needed to investigate whether the positive changes in children's speech observed in phase I and phase II studies are generalisable to the population of children with early acquired dysarthria served by speech and language therapy services. Research should examine change in children's speech production and intelligibility. It must also investigate children's participation in social and educational activities, and their quality of life, as well as the cost and acceptability of interventions.
ERIC Educational Resources Information Center
Neldon, Gayle B.
2009-01-01
Evidence-based practice (EBP) is a strategy for the provision of high quality health care. The use of journals to document clinical experiences and reflection has been used in speech-language pathology as well as nursing and psychology. This study uses qualitative analysis to study what AuD students learn about evidence-based practice from writing…
Barsties, Ben; Maryn, Youri
2016-07-01
The Acoustic Voice Quality Index (AVQI) is an objective method to quantify the severity of overall voice quality in concatenated continuous speech and sustained phonation segments. Recently, the AVQI was successfully modified to be more representative and ecologically valid, because its internal consistency was balanced out through an equal proportion of the two speech types. The present investigation aims to explore its external validation in a large data set. An expert panel of 12 speech-language therapists rated the voice quality of 1058 concatenated voice samples varying from normophonia to severe dysphonia. Spearman rank-order correlation coefficients (r) were used to measure concurrent validity. The AVQI's diagnostic accuracy was evaluated with several estimates of its receiver operating characteristics (ROC). Finally, 8 of the 12 experts were retained on the basis of reliability criteria. A strong correlation was identified between AVQI and auditory-perceptual rating (r = 0.815, P < .001), indicating that 66.4% of the variation in the auditory-perceptual rating was explained by the AVQI. Additionally, the ROC results again showed the best diagnostic outcome at a threshold of AVQI = 2.43. This study highlights the external validation and diagnostic precision of AVQI version 03.01 as a robust and ecologically valid measurement to objectify voice quality. © The Author(s) 2016.
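The 66.4% figure follows directly from squaring the reported correlation, i.e. the coefficient of determination:

```latex
r^{2} = 0.815^{2} \approx 0.664 \quad \text{(66.4\% of the variance explained)}
```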
Quality of life in patients with obturator prostheses.
Riaz, Nabeela; Warriach, Riaz Ahmad
2010-01-01
Oral cancer has a profound impact on the quality of life of patients and their families. Functionally, the mouth is an important organ for speech, swallowing, chewing, taste and salivation. These functions become compromised due to surgical ablation of the tumour. The obturator prosthesis is a common prosthodontic rehabilitative option for maxillectomy patients. The purpose of this study was to investigate how patients with maxillofacial defects evaluate their quality of life after maxillectomy and prosthodontic therapy with obturator prostheses. Thirty patients were included in the study (11 female, 19 male). The patients were interviewed using a standardised questionnaire developed by the University of Washington (UW-QOL). The detailed questionnaire was adjusted for obturator patients and incorporated most parts of the obturator functioning scale (OFS). Quality of life after prosthodontic therapy with obturator prostheses was 54 +/- 22.9% on average. Functioning of the obturator prosthesis, impairment of ingestion, speech and appearance, the extent of therapy, and the existence of pain had a significant impact on quality of life (p < 0.005). Orofacial rehabilitation of patients with maxillofacial defects using obturator prostheses is an appropriate treatment modality. To improve the situation of patients prior to and after maxillectomy, sufficient information about the treatment, adequate psychological care and speech therapy should be provided.
Wołk, Agnieszka; Glinkowski, Wojciech
2017-01-01
People with speech, hearing, or mental impairment require special communication assistance, especially for medical purposes. Automatic solutions for speech recognition and voice synthesis from text are poor fits for communication in the medical domain because they are dependent on error-prone statistical models. Systems dependent on manual text input are insufficient. Recently introduced systems for automatic sign language recognition are dependent on statistical models as well as on image and gesture quality. Such systems remain in early development and are based mostly on minimal hand gestures unsuitable for medical purposes. Furthermore, solutions that rely on the Internet cannot be used after disasters that require humanitarian aid. We propose a high-speed, intuitive, Internet-free, voice-free, and text-free tool suited for emergency medical communication. Our solution is a pictogram-based application that provides easy communication for individuals who have speech or hearing impairment or mental health issues that impair communication, as well as foreigners who do not speak the local language. It provides support and clarification in communication by using intuitive icons and interactive symbols that are easy to use on a mobile device. Such pictogram-based communication can be quite effective and ultimately make people's lives happier, easier, and safer. PMID:29230254
The Sound of Feelings: Electrophysiological Responses to Emotional Speech in Alexithymia
Goerlich, Katharina Sophia; Aleman, André; Martens, Sander
2012-01-01
Background: Alexithymia is a personality trait characterized by difficulties in the cognitive processing of emotions (cognitive dimension) and in the experience of emotions (affective dimension). Previous research focused mainly on visual emotional processing in the cognitive alexithymia dimension. We investigated the impact of both alexithymia dimensions on electrophysiological responses to emotional speech in 60 female subjects. Methodology: During unattended processing, subjects watched a movie while an emotional prosody oddball paradigm was presented in the background. During attended processing, subjects detected deviants in emotional prosody. The cognitive alexithymia dimension was associated with a left-hemisphere bias during early stages of unattended emotional speech processing, and with generally reduced amplitudes of the late P3 component during attended processing. In contrast, the affective dimension did not modulate unattended emotional prosody perception, but was associated with reduced P3 amplitudes during attended processing, particularly to emotional prosody spoken with high intensity. Conclusions: Our results provide evidence for a dissociable impact of the two alexithymia dimensions on electrophysiological responses during the attended and unattended processing of emotional prosody. The observed electrophysiological modulations are indicative of a reduced sensitivity to the emotional qualities of speech, which may be a contributing factor to problems in interpersonal communication associated with alexithymia. PMID:22615853
Wołk, Krzysztof; Wołk, Agnieszka; Glinkowski, Wojciech
2017-01-01
People with speech, hearing, or mental impairment require special communication assistance, especially for medical purposes. Automatic solutions for speech recognition and voice synthesis from text are poor fits for communication in the medical domain because they are dependent on error-prone statistical models. Systems dependent on manual text input are insufficient. Recently introduced systems for automatic sign language recognition are dependent on statistical models as well as on image and gesture quality. Such systems remain in early development and are based mostly on minimal hand gestures unsuitable for medical purposes. Furthermore, solutions that rely on the Internet cannot be used after disasters that require humanitarian aid. We propose a high-speed, intuitive, Internet-free, voice-free, and text-free tool suited for emergency medical communication. Our solution is a pictogram-based application that provides easy communication for individuals who have speech or hearing impairment or mental health issues that impair communication, as well as foreigners who do not speak the local language. It provides support and clarification in communication by using intuitive icons and interactive symbols that are easy to use on a mobile device. Such pictogram-based communication can be quite effective and ultimately make people's lives happier, easier, and safer.
Brain Volume Differences Associated With Hearing Impairment in Adults
Vriend, Chris; Heslenfeld, Dirk J.; Versfeld, Niek J.; Kramer, Sophia E.
2018-01-01
Speech comprehension depends on the successful operation of a network of brain regions. Processing of degraded speech is associated with different patterns of brain activity in comparison with that of high-quality speech. In this exploratory study, we studied whether processing degraded auditory input in daily life because of hearing impairment is associated with differences in brain volume. We compared T1-weighted structural magnetic resonance images of 17 hearing-impaired (HI) adults with those of 17 normal-hearing (NH) controls using a voxel-based morphometry analysis. HI adults were individually matched with NH adults based on age and educational level. Gray and white matter brain volumes were compared between the groups by region-of-interest analyses in structures associated with speech processing, and by whole-brain analyses. The results suggest increased gray matter volume in the right angular gyrus and decreased white matter volume in the left fusiform gyrus in HI listeners as compared with NH ones. In the HI group, there was a significant correlation between hearing acuity and cluster volume of the gray matter cluster in the right angular gyrus. This correlation supports the link between partial hearing loss and altered brain volume. The alterations in volume may reflect the operation of compensatory mechanisms that are related to decoding meaning from degraded auditory input. PMID:29557274
Simultaneous F0-F1 modifications of Arabic for the improvement of natural-sounding
NASA Astrophysics Data System (ADS)
Ykhlef, F.; Bensebti, M.
2013-03-01
Pitch (F0) modification is one of the most important problems in the area of speech synthesis. Several techniques have been developed in the literature to achieve this goal. The main restrictions of these techniques concern the modification range and the quality, intelligibility and naturalness of the synthesised speech. The control of formants in a spoken language can significantly improve the naturalness of synthesised speech. This improvement depends mainly on the control of the first formant (F1). Inspired by this observation, this article proposes a new approach that modifies both F0 and F1 of Arabic voiced sounds in order to improve the naturalness of pitch-shifted speech. The developed strategy takes a parallel processing approach, in which the analysis segments are decomposed into sub-bands in the wavelet domain, modified in the desired sub-band by using a resampling technique and reconstructed without affecting the remaining sub-bands. Pitch marking and voicing detection are performed in the frequency decomposition step based on the comparison of the multi-level approximation and detail signals. The performance of the proposed technique is evaluated by listening tests and compared to the pitch synchronous overlap and add (PSOLA) technique at the third approximation level. Experimental results have shown that the manipulation in the wavelet domain of F0 in conjunction with F1 yields natural-sounding synthesised speech compared with the classical pitch modification technique. This improvement was obtained even for large pitch modifications.
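A minimal sketch of the sub-band manipulation idea, assuming PyWavelets for the decomposition and simple resampling of the approximation band. The actual method additionally performs pitch marking, voicing detection and targeted modification of chosen sub-bands, so this is only a gesture at the structure; the wavelet, level and warp factor are invented parameters.

```python
import numpy as np
import pywt
from scipy.signal import resample

def warp_subband(frame, wavelet="db4", level=3, factor=0.9):
    # Decompose the analysis frame into wavelet sub-bands.
    coeffs = pywt.wavedec(frame, wavelet, level=level)
    approx = coeffs[0]
    # Resample only the approximation band (where F0/F1 energy sits),
    # then pad or truncate so reconstruction lengths still match.
    warped = resample(approx, max(1, int(len(approx) * factor)))
    if len(warped) < len(approx):
        warped = np.pad(warped, (0, len(approx) - len(warped)))
    coeffs[0] = warped[: len(approx)]
    # The remaining sub-bands are reconstructed untouched.
    return pywt.waverec(coeffs, wavelet)[: len(frame)]
```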
Social Skills and Social Acceptance in Children with Anxiety Disorders.
Scharfstein, Lindsay A; Beidel, Deborah C
2015-01-01
Whereas much is known about the deficits in social behaviors and social competence in youth with social anxiety disorder (SAD), less is known about those characteristics among youth with generalized anxiety disorder (GAD). This study aimed to better elucidate the social repertoire and peer acceptance of youth with SAD and youth with GAD, relative to normal control (NC) youth. The sample consisted of 58 primarily Caucasian children, ages 6 to 13 years: 20 SAD (12 female), 18 GAD (12 female), and 20 NC (9 female). Diagnoses were based on Anxiety Disorders Interview Schedule for DSM-IV: Children and Parent Versions interviews. A multimodal assessment strategy included parent and child reports, observer ratings of social performance, computer-based analysis of vocal qualities of speech, and peer ratings of likeability and friendship potential. Whereas self- and parental report did not differentiate the two diagnostic groups, differences on observable behaviors were apparent. Children with SAD exhibited anxious speech patterns, extended speech latencies, a paucity of speech, few spontaneous vocalizations, and ineffective social responses; they were perceived by peers as less likeable and socially desirable. Children with GAD had typical speech patterns and were well liked by their peers but displayed fewer spontaneous comments and questions than NC children. Parent and child reports are less sensitive to what could be important differences in social skill between youth with SAD and GAD. Direct observations, computer-based measures of speech quality, and peer ratings identify specific group differences, suggesting the need for a comprehensive evaluation to inform treatment planning.
Pisoni, David B; Kronenberger, William G; Roman, Adrienne S; Geers, Ann E
2011-02-01
Conventional assessments of outcomes in deaf children with cochlear implants (CIs) have focused primarily on endpoint or product measures of speech and language. Little attention has been devoted to understanding the basic underlying core neurocognitive factors involved in the development and processing of speech and language. In this study, we examined the development of factors related to the quality of phonological information in immediate verbal memory, including immediate memory capacity and verbal rehearsal speed, in a sample of deaf children after >10 yrs of CI use and assessed the correlations between these two process measures and a set of speech and language outcomes. Of an initial sample of 180 prelingually deaf children with CIs assessed at ages 8 to 9 yrs after 3 to 7 yrs of CI use, 112 returned for testing again in adolescence after 10 more years of CI experience. In addition to completing a battery of conventional speech and language outcome measures, subjects were administered the Wechsler Intelligence Scale for Children-III Digit Span subtest to measure immediate verbal memory capacity. Sentence durations obtained from the McGarr speech intelligibility test were used as a measure of verbal rehearsal speed. Relative to norms for normal-hearing children, Digit Span scores were well below average for children with CIs at both elementary and high school ages. Improvement was observed over the 8-yr period in the mean longest digit span forward score but not in the mean longest digit span backward score. Longest digit span forward scores at ages 8 to 9 yrs were significantly correlated with all speech and language outcomes in adolescence, but backward digit spans correlated significantly only with measures of higher-order language functioning over that time period. While verbal rehearsal speed increased for almost all subjects between elementary grades and high school, it was still slower than the rehearsal speed obtained from a control group of normal-hearing adolescents. Verbal rehearsal speed at ages 8 to 9 yrs was also found to be strongly correlated with speech and language outcomes and Digit Span scores in adolescence. Despite improvement after 8 additional years of CI use, measures of immediate verbal memory capacity and verbal rehearsal speed, which reflect core fundamental information processing skills associated with representational efficiency and information processing capacity, continue to be delayed in children with CIs relative to NH peers. Furthermore, immediate verbal memory capacity and verbal rehearsal speed at 8 to 9 yrs of age were both found to predict speech and language outcomes in adolescence, demonstrating the important contribution of these processing measures for speech-language development in children with CIs. Understanding the relations between these core underlying processes and speech-language outcomes in children with CIs may help researchers to develop new approaches to intervention and treatment of deaf children who perform poorly with their CIs. Moreover, this knowledge could be used for early identification of deaf children who may be at high risk for poor speech and language outcomes after cochlear implantation as well as for the development of novel targeted interventions that focus selectively on these core elementary information processing variables.
Van Lierde, Kristiane M; D'haeseleer, Evelien; Wuyts, Floris L; De Ley, Sophia; Geldof, Ruben; De Vuyst, Julie; Sofie, Claeys
2010-09-01
The purpose of the present cross-sectional study was to determine the objective vocal quality and the vocal characteristics (vocal risk factors, vocal and corporal complaints) in 197 female students in speech-language pathology during the 4 years of study. The objective vocal quality was measured by means of the Dysphonia Severity Index (DSI). Perceptual voice assessment, the Voice Handicap Index (VHI), questionnaires addressing vocal risks, and vocal and corporal complaints during and/or after voice usage were performed. Speech-language pathology (SLP) students have a borderline vocal quality corresponding to a DSI% of 68. The analysis of variance revealed no significant change of the objective vocal quality between the first bachelor year and the master year. No psychosocial handicapping effect of the voice was observed by means of the VHI total, though there was an effect at the functional VHI level in addition to some vocal complaints. Ninety-three percent of the student SLPs reported the presence of corporal pain during and/or after speaking. In particular, sore throat and headache were mentioned as the prevalent corporal pain symptoms. A longitudinal study of the objective vocal quality of the same subjects during their career as an SLP might provide new insights. 2010 The Voice Foundation. Published by Mosby, Inc. All rights reserved.
[Cochlear implant in children: rational, indications and cost/efficacy].
Martini, A; Bovo, R; Trevisi, P; Forli, F; Berrettini, S
2013-06-01
A cochlear implant (CI) is a partially implanted electronic device that can provide a sense of sound and support speech perception in severely to profoundly hearing-impaired patients. It consists of an external portion, which usually sits behind the ear, and an internal portion surgically placed under the skin. The external components include a microphone connected to a speech processor that selects and arranges sounds picked up by the microphone. This is connected to a transmitter coil, worn on the side of the head, which transmits data to an internal receiver coil placed under the skin. The received data are delivered to an array of electrodes that are surgically implanted within the cochlea. The primary neural targets of the electrodes are the spiral ganglion cells, whose axons form the fibers of the auditory nerve. When the electrodes are activated by the signal, they send a current along the auditory nerve and auditory pathways to the auditory cortex. Children and adults who are profoundly or severely hearing impaired can be fitted with cochlear implants. According to the Food and Drug Administration, approximately 188,000 people worldwide have received implants. In Italy it is estimated that there are about 6000-7000 implanted patients, with an average of 700 CI surgeries per year. Cochlear implantation, followed by intensive postimplantation speech therapy, can help young children to acquire speech, language, and social skills. Early implantation provides exposure to sounds that can be helpful during the critical period when children learn speech and language skills. In 2000, the Food and Drug Administration lowered the age of eligibility to 12 months for one type of CI. With regard to early implantation, better linguistic results are reported in children implanted before 12 months of life, even though insufficient data exist on the relation between this advantage and the duration of implant use, and on how long this advantage persists in subsequent years. With regard to cochlear implantation in children older than 12 months, the studies show better hearing and linguistic results in children implanted at earlier ages. A sensitive period below 24-36 months has been identified, beyond which cochlear implantation is reported to be less effective in terms of improvement in speech and hearing results. With regard to the clinical effectiveness of bilateral cochlear implantation, greater benefits from bilateral implants compared with monolateral ones, in hearing in quiet and in noise and in sound localization abilities, are reported in both simultaneous and sequential bilateral implantation. However, with regard to the delay between the surgeries in sequential bilateral implantation, although benefit is reported to be present even after very long delays, on average long delays between surgeries seem to negatively affect the outcome with the second implant. With regard to benefits after cochlear implantation in children with multiple disabilities, benefits in terms of speech perception and communication, as well as in quality of daily life, are reported, even if these benefits emerge more slowly and are smaller than those generally attained by implanted children without additional disabilities.
Regarding the cost/efficacy ratio, the CI is expensive, mainly because of the cost of the highly technological device and of lifelong support; yet even though healthcare costs are high, the savings in terms of indirect costs and quality of life are substantial. The CI, in fact, has a positive impact in terms of quality of life.
Altered Gesture and Speech Production in ASD Detract from In-Person Communicative Quality
ERIC Educational Resources Information Center
Morett, Laura M.; O'Hearn, Kirsten; Luna, Beatriz; Ghuman, Avniel Singh
2016-01-01
This study disentangled the influences of language and social processing on communication in autism spectrum disorder (ASD) by examining whether gesture and speech production differs as a function of social context. The results indicate that, unlike other adolescents, adolescents with ASD did not increase their coherency and engagement in the…
Captain's Log...The Speech Communication Oral Journal.
ERIC Educational Resources Information Center
Strong, William F.
1983-01-01
The logic and the benefits of requiring college students in basic speech communication classes to tape-record oral journals are set forth along with a detailed description of the assignment. Instructions to the students explain the mechanics of the assignment as follows: (1) obtain and properly label a quality cassette tape; (2) make seven…
Retracing Atypical Development: A Preserved Speech Variant of Rett Syndrome
ERIC Educational Resources Information Center
Marschik, Peter B.; Einspieler, Christa; Oberle, Andreas; Laccone, Franco; Prechtl, Heinz F. R.
2009-01-01
The subject of the present study is the development of a girl with the preserved speech variant of Rett disorder. Our data are based on detailed retrospective and prospective video analyses. Despite achieving developmental milestones, movement quality was already abnormal during the girl's first half year of life. In addition, early hand…
Pals, Carina; Sarampalis, Anastasios; van Dijk, Mart; Başkent, Deniz
2018-05-11
Residual acoustic hearing in electric-acoustic stimulation (EAS) can benefit cochlear implant (CI) users through increased sound quality, improved speech intelligibility, and improved tolerance to noise. The goal of this study was to investigate whether the low-pass-filtered acoustic speech in simulated EAS can provide the additional benefit of reducing listening effort for the spectrotemporally degraded signal of noise-band-vocoded speech. Listening effort was investigated using a dual-task paradigm as a behavioral measure, and the NASA Task Load indeX as a subjective self-report measure. The primary task of the dual-task paradigm was identification of sentences presented in three experiments at three fixed intelligibility levels: at near-ceiling, 50%, and 79% intelligibility, achieved by manipulating the presence and level of speech-shaped noise in the background. Listening effort for the primary intelligibility task was reflected in the performance on the secondary, visual response time task. Experimental speech processing conditions included monaural or binaural vocoder, with added low-pass-filtered speech (to simulate EAS) or without (to simulate CI). In Experiment 1, in quiet with intelligibility near-ceiling, additional low-pass-filtered speech reduced listening effort compared with binaural vocoder, in line with our expectations, although not compared with monaural vocoder. In Experiments 2 and 3, for speech in noise, added low-pass-filtered speech allowed the desired intelligibility levels to be reached at less favorable speech-to-noise ratios, as expected. Interestingly, this came without the cost of increased listening effort usually associated with poor speech-to-noise ratios; at 50% intelligibility, even a reduction in listening effort on top of the increased tolerance to noise was observed. The NASA Task Load indeX did not capture these differences. The dual-task results provide partial evidence for a potential decrease in listening effort as a result of adding low-frequency acoustic speech to noise-band-vocoded speech. Whether these findings translate to CI users with residual acoustic hearing will need to be addressed in future research because the quality and frequency range of low-frequency acoustic sound available to listeners with hearing loss may differ from our idealized simulations, and additional factors, such as advanced age and varying etiology, may also play a role. This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND), where it is permissible to download and share the work provided it is properly cited. The work cannot be changed in any way or used commercially without permission from the journal.
Low-income fathers’ speech to toddlers during book reading versus toy play
Salo, Virginia C.; Rowe, Meredith L.; Leech, Kathryn A.; Cabrera, Natasha J.
2016-01-01
Fathers’ child-directed speech across two contexts was examined. Father–child dyads from sixty-nine low-income families were videotaped interacting during book reading and toy play when children were 2;0. Fathers used more diverse vocabulary and asked more questions during book reading while their mean length of utterance was longer during toy play. Variation in these specific characteristics of fathers’ speech that differed across contexts was also positively associated with child vocabulary skill measured on the MacArthur-Bates Communicative Development Inventory. Results are discussed in terms of how different contexts elicit specific qualities of child-directed speech that may promote language use and development. PMID:26541647
An optimization method for speech enhancement based on deep neural network
NASA Astrophysics Data System (ADS)
Sun, Haixia; Li, Sikun
2017-06-01
This paper puts forward a deep neural network (DNN) model for speech enhancement with a more credible data set and a more robust structure. First, we adopt two regularization techniques, dropout and a sparsity constraint, to strengthen the generalization ability of the model. In this way, the model not only achieves consistency between the pre-training model and the fine-tuning model, but also reduces resource consumption. Network compression through weight sharing and quantization is then applied to reduce storage cost. In the end, we evaluate the quality of the reconstructed speech according to different criteria. The results show that the improved framework performs well on speech enhancement and meets the requirements of speech processing.
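A minimal PyTorch sketch of the two regularization devices named in the abstract, dropout plus an L1 sparsity penalty on hidden activations. The layer sizes, the 257-bin spectral feature dimension and the penalty weight are assumptions for illustration, not the paper's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EnhancementDNN(nn.Module):
    """Maps noisy log-spectral frames to clean ones."""
    def __init__(self, dim=257, hidden=1024, p_drop=0.2):
        super().__init__()
        self.fc1 = nn.Linear(dim, hidden)
        self.fc2 = nn.Linear(hidden, hidden)
        self.out = nn.Linear(hidden, dim)
        self.drop = nn.Dropout(p_drop)   # regularizer 1: dropout

    def forward(self, x):
        h1 = self.drop(F.relu(self.fc1(x)))
        h2 = self.drop(F.relu(self.fc2(h1)))
        return self.out(h2), h1

def loss_fn(model, noisy, clean, l1_weight=1e-5):
    # Regression loss plus a sparsity constraint (regularizer 2)
    # on the first hidden layer's activations.
    pred, hidden = model(noisy)
    return F.mse_loss(pred, clean) + l1_weight * hidden.abs().mean()
```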
High-frequency neural activity predicts word parsing in ambiguous speech streams.
Kösem, Anne; Basirat, Anahita; Azizi, Leila; van Wassenhove, Virginie
2016-12-01
During speech listening, the brain parses a continuous acoustic stream of information into computational units (e.g., syllables or words) necessary for speech comprehension. Recent neuroscientific hypotheses have proposed that neural oscillations contribute to speech parsing, but whether they do so on the basis of acoustic cues (bottom-up acoustic parsing) or as a function of available linguistic representations (top-down linguistic parsing) is unknown. In this magnetoencephalography study, we contrasted acoustic and linguistic parsing using bistable speech sequences. While listening to the speech sequences, participants were asked to maintain one of the two possible speech percepts through volitional control. We predicted that the tracking of speech dynamics by neural oscillations would not only follow the acoustic properties but also shift in time according to the participant's conscious speech percept. Our results show that the latency of high-frequency activity (specifically, beta and gamma bands) varied as a function of the perceptual report. In contrast, the phase of low-frequency oscillations was not strongly affected by top-down control. Whereas changes in low-frequency neural oscillations were compatible with the encoding of prelexical segmentation cues, high-frequency activity specifically informed on an individual's conscious speech percept. Copyright © 2016 the American Physiological Society.
High-frequency neural activity predicts word parsing in ambiguous speech streams
Basirat, Anahita; Azizi, Leila; van Wassenhove, Virginie
2016-01-01
During speech listening, the brain parses a continuous acoustic stream of information into computational units (e.g., syllables or words) necessary for speech comprehension. Recent neuroscientific hypotheses have proposed that neural oscillations contribute to speech parsing, but whether they do so on the basis of acoustic cues (bottom-up acoustic parsing) or as a function of available linguistic representations (top-down linguistic parsing) is unknown. In this magnetoencephalography study, we contrasted acoustic and linguistic parsing using bistable speech sequences. While listening to the speech sequences, participants were asked to maintain one of the two possible speech percepts through volitional control. We predicted that the tracking of speech dynamics by neural oscillations would not only follow the acoustic properties but also shift in time according to the participant's conscious speech percept. Our results show that the latency of high-frequency activity (specifically, beta and gamma bands) varied as a function of the perceptual report. In contrast, the phase of low-frequency oscillations was not strongly affected by top-down control. Whereas changes in low-frequency neural oscillations were compatible with the encoding of prelexical segmentation cues, high-frequency activity specifically informed on an individual's conscious speech percept. PMID:27605528
STI: An objective measure for the performance of voice communication systems
NASA Astrophysics Data System (ADS)
Houtgast, T.; Steeneken, H. J. M.
1981-06-01
A measuring device was developed for determining the quality of speech communication systems. It comprises two parts: a signal source, which replaces the talker by producing an artificial speech-like signal, and an analysis part, which replaces the listener and evaluates the signal at the receiving end of the system under test. Each single measurement results in an index (ranging from 0 to 100%) which indicates the effect of that communication system on speech intelligibility. The index is called STI (Speech Transmission Index). A careful design of the characteristics of the test signal and of the type of signal analysis makes the present approach widely applicable. It was verified experimentally that a given STI implies a given effect on speech intelligibility, irrespective of the nature of the actual disturbance (noise interference, band-pass limiting, peak clipping, etc.).
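The index can be sketched from the measured modulation transfer values. The sketch below follows the classic computation (apparent signal-to-noise ratio per octave band and modulation frequency, clipped to +/-15 dB and averaged); the unweighted average is a simplification, since later standardized versions apply octave-band weightings.

```python
import numpy as np

def sti_from_mtf(m):
    """m: matrix of modulation transfer values, shape (7 octave bands,
    14 modulation frequencies), each value in (0, 1)."""
    m = np.clip(m, 1e-6, 1.0 - 1e-6)
    snr_apparent = 10.0 * np.log10(m / (1.0 - m))   # per-cell apparent SNR (dB)
    snr_apparent = np.clip(snr_apparent, -15.0, 15.0)
    ti = (snr_apparent + 15.0) / 30.0               # transmission index in 0..1
    return float(ti.mean())                         # STI (x100 for a percentage)
```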
Shao, Yu; Chang, Chip-Hong
2007-08-01
We present a new speech enhancement scheme for a single-microphone system to meet the demand for quality noise reduction algorithms capable of operating at a very low signal-to-noise ratio. A psychoacoustic model is incorporated into the generalized perceptual wavelet denoising method to reduce the residual noise and improve the intelligibility of speech. The proposed method is a generalized time-frequency subtraction algorithm, which advantageously exploits the wavelet multirate signal representation to preserve the critical transient information. Simultaneous masking and temporal masking of the human auditory system are modeled by the perceptual wavelet packet transform via the frequency and temporal localization of speech components. The wavelet coefficients are used to calculate the Bark spreading energy and temporal spreading energy, from which a time-frequency masking threshold is deduced to adaptively adjust the subtraction parameters of the proposed method. An unvoiced speech enhancement algorithm is also integrated into the system to improve the intelligibility of speech. Through rigorous objective and subjective evaluations, it is shown that the proposed speech enhancement system is capable of reducing noise with little speech degradation in adverse noise environments and the overall performance is superior to several competitive methods.
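The key mechanism, letting a psychoacoustic masking threshold relax the subtraction where residual noise would be inaudible, can be caricatured in a few lines. This is a generic masking-adapted spectral subtraction sketch, not the authors' wavelet-domain system; the threshold normalisation, oversubtraction range and spectral floor are invented values.

```python
import numpy as np

def masked_subtraction(noisy_mag, noise_mag, mask_thresh,
                       alpha_max=4.0, alpha_min=1.0, floor=0.02):
    # Normalise the masking threshold to [0, 1]; where masking is strong,
    # residual noise is inaudible, so subtract less and keep more speech.
    t = mask_thresh / (mask_thresh.max() + 1e-12)
    alpha = alpha_max - (alpha_max - alpha_min) * t
    clean = noisy_mag - alpha * noise_mag
    # A spectral floor prevents the spectral holes that cause musical noise.
    return np.maximum(clean, floor * noisy_mag)
```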
Everyday listeners' impressions of speech produced by individuals with adductor spasmodic dysphonia.
Nagle, Kathleen F; Eadie, Tanya L; Yorkston, Kathryn M
2015-01-01
Individuals with adductor spasmodic dysphonia (ADSD) have reported that unfamiliar communication partners appear to judge them as sneaky, nervous or not intelligent, apparently based on the quality of their speech; however, there is minimal research into the actual everyday perspective of listening to ADSD speech. The purpose of this study was to investigate the impressions of listeners hearing ADSD speech for the first time using a mixed-methods design. Everyday listeners were interviewed following sessions in which they made ratings of ADSD speech. A semi-structured interview approach was used and data were analyzed using thematic content analysis. Three major themes emerged: (1) everyday listeners make judgments about speakers with ADSD; (2) ADSD speech does not sound normal to everyday listeners; and (3) rating overall severity is difficult for everyday listeners. Participants described ADSD speech similarly to existing literature; however, some listeners inaccurately extrapolated speaker attributes based solely on speech samples. Listeners may draw erroneous conclusions about individuals with ADSD and these biases may affect the communicative success of these individuals. Results have implications for counseling individuals with ADSD, as well as the need for education and awareness about ADSD. Copyright © 2015 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Thoonsaengngam, Rattapol; Tangsangiumvisai, Nisachon
This paper proposes an enhanced method for estimating the a priori Signal-to-Disturbance Ratio (SDR) to be employed in the Acoustic Echo and Noise Suppression (AENS) system for full-duplex hands-free communications. The proposed a priori SDR estimation technique is modified based upon the Two-Step Noise Reduction (TSNR) algorithm to suppress the background noise while preserving speech spectral components. In addition, a practical approach to determine accurately the Echo Spectrum Variance (ESV) is presented based upon the linear relationship assumption between the power spectrum of far-end speech and acoustic echo signals. The ESV estimation technique is then employed to alleviate the acoustic echo problem. The performance of the AENS system that employs these two proposed estimation techniques is evaluated through the Echo Attenuation (EA), Noise Attenuation (NA), and two speech distortion measures. Simulation results based upon real speech signals guarantee that our improved AENS system is able to mitigate efficiently the problem of acoustic echo and background noise, while preserving the speech quality and speech intelligibility.
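For readers unfamiliar with the Two-Step Noise Reduction idea the abstract builds on, a minimal sketch follows: a decision-directed estimate of the a priori ratio is refined by a second pass through the resulting Wiener gain. Here the "disturbance" spectrum stands in for the combined echo-plus-noise estimate; the variable names and the smoothing constant are illustrative, not taken from the paper.

```python
import numpy as np

def tsnr_gain(noisy_psd, disturb_psd, prev_clean_psd, alpha=0.98):
    # Step 1: decision-directed a priori signal-to-disturbance estimate,
    # smoothing the previous frame's clean estimate with the current
    # (floored) a posteriori ratio.
    post = np.maximum(noisy_psd / disturb_psd - 1.0, 0.0)
    prio = alpha * prev_clean_psd / disturb_psd + (1.0 - alpha) * post
    g1 = prio / (1.0 + prio)                  # first-pass Wiener gain
    # Step 2: re-estimate the a priori ratio from the first-pass output,
    # removing the one-frame bias of the decision-directed rule.
    prio2 = (g1 ** 2) * noisy_psd / disturb_psd
    return prio2 / (1.0 + prio2)              # refined Wiener gain
```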
Nanda, Aditi; Koli, Dheeraj; Sharma, Sunanda; Suryavanshi, Shalini; Verma, Mahesh
2015-01-01
Surgical resection of the soft palate due to cancer affects the functioning of the velopharyngeal mechanism (speech and deglutition). With the loss of speech intelligibility, hyperresonance of the voice and impaired swallowing (due to nasal regurgitation), there is a decline in the quality of life of the affected individual. In a multidisciplinary setup, the role of the prosthodontist in rehabilitating such patients through the fabrication of a speech aid prosthesis has been described. The design and method of fabrication of the prosthesis are simple and easy to perform. The use of the prosthesis, together with speech training by a speech pathologist, resulted in improvement in speech. Furthermore, an improvement in swallowing was noted, resulting in improved nutritional intake and general well-being of the individual. The take-home message is that in the treatment of oral cancer, feasible and rapid rehabilitation should be pursued in order to make the patient socially more acceptable. The onus lies on the prosthodontist to accomplish this rapidly, before the patient's morale is lowered by the stigma associated with cancer.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hogden, J.
The goal of the proposed research is to test a statistical model of speech recognition that incorporates the knowledge that speech is produced by relatively slow motions of the tongue, lips, and other speech articulators. This model is called Maximum Likelihood Continuity Mapping (Malcom). Many speech researchers believe that by using constraints imposed by articulator motions, we can improve or replace the current hidden Markov model based speech recognition algorithms. Unfortunately, previous efforts to incorporate information about articulation into speech recognition algorithms have suffered because (1) slight inaccuracies in our knowledge or the formulation of our knowledge about articulation may decrease recognition performance, (2) small changes in the assumptions underlying models of speech production can lead to large changes in the speech derived from the models, and (3) collecting measurements of human articulator positions in sufficient quantity for training a speech recognition algorithm is still impractical. The most interesting (and in fact, unique) quality of Malcom is that, even though Malcom makes use of a mapping between acoustics and articulation, Malcom can be trained to recognize speech using only acoustic data. By learning the mapping between acoustics and articulation using only acoustic data, Malcom avoids the difficulties involved in collecting articulator position measurements and does not require an articulatory synthesizer model to estimate the mapping between vocal tract shapes and speech acoustics. Preliminary experiments that demonstrate that Malcom can learn the mapping between acoustics and articulation are discussed. Potential applications of Malcom aside from speech recognition are also discussed. Finally, specific deliverables resulting from the proposed research are described.
Iwasaki, Satoshi; Usami, Shin-Ichi; Takahashi, Haruo; Kanda, Yukihiko; Tono, Tetsuya; Doi, Katsumi; Kumakawa, Kozo; Gyo, Kiyofumi; Naito, Yasushi; Kanzaki, Sho; Yamanaka, Noboru; Kaga, Kimitaka
2017-07-01
To report on the safety and efficacy of an investigational active middle ear implant (AMEI) in Japan, and to compare results to preoperative results with a hearing aid. Prospective study conducted in Japan in which 23 Japanese-speaking adults suffering from conductive or mixed hearing loss received a VIBRANT SOUNDBRIDGE with implantation at the round window. Postoperative thresholds, speech perception results (word recognition scores, speech reception thresholds, signal-to-noise ratio [SNR]), and quality of life questionnaires at 20 weeks were compared with preoperative results with all patients receiving the same, best available hearing aid (HA). Statistically significant improvements in postoperative AMEI-aided thresholds (1, 2, 4, and 8 kHz) and on the speech reception thresholds and word recognition scores tests, compared with preoperative HA-aided results, were observed. On the SNR, the subjects' mean values showed statistically significant improvement, with -5.7 dB SNR for the AMEI-aided mean and -2.1 dB SNR for the preoperative HA-assisted mean. The APHAB quality of life questionnaire also showed statistically significant improvement with the AMEI. Results with the AMEI applied to the round window exceeded those of the best available hearing aid in speech perception as well as quality of life questionnaires. There were minimal adverse events or changes to patients' residual hearing.
Brinca, Lilia; Batista, Ana Paula; Tavares, Ana Inês; Pinto, Patrícia N; Araújo, Lara
2015-11-01
The main objective of the present study was to investigate whether the type of voice stimulus (sustained vowel, oral reading, or connected speech) results in good intrarater and interrater agreement/reliability. A short-term panel study was performed. Voice samples from 30 native European Portuguese speakers were used in the present study. The speech materials used were (1) the sustained vowel /a/, (2) oral reading of the European Portuguese version of "The Story of Arthur the Rat," and (3) connected speech. After extensive training with textual and auditory anchors, the judges were asked to rate the severity of dysphonic voice stimuli using the phonation dimensions G, R, and B from the GRBAS scale. The voice samples were judged 6 months and 1 year after the training. Intrarater agreement and reliability were generally very good for all the phonation dimensions and voice stimuli. The highest interrater reliability was obtained using the oral reading stimulus, particularly for the phonation dimensions grade (G) and breathiness (B). Roughness (R) was the voice quality that was the most difficult to evaluate, leading to interrater unreliability in all voice quality ratings. Extensive training using textual and auditory anchors, and the use of anchors during the voice evaluations, appear to be good methods for auditory-perceptual evaluation of dysphonic voices. The best results for interrater reliability were obtained when the oral reading stimulus was used. Breathiness appears to be a voice quality that is easier to evaluate than roughness. Copyright © 2015 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
Speech disorders in Israeli Arab children.
Jaber, L; Nahmani, A; Shohat, M
1997-10-01
The aim of this work was to study the frequency of speech disorders in Israeli Arab children and its association with parental consanguinity. A questionnaire was sent to the parents of 1,495 Arab children attending kindergarten and the first two grades of the seven primary schools in the town of Taibe. Eight-six percent (1,282 parents) responded. The answers to the questionnaire revealed that 25% of the children reportedly had a speech and language disorder. Of the children identified by their parents as having a speech disorder, 44 were selected randomly for examination by a speech specialist. The disorders noted in this subgroup included errors in articulation (48.0%), poor language (18%), poor voice quality (15.9%); stuttering (13.6%), and other problems (4.5%). Rates of affected children of consanguineous and non-consanguineous marriages were 31% and 22.4%, respectively (p < 0.01). We conclude that speech disorders are an important problem among Israeli Arab schoolchildren. More comprehensive programs are needed to facilitate diagnosis and treatment.
Ruble, Lisa; Birdwhistell, Jessie; Toland, Michael D; McGrew, John H
2011-01-01
The significant increase in the numbers of students with autism, combined with the need for better trained teachers (National Research Council, 2001), calls for research on the effectiveness of alternative methods, such as consultation, that have the potential to improve service delivery. Data from 2 randomized controlled single-blind trials indicate that an autism-specific consultation planning framework known as the collaborative model for promoting competence and success (COMPASS) is effective in increasing child Individual Education Program (IEP) outcomes (Ruble, Dalrymple, & McGrew, 2010; Ruble, McGrew, & Toland, 2011). In this study, we describe the verbal interactions, defined as speech acts and speech act exchanges, that take place during COMPASS consultation, and examine the associations between speech exchanges and child outcomes. We applied the Psychosocial Processes Coding Scheme (Leaper, 1991) to code speech acts. Speech act exchanges were overwhelmingly affiliative and failed to show statistically significant relationships with child IEP outcomes and teacher adherence, but did correlate positively with IEP quality.
RUBLE, LISA; BIRDWHISTELL, JESSIE; TOLAND, MICHAEL D.; MCGREW, JOHN H.
2011-01-01
The significant increase in the numbers of students with autism, combined with the need for better trained teachers (National Research Council, 2001), calls for research on the effectiveness of alternative methods, such as consultation, that have the potential to improve service delivery. Data from 2 randomized controlled single-blind trials indicate that an autism-specific consultation planning framework known as the collaborative model for promoting competence and success (COMPASS) is effective in increasing child Individual Education Program (IEP) outcomes (Ruble, Dalrymple, & McGrew, 2010; Ruble, McGrew, & Toland, 2011). In this study, we describe the verbal interactions, defined as speech acts and speech act exchanges, that take place during COMPASS consultation, and examine the associations between speech exchanges and child outcomes. We applied the Psychosocial Processes Coding Scheme (Leaper, 1991) to code speech acts. Speech act exchanges were overwhelmingly affiliative and failed to show statistically significant relationships with child IEP outcomes and teacher adherence, but did correlate positively with IEP quality. PMID:22639523
Baron, Christine; Holcombe, Molly; van der Stelt, Candace
2018-02-01
Group treatment is an integral part of speech-language pathology (SLP) practice. The majority of SLP literature concerns group treatment provided in outpatient settings. This article describes the goals, procedures, and benefits of providing quality SLP group therapy in the comprehensive inpatient rehabilitation (CIR) setting. Effective CIR groups must be designed with attention to the type and severity of communication impairment, as well as the physical stamina of group members. Group leaders need to target individualized patient goals while creating a challenging, complex, and dynamic group context that supports participation by all group members. Direct patient-to-patient interaction is fostered as much as possible. Peer feedback supports goal acquisition by fellow group members. The rich, complex group context fosters improved insight, initiation, social connectedness, and generalization of communication skills. Group treatment provides a unique type of treatment not easily replicated with individual treatment. SLP group treatment in a CIR is an essential component of an intensive, high-quality program. Continued advocacy for group therapy provision and research into its efficacy and effectiveness are warranted.
Irregular Speech Rate Dissociates Auditory Cortical Entrainment, Evoked Responses, and Frontal Alpha
Kayser, Stephanie J.; Ince, Robin A.A.; Gross, Joachim
2015-01-01
The entrainment of slow rhythmic auditory cortical activity to the temporal regularities in speech is considered to be a central mechanism underlying auditory perception. Previous work has shown that entrainment is reduced when the quality of the acoustic input is degraded, but has also linked rhythmic activity at similar time scales to the encoding of temporal expectations. To understand these bottom-up and top-down contributions to rhythmic entrainment, we manipulated the temporal predictive structure of speech by parametrically altering the distribution of pauses between syllables or words, thereby rendering the local speech rate irregular while preserving intelligibility and the envelope fluctuations of the acoustic signal. Recording EEG activity in human participants, we found that this manipulation did not alter neural processes reflecting the encoding of individual sound transients, such as evoked potentials. However, the manipulation significantly reduced the fidelity of auditory delta (but not theta) band entrainment to the speech envelope. It also reduced left frontal alpha power and this alpha reduction was predictive of the reduced delta entrainment across participants. Our results show that rhythmic auditory entrainment in delta and theta bands reflect functionally distinct processes. Furthermore, they reveal that delta entrainment is under top-down control and likely reflects prefrontal processes that are sensitive to acoustical regularities rather than the bottom-up encoding of acoustic features. SIGNIFICANCE STATEMENT The entrainment of rhythmic auditory cortical activity to the speech envelope is considered to be critical for hearing. Previous work has proposed divergent views in which entrainment reflects either early evoked responses related to sound encoding or high-level processes related to expectation or cognitive selection. Using a manipulation of speech rate, we dissociated auditory entrainment at different time scales. Specifically, our results suggest that delta entrainment is controlled by frontal alpha mechanisms and thus support the notion that rhythmic auditory cortical entrainment is shaped by top-down mechanisms. PMID:26538641
Methodology and Resources of the Itinerant Speech and Hearing Teacher
ERIC Educational Resources Information Center
Carrion-Martinez, Jose J; de la Rosa, Antonio Luque
2013-01-01
Introduction: Twenty years after the emergence of the itinerant speech and hearing teacher, it seems appropriate to reflect on the role this figure has been playing, in order to identify what should be taken into account to improve the approach adopted and to promote the quality of its educational…
Readability Statistics of Patient Information Leaflets in a Speech and Language Therapy Department
ERIC Educational Resources Information Center
Pothier, Louise; Day, Rachael; Harris, Catherine; Pothier, David D.
2008-01-01
Background: Information leaflets are commonly used in Speech and Language Therapy Departments. Despite widespread use, they can be of variable quality. Aims: To revise current departmental leaflets using the National Health Service (NHS) Toolkit for Producing Patient Information and to test the effect that this has on the readability scores of the…
Integrating Speech-Language Pathology Services in Palliative End-of-Life Care
ERIC Educational Resources Information Center
Pollens, Robin D.
2012-01-01
Clinical speech-language pathologists (SLPs) may receive referrals to consult with teams serving patients who have a severe and/or terminal disease. Palliative care focuses on the prevention or relief of suffering to maximize quality of life for these patients and their families. This article describes how the role of the SLP in palliative care…
ERIC Educational Resources Information Center
Franco, Horacio; Bratt, Harry; Rossier, Romain; Rao Gadde, Venkata; Shriberg, Elizabeth; Abrash, Victor; Precoda, Kristin
2010-01-01
SRI International's EduSpeak[R] system is a software development toolkit that enables developers of interactive language education software to use state-of-the-art speech recognition and pronunciation scoring technology. Automatic pronunciation scoring allows the computer to provide feedback on the overall quality of pronunciation and to point to…
ERIC Educational Resources Information Center
Nelson, Lori A.
2011-01-01
Speech-language pathology literature is limited in describing the clinical practicum process from the student perspective. Much of the supervision literature in this field focuses on quantitative research and/or the point of view of the supervisor. Understanding the student experience serves to enhance the quality of clinical supervision. Of…
ERIC Educational Resources Information Center
Ingham, Roger J.; Bothe, Anne K.; Wang, Yuedong; Purkhiser, Krystal; New, Anneliese
2012-01-01
Purpose: To relate changes in four variables previously defined as characteristic of normally fluent speech to changes in phonatory behavior during oral reading by persons who stutter (PWS) and normally fluent controls under multiple fluency-inducing (FI) conditions. Method: Twelve PWS and 12 controls each completed 4 ABA experiments. During A…
Neural entrainment to rhythmic speech in children with developmental dyslexia
Power, Alan J.; Mead, Natasha; Barnes, Lisa; Goswami, Usha
2013-01-01
A rhythmic paradigm based on repetition of the syllable “ba” was used to study auditory, visual, and audio-visual oscillatory entrainment to speech in children with and without dyslexia using EEG. Children pressed a button whenever they identified a delay in the isochronous stimulus delivery (500 ms; 2 Hz delta band rate). Response power, strength of entrainment and preferred phase of entrainment in the delta and theta frequency bands were compared between groups. The quality of stimulus representation was also measured using cross-correlation of the stimulus envelope with the neural response. The data showed a significant group difference in the preferred phase of entrainment in the delta band in response to the auditory and audio-visual stimulus streams. A different preferred phase has significant implications for the quality of speech information that is encoded neurally, as it implies enhanced neuronal processing (phase alignment) at less informative temporal points in the incoming signal. Consistent with this possibility, the cross-correlogram analysis revealed superior stimulus representation by the control children, who showed a trend for larger peak r-values and significantly later lags in peak r-values compared to participants with dyslexia. Significant relationships between both peak r-values and peak lags were found with behavioral measures of reading. The data indicate that the auditory temporal reference frame for speech processing is atypical in developmental dyslexia, with low frequency (delta) oscillations entraining to a different phase of the rhythmic syllabic input. This would affect the quality of encoding of speech, and could underlie the cognitive impairments in phonological representation that are the behavioral hallmark of this developmental disorder across languages. PMID:24376407
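The cross-correlogram measure referred to above (peak r-values and their lags between the stimulus envelope and the neural response) can be sketched as follows. The Hilbert envelope and z-score normalisation are standard choices rather than necessarily the authors' exact pipeline, and the function assumes equal-length, same-rate stimulus and EEG series.

```python
import numpy as np
from scipy.signal import correlate, hilbert

def envelope_crosscorr(stimulus, eeg, fs, max_lag_s=0.5):
    # Stimulus envelope via the analytic-signal magnitude, then z-score
    # both series so the correlogram sits on an approximate r scale
    # (edge lags are attenuated by the fixed 1/n normalisation).
    env = np.abs(hilbert(stimulus))
    env = (env - env.mean()) / env.std()
    resp = (eeg - eeg.mean()) / eeg.std()
    n = len(env)
    xc = correlate(resp, env, mode="full") / n
    lags = np.arange(-n + 1, n) / fs
    keep = np.abs(lags) <= max_lag_s
    lags, xc = lags[keep], xc[keep]
    peak = np.argmax(np.abs(xc))
    return lags[peak], xc[peak]    # peak lag (s) and peak r-value
```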
Potential interactions among linguistic, autonomic, and motor factors in speech.
Kleinow, Jennifer; Smith, Anne
2006-05-01
Though anecdotal reports link certain speech disorders to increases in autonomic arousal, few studies have described the relationship between arousal and speech processes. Additionally, it is unclear how increases in arousal may interact with other cognitive-linguistic processes to affect speech motor control. In this experiment we examine potential interactions between autonomic arousal, linguistic processing, and speech motor coordination in adults and children. Autonomic responses (heart rate, finger pulse volume, tonic skin conductance, and phasic skin conductance) were recorded simultaneously with upper and lower lip movements during speech. The lip aperture variability (LA variability index) across multiple repetitions of sentences that varied in length and syntactic complexity was calculated under low- and high-arousal conditions. High arousal conditions were elicited by performance of the Stroop color word task. Children had significantly higher lip aperture variability index values across all speaking tasks, indicating more variable speech motor coordination. Increases in syntactic complexity and utterance length were associated with increases in speech motor coordination variability in both speaker groups. There was a significant effect of Stroop task, which produced increases in autonomic arousal and increased speech motor variability in both adults and children. These results provide novel evidence that high arousal levels can influence speech motor control in both adults and children. (c) 2006 Wiley Periodicals, Inc.
Non-speech oral motor treatment for children with developmental speech sound disorders.
Lee, Alice S-Y; Gibbon, Fiona E
2015-03-25
Children with developmental speech sound disorders have difficulties in producing the speech sounds of their native language. These speech difficulties could be due to structural, sensory or neurophysiological causes (e.g. hearing impairment), but more often the cause of the problem is unknown. One treatment approach used by speech-language therapists/pathologists is non-speech oral motor treatment (NSOMT). NSOMTs are non-speech activities that aim to stimulate or improve speech production and treat specific speech errors. For example, exercises such as smiling, lip pursing, blowing into horns, blowing bubbles, and lip massage are used to target lip mobility for the production of speech sounds involving the lips, such as /p/, /b/, and /m/. The efficacy of this treatment approach is controversial, and evidence regarding the efficacy of NSOMTs needs to be examined. To assess the efficacy of non-speech oral motor treatment (NSOMT) in treating children with developmental speech sound disorders who have speech errors. In April 2014 we searched the Cochrane Central Register of Controlled Trials (CENTRAL), Ovid MEDLINE (R) and Ovid MEDLINE In-Process & Other Non-Indexed Citations, EMBASE, Education Resources Information Center (ERIC), PsycINFO and 11 other databases. We also searched five trial and research registers, checked the reference lists of relevant titles identified by the search and contacted researchers to identify other possible published and unpublished studies. Randomised and quasi-randomised controlled trials that compared (1) NSOMT versus placebo or control; and (2) NSOMT as adjunctive treatment or speech intervention versus speech intervention alone, for children aged three to 16 years with developmental speech sound disorders, as judged by a speech and language therapist. Individuals with an intellectual disability (e.g. Down syndrome) or a physical disability were not excluded. The Trials Search Co-ordinator of the Cochrane Developmental, Psychosocial and Learning Problems Group and one review author ran the searches. Two review authors independently screened titles and abstracts to eliminate irrelevant studies, extracted data from the included studies and assessed risk of bias in each of these studies. In cases of ambiguity or information missing from the paper, we contacted trial authors. This review identified three studies (from four reports) involving a total of 22 children that investigated the efficacy of NSOMT as adjunctive treatment to conventional speech intervention versus conventional speech intervention for children with speech sound disorders. One study, a randomised controlled trial (RCT), included four boys aged seven years one month to nine years six months - all had speech sound disorders, and two had additional conditions (one was diagnosed as "communication impaired" and the other as "multiply disabled"). Of the two quasi-randomised controlled trials, one included 10 children (six boys and four girls), aged five years eight months to six years nine months, with speech sound disorders as a result of tongue thrust, and the other study included eight children (four boys and four girls), aged three to six years, with moderate to severe articulation disorder only. Two studies did not find NSOMT as adjunctive treatment to be more effective than conventional speech intervention alone, as both intervention and control groups made similar improvements in articulation after receiving treatments.
One study reported a change in postintervention articulation test results but used an inappropriate statistical test and did not report the results clearly. None of the included studies examined the effects of NSOMTs on any other primary outcomes, such as speech intelligibility, speech physiology and adverse effects, or on any of the secondary outcomes such as listener acceptability. The RCT was judged at low risk for selection bias. The two quasi-randomised trials used randomisation but did not report the method for generating the random sequence and were judged as having unclear risk of selection bias. The three included studies were deemed to have high risk of performance bias as, given the nature of the intervention, blinding of participants was not possible. Only one study implemented blinding of outcome assessment and was at low risk for detection bias. One study showed high risk of other bias as the baseline characteristics of participants seemed to be unequal. The sample size of each of the included studies was very small, which means it is highly likely that participants in these studies were not representative of their target populations. In the light of these serious limitations in methodology, the overall quality of the evidence provided by the included trials is judged to be low. Therefore, further research is very likely to have an important impact on our confidence in the estimate of treatment effect and is likely to change the estimate. The three included studies were small in scale and had a number of serious methodological limitations. In addition, they covered limited types of NSOMTs for treating children with speech sound disorders of unknown origin with the sounds /s/ and /z/. Hence, we judged the overall applicability of the evidence as limited and incomplete. Results of this review are consistent with those of previous reviews: currently no strong evidence suggests that NSOMTs are an effective treatment or an effective adjunctive treatment for children with developmental speech sound disorders. Lack of strong evidence regarding the treatment efficacy of NSOMTs has implications for clinicians when they make decisions in relation to treatment plans. Well-designed research is needed to carefully investigate NSOMT as a type of treatment for children with speech sound disorders.
Advanced Persuasive Speaking, English, Speech: 5114.112.
ERIC Educational Resources Information Center
Dade County Public Schools, Miami, FL.
Developed as a high school quinmester unit on persuasive speaking, this guide provides the teacher with teaching strategies for a course which analyzes speeches from "Vital Speeches of the Day," political speeches, TV commercials, and other types of speeches. Practical use of persuasive methods for school, community, county, state, and…
Bruchhage, Karl-Ludwig; Leichtle, Anke; Schönweiler, Rainer; Todt, Ingo; Baumgartner, Wolf-Dieter; Frenzel, Henning; Wollenberg, Barbara
2017-04-01
Introduced in the late 1990s, the active middle ear implant Vibrant Soundbridge (VSB) is nowadays used for hearing rehabilitation in patients with mild to severe sensorineural hearing loss (SNHL) unable to tolerate conventional hearing aids. In experienced hands, the surgical implantation is quick, safe and highly standardized. Here, we present a systematic review, after more than 15 years of application, to determine the efficacy/effectiveness and cost-effectiveness, as well as patient satisfaction with the VSB active middle ear implant in the treatment of mild to severe SNHL. A systematic search of electronic databases, investigating the safety and effectiveness of the VSB in SNHL plus medical condition, resulted in a total of 1640 papers. After removing duplicates and unrelated articles, screening against inclusion criteria and in-depth screening, the number decreased to 37 articles. Thirteen articles were further excluded due to insufficient outcome data, leaving 24 studies to be systematically reviewed. Data were collected on safety, efficacy and economic outcomes with the VSB. Safety-oriented outcomes included complication/adverse event rates, damage to the middle/inner ear, revision surgery/explant rate/device failure and mortality. Efficacy outcomes were divided into audiological outcomes, including hearing thresholds, functional gain, speech perception in quiet and noise, speech recognition thresholds and real ear insertion gain, and subjective outcomes determined by questionnaires and patient-oriented scales. Data related to quality of life (QALY, ICER) were considered under economic outcomes. The VSB turns out to be a highly reliable and safe device which significantly improves perception of speech in noisy situations with high sound quality. In addition, the subjective benefit of the VSB was found to be mostly significant in all studies. Finally, implantation with the VSB proved to be a cost-effective and justified health care intervention.
Terband, H; Maassen, B; Guenther, F H; Brumberg, J
2014-01-01
Differentiating the symptom complex due to phonological-level disorders, speech delay and pediatric motor speech disorders is a controversial issue in the field of pediatric speech and language pathology. The present study investigated the developmental interaction between neurological deficits in auditory and motor processes using computational modeling with the DIVA model. In a series of computer simulations, we investigated the effect of a motor processing deficit alone (MPD), and the effect of a motor processing deficit in combination with an auditory processing deficit (MPD+APD) on the trajectory and endpoint of speech motor development in the DIVA model. Simulation results showed that a motor programming deficit predominantly leads to deterioration on the phonological level (phonemic mappings) when auditory self-monitoring is intact, and on the systemic level (systemic mapping) if auditory self-monitoring is impaired. These findings suggest a close relation between quality of auditory self-monitoring and the involvement of phonological vs. motor processes in children with pediatric motor speech disorders. It is suggested that MPD+APD might be involved in typically apraxic speech output disorders and MPD in pediatric motor speech disorders that also have a phonological component. Possibilities to verify these hypotheses using empirical data collected from human subjects are discussed. The reader will be able to: (1) identify the difficulties in studying disordered speech motor development; (2) describe the differences in speech motor characteristics between SSD and subtype CAS; (3) describe the different types of learning that occur in the sensory-motor system during babbling and early speech acquisition; (4) identify the neural control subsystems involved in speech production; (5) describe the potential role of auditory self-monitoring in developmental speech disorders. Copyright © 2014 Elsevier Inc. All rights reserved.
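The DIVA simulations themselves are far richer, but a toy error-driven learning loop can illustrate the reasoning: motor noise (MPD) limits production accuracy, while added auditory noise (APD) degrades the self-monitoring signal that drives feedforward learning. Everything below is an illustrative sketch with assumed parameters, not the DIVA model:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(motor_noise, auditory_noise, trials=200, lr=0.3):
    """Toy sketch: a feedforward command is updated from the auditory
    error on each production attempt. motor_noise corrupts execution
    (MPD); auditory_noise corrupts self-monitoring (APD)."""
    target = 1.0        # idealized auditory target
    feedforward = 0.0   # learned motor command
    errors = []
    for _ in range(trials):
        produced = feedforward + rng.normal(0, motor_noise)
        heard = produced + rng.normal(0, auditory_noise)  # self-monitoring
        feedforward += lr * (target - heard)              # error-driven update
        errors.append(abs(target - produced))
    return np.mean(errors[-50:])  # endpoint accuracy of development

print("intact:  ", simulate(0.02, 0.02))
print("MPD:     ", simulate(0.30, 0.02))
print("MPD+APD: ", simulate(0.30, 0.30))
```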
Hearing Feelings: Affective Categorization of Music and Speech in Alexithymia, an ERP Study
Goerlich, Katharina Sophia; Witteman, Jurriaan; Aleman, André; Martens, Sander
2011-01-01
Background Alexithymia, a condition characterized by deficits in interpreting and regulating feelings, is a risk factor for a variety of psychiatric conditions. Little is known about how alexithymia influences the processing of emotions in music and speech. Appreciation of such emotional qualities in auditory material is fundamental to human experience and has profound consequences for functioning in daily life. We investigated the neural signature of such emotional processing in alexithymia by means of event-related potentials. Methodology Affective music and speech prosody were presented as targets following affectively congruent or incongruent visual word primes in two conditions. In two further conditions, affective music and speech prosody served as primes and visually presented words with affective connotations were presented as targets. Thirty-two participants (16 male) judged the affective valence of the targets. We tested the influence of alexithymia on cross-modal affective priming and on N400 amplitudes, indicative of individual sensitivity to an affective mismatch between words, prosody, and music. Our results indicate that the affective priming effect for prosody targets tended to be reduced with increasing scores on alexithymia, while no behavioral differences were observed for music and word targets. At the electrophysiological level, alexithymia was associated with significantly smaller N400 amplitudes in response to affectively incongruent music and speech targets, but not to incongruent word targets. Conclusions Our results suggest a reduced sensitivity for the emotional qualities of speech and music in alexithymia during affective categorization. This deficit becomes evident primarily in situations in which a verbalization of emotional information is required. PMID:21573026
Correlational Analysis of Speech Intelligibility Tests and Metrics for Speech Transmission
2017-12-04
Front-matter fragments only; recoverable content: Fig. 1, frequency scale (male voice; normal voice effort); Fig. 2, diagram of a speech communication system (Letowski). From the text: consonants contain mostly high-frequency (above 1500 Hz) speech energy, but this energy is relatively small in comparison to that of the whole speech signal (Letowski et al. 1993); the mid-frequency spectral region contains mostly vowel energy, while consonants are high-frequency sounds.
Brozovich, Faith A; Heimberg, Richard G
2013-12-01
The present study investigated whether post-event processing (PEP) involving mental imagery about a past speech is particularly detrimental for socially anxious individuals who are currently anticipating giving a speech. One hundred fourteen high and low socially anxious participants were told they would give a 5-min impromptu speech at the end of the experimental session. They were randomly assigned to one of three manipulation conditions: post-event processing about a past speech incorporating imagery (PEP-Imagery), semantic post-event processing about a past speech (PEP-Semantic), or a control condition (n=19 per experimental group, per condition [high vs low socially anxious]). After the condition inductions, individuals' anxiety, their predictions of performance in the anticipated speech, and their interpretations of other ambiguous social events were measured. Consistent with predictions, high socially anxious individuals in the PEP-Imagery condition displayed greater anxiety than individuals in the other conditions immediately following the induction and before the anticipated speech task. They also interpreted ambiguous social scenarios in a more socially anxious manner than socially anxious individuals in the control condition. High socially anxious individuals made more negative predictions about their upcoming speech performance than low anxious participants in all conditions. The impact of imagery during post-event processing in social anxiety and its implications are discussed. © 2013.
Sackley, Catherine M; Smith, Christina H; Rick, Caroline E; Brady, Marian C; Ives, Natalie; Patel, Smitaa; Woolley, Rebecca; Dowling, Francis; Patel, Ramilla; Roberts, Helen; Jowett, Sue; Wheatley, Keith; Kelly, Debbie; Sands, Gina; Clarke, Carl E
2018-01-01
Speech-related problems are common in Parkinson's disease (PD), but there is little evidence for the effectiveness of standard speech and language therapy (SLT) or Lee Silverman Voice Treatment (LSVT LOUD®). The PD COMM pilot was a three-arm, assessor-blinded, randomised controlled trial (RCT) of LSVT LOUD®, SLT and no intervention (1:1:1 ratio) to assess the feasibility and to inform the design of a full-scale RCT. Non-demented patients with idiopathic PD and speech problems and no SLT for speech problems in the past 2 years were eligible. LSVT LOUD® is a standardised regime (16 sessions over 4 weeks). SLT comprised individualised content per local practice (typically weekly sessions for 6-8 weeks). Outcomes included recruitment and retention, treatment adherence, and data completeness. Outcome data collected at baseline, 3, 6, and 12 months included patient-reported voice and quality of life measures, resource use, and assessor-rated speech recordings. Eighty-nine patients were randomised, with 90% in the therapy groups and 100% in the control group completing the trial. The response rate for the Voice Handicap Index (VHI) in each arm was ≥ 90% at all time-points. VHI was highly correlated with the other speech-related outcome measures. There was a trend to improvement in VHI with LSVT LOUD® (difference at 3 months compared with control: -12.5 points; 95% CI -26.2 to 1.2) and SLT (difference at 3 months compared with control: -9.8 points; 95% CI -23.2 to 3.7), which needs to be confirmed in an adequately powered trial. Randomisation to a three-arm trial of speech therapy including a no intervention control is feasible and acceptable. Compliance with both interventions was good. VHI and other patient-reported outcomes were relevant measures and provided data to inform the sample size for a substantive trial. International Standard Randomised Controlled Trial Number Register: ISRCTN75223808, registered 22 March 2012.
FEENAUGHTY, LYNDA; TJADEN, KRIS; BENEDICT, RALPH H.B.; WEINSTOCK-GUTTMAN, BIANCA
2017-01-01
This preliminary study investigated how cognitive-linguistic status in multiple sclerosis (MS) is reflected in two speech tasks (i.e. oral reading, narrative) that differ in cognitive-linguistic demand. Twenty individuals with MS were selected to comprise High and Low performance groups based on clinical tests of executive function and information processing speed and efficiency. Ten healthy controls were included for comparison. Speech samples were audio-recorded and measures of global speech timing were obtained. Results indicated predicted differences in global speech timing (i.e. speech rate and pause characteristics) for speech tasks differing in cognitive-linguistic demand, but the magnitude of these task-related differences was similar for all speaker groups. Findings suggest that assumptions concerning the cognitive-linguistic demands of reading aloud as compared to spontaneous speech may need to be re-considered for individuals with cognitive impairment. Qualitative trends suggest that additional studies investigating the association between cognitive-linguistic and speech motor variables in MS are warranted. PMID:23294227
Rubin, Adam D; Jackson-Menaldi, Cristina; Kopf, Lisa M; Marks, Katherine; Skeffington, Jean; Skowronski, Mark D; Shrivastav, Rahul; Hunter, Eric J
2018-05-14
The diagnoses of voice disorders, as well as treatment outcomes, are often tracked using visual (eg, stroboscopic images), auditory (eg, perceptual ratings), objective (eg, from acoustic or aerodynamic signals), and patient report (eg, Voice Handicap Index and Voice-Related Quality of Life) measures. However, many of these measures are known to have low to moderate sensitivity and specificity for detecting changes in vocal characteristics, including vocal quality. The objective of this study was to compare changes in estimated pitch strength (PS) with other conventionally used acoustic measures based on the cepstral peak prominence (smoothed cepstral peak prominence, cepstral spectral index of dysphonia, and acoustic voice quality index), and clinical judgments of voice quality (GRBAS [grade, roughness, breathiness, asthenia, strain] scale) following laryngeal framework surgery. This study involved post hoc analysis of recordings from 22 patients pretreatment and post treatment (thyroplasty and behavioral therapy). Sustained vowels and connected speech were analyzed using objective measures (PS, smoothed cepstral peak prominence, cepstral spectral index of dysphonia, and acoustic voice quality index), and these results were compared with mean auditory-perceptual ratings by expert clinicians using the GRBAS scale. All four acoustic measures changed significantly in the direction that usually indicates improved voice quality following treatment (P < 0.005). Grade and breathiness correlated the strongest with the acoustic measures (|r| ~0.7) with strain being the least correlated. Acoustic analysis on running speech highly correlates with judged ratings. PS is a robust, easily obtained acoustic measure of voice quality that could be useful in the clinical environment to follow treatment of voice disorders. Copyright © 2018. Published by Elsevier Inc.
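Of the measures named above, the cepstral-peak family is the most straightforward to sketch. The following is a rough, illustrative single-frame implementation of cepstral peak prominence following the standard construction (height of the cepstral pitch peak above a regression line); the 60-300 Hz pitch search range and windowing are assumptions, not the cited tools' settings:

```python
import numpy as np

def cepstral_peak_prominence(frame, fs, fmin=60.0, fmax=300.0):
    """Sketch of cepstral peak prominence (CPP) for one speech frame:
    the cepstral pitch peak height above a line fitted to the cepstrum."""
    spectrum = np.abs(np.fft.rfft(frame * np.hamming(len(frame))))
    log_spec = 20 * np.log10(spectrum + 1e-12)
    cepstrum = np.abs(np.fft.irfft(log_spec))
    quef = np.arange(len(cepstrum)) / fs      # quefrency axis in seconds
    lo, hi = int(fs / fmax), int(fs / fmin)   # plausible pitch quefrencies
    a, b = np.polyfit(quef[1:hi], cepstrum[1:hi], 1)  # regression line
    peak = lo + int(np.argmax(cepstrum[lo:hi]))
    return cepstrum[peak] - (a * quef[peak] + b)

# e.g. one 64 ms frame at 16 kHz: cepstral_peak_prominence(x[:1024], 16000)
```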
Ingham, Roger J.; Bothe, Anne K.; Wang, Yuedong; Purkhiser, Krystal; New, Anneliese
2012-01-01
Purpose To relate changes in four variables previously defined as characteristic of normally fluent speech to changes in phonatory behavior during oral reading by persons who stutter (PWS) and normally fluent controls under multiple fluency-inducing (FI) conditions. Method Twelve PWS and 12 controls each completed 4 ABA experiments. During A phases, participants read normally. B phases were 4 different FI conditions: auditory masking, chorus reading, whispering, and rhythmic stimulation. Dependent variables were the durations of accelerometer-recorded phonated intervals; self-judged speech effort; and observer-judged stuttering frequency, speech rate, and speech naturalness. The method enabled a systematic replication of Ingham et al. (2009). Results All FI conditions resulted in decreased stuttering and decreases in the number of short phonated intervals, as compared with baseline conditions, but the only FI condition that satisfied all four characteristics of normally fluent speech was chorus reading. Increases in longer phonated intervals were associated with decreased stuttering but also with poorer naturalness and/or increased speech effort. Previous findings concerning the effects of FI conditions on speech naturalness and effort were replicated. Conclusions Measuring all relevant characteristics of normally fluent speech, in the context of treatments that aim to reduce the occurrence of short-duration PIs, may aid the search for an explanation of the nature of stuttering and may also maximize treatment outcomes for adults who stutter. PMID:22365886
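Extracting phonated-interval (PI) durations from a frame-wise voicing decision reduces to run-length measurement. A minimal sketch, assuming a boolean voicing track; the 5 ms frame period and the 30-150 ms "short PI" band are illustrative assumptions, not values taken from the study:

```python
import numpy as np

def phonated_intervals(voiced, frame_ms=5.0):
    """Durations (ms) of contiguous voiced runs in a boolean,
    frame-wise voicing track."""
    v = np.asarray(voiced, dtype=int)
    edges = np.flatnonzero(np.diff(np.concatenate(([0], v, [0]))))
    starts, ends = edges[::2], edges[1::2]  # rising/falling edge pairs
    return (ends - starts) * frame_ms

# Proportion of short intervals in an illustrative 30-150 ms band
pis = phonated_intervals([0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0])
print(pis, np.mean((pis >= 30) & (pis <= 150)))
```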
Terband, H.; Maassen, B.; Guenther, F.H.; Brumberg, J.
2014-01-01
Background/Purpose Differentiating the symptom complex due to phonological-level disorders, speech delay and pediatric motor speech disorders is a controversial issue in the field of pediatric speech and language pathology. The present study investigated the developmental interaction between neurological deficits in auditory and motor processes using computational modeling with the DIVA model. Method In a series of computer simulations, we investigated the effect of a motor processing deficit alone (MPD), and the effect of a motor processing deficit in combination with an auditory processing deficit (MPD+APD) on the trajectory and endpoint of speech motor development in the DIVA model. Results Simulation results showed that a motor programming deficit predominantly leads to deterioration on the phonological level (phonemic mappings) when auditory self-monitoring is intact, and on the systemic level (systemic mapping) if auditory self-monitoring is impaired. Conclusions These findings suggest a close relation between quality of auditory self-monitoring and the involvement of phonological vs. motor processes in children with pediatric motor speech disorders. It is suggested that MPD+APD might be involved in typically apraxic speech output disorders and MPD in pediatric motor speech disorders that also have a phonological component. Possibilities to verify these hypotheses using empirical data collected from human subjects are discussed. PMID:24491630
Lamprecht-Dinnesen, A; Sick, U; Sandrieser, P; Illg, A; Lesinski-Schiedat, A; Döring, W H; Müller-Deile, J; Kiefer, J; Matthias, K; Wüst, A; Konradi, E; Riebandt, M; Matulat, P; Von Der Haar-Heise, S; Swart, J; Elixmann, K; Neumann, K; Hildmann, A; Coninx, F; Meyer, V; Gross, M; Kruse, E; Lenarz, T
2002-10-01
Since autumn 1998 the multicenter interdisciplinary study group "Test Materials for CI Children" has been compiling a uniform examination tool for evaluation of speech and hearing development after cochlear implantation in childhood. After studying the relevant literature, suitable materials were checked for practical applicability, modified and provided with criteria for execution and break-off. For data acquisition, observation forms for preparation of a PC-version were developed. The evaluation set contains forms for master data with supplements relating to postoperative processes. The hearing tests check supra-threshold hearing with loudness scaling for children, speech comprehension in silence (Mainz and Göttingen Test for Speech Comprehension in Childhood) and phonemic differentiation (Oldenburg Rhyme Test for Children), the central auditory processes of detection, discrimination, identification and recognition (modification of the "Frankfurt Functional Hearing Test for Children") and audiovisual speech perception (Open Paragraph Tracking, Kiel Speech Track Program). The materials for speech and language development comprise phonetics-phonology, lexicon and semantics (LOGO Pronunciation Test), syntax and morphology (analysis of spontaneous speech), language comprehension (Reynell Scales), communication and pragmatics (observation forms). The MAIS and MUSS modified questionnaires are integrated. The evaluation set serves quality assurance and permits factor analysis as well as controls for regularity through the multicenter comparison of long-term developmental trends after cochlear implantation.
Defazio, Giovanni; Guerrieri, Marta; Liuzzi, Daniele; Gigante, Angelo Fabio; di Nicola, Vincenzo
2016-03-01
Changes in voice and speech are thought to affect 75-90% of people with Parkinson's disease (PD), but the impact of PD progression on voice/speech parameters is not well defined. In this study, we assessed voice/speech symptoms in 48 parkinsonian patients staging <3 on the modified Hoehn and Yahr scale and 37 healthy subjects using the Robertson dysarthria profile (a clinical-perceptual method exploring all components potentially involved in speech difficulties), the Voice Handicap Index (a validated measure of the impact of voice symptoms on quality of life) and the speech evaluation item contained in the Unified Parkinson's Disease Rating Scale part III (UPDRS-III). Accuracy and metric properties of the Robertson dysarthria profile were also measured. On the Robertson dysarthria profile, all parkinsonian patients yielded lower scores than healthy control subjects. By contrast, the Voice Handicap Index and the UPDRS-III speech item detected speech/voice disturbances in only 10% and 75% of PD patients, respectively. The validation procedure in Parkinson's disease patients showed that the Robertson dysarthria profile has acceptable reliability, satisfactory internal consistency and scaling assumptions, lack of floor and ceiling effects, and partial correlations with UPDRS-III and the Voice Handicap Index. We concluded that speech/voice disturbances are widely identified by the Robertson dysarthria profile in early parkinsonian patients, even when the disturbances do not carry a significant level of disability. The Robertson dysarthria profile may be a valuable tool to detect speech/voice disturbances in Parkinson's disease.
Humes, Larry E.; Kidd, Gary R.; Lentz, Jennifer J.
2013-01-01
This study was designed to address individual differences in aided speech understanding among a relatively large group of older adults. The group of older adults consisted of 98 adults (50 female and 48 male) ranging in age from 60 to 86 (mean = 69.2). Hearing loss was typical for this age group and about 90% had not worn hearing aids. All subjects completed a battery of tests, including cognitive (6 measures), psychophysical (17 measures), and speech-understanding (9 measures), as well as the Speech, Spatial, and Qualities of Hearing (SSQ) self-report scale. Most of the speech-understanding measures made use of competing speech and the non-speech psychophysical measures were designed to tap phenomena thought to be relevant for the perception of speech in competing speech (e.g., stream segregation, modulation-detection interference). All measures of speech understanding were administered with spectral shaping applied to the speech stimuli to fully restore audibility through at least 4000 Hz. The measures used were demonstrated to be reliable in older adults and, when compared to a reference group of 28 young normal-hearing adults, age-group differences were observed on many of the measures. Principal-components factor analysis was applied successfully to reduce the number of independent and dependent (speech understanding) measures for a multiple-regression analysis. Doing so yielded one global cognitive-processing factor and five non-speech psychoacoustic factors (hearing loss, dichotic signal detection, multi-burst masking, stream segregation, and modulation detection) as potential predictors. To this set of six potential predictor variables were added subject age, Environmental Sound Identification (ESI), and performance on the text-recognition-threshold (TRT) task (a visual analog of interrupted speech recognition). These variables were used to successfully predict one global aided speech-understanding factor, accounting for about 60% of the variance. PMID:24098273
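A hedged sketch of the analysis pipeline described above, with scikit-learn's PCA standing in for the principal-components factor analysis and synthetic data in place of the study's measures (all shapes, names, and the number of components are illustrative):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.normal(size=(98, 23))                    # 98 listeners x 23 measures
y = X[:, :3].sum(axis=1) + rng.normal(size=98)   # synthetic criterion factor

Xz = StandardScaler().fit_transform(X)           # standardize measures
factors = PCA(n_components=6).fit_transform(Xz)  # six predictor factors
model = LinearRegression().fit(factors, y)       # multiple regression
print("variance accounted for:", model.score(factors, y))  # R-squared
```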
Cox, Robyn M; Alexander, Genevieve C; Johnson, Jani; Rivera, Izel
2011-01-01
We investigated the prevalence of cochlear dead regions in listeners with hearing losses similar to those of many hearing aid wearers, and explored the impact of these dead regions on speech perception. Prevalence of dead regions was assessed using the Threshold Equalizing Noise test (TEN(HL)). Speech recognition was measured using high-frequency emphasis (HFE) Quick Speech In Noise (QSIN) test stimuli and low-pass filtered HFE QSIN stimuli. About one third of subjects tested positive for a dead region at one or more frequencies. Also, groups without and with dead regions both benefited from additional high-frequency speech cues. PMID:21522068
ERIC Educational Resources Information Center
Most, Tova; Levin, Iris; Sarsour, Marwa
2008-01-01
This article examined the effect of Modern Standard Arabic orthography on speech production quality (syllable stress and vowels) by 23 Arabic-speaking children with severe or profound hearing loss aged 8-12 years. Children produced 15 one-syllable minimal pairs of words that differed in vowel length (short vs. long) and 20 two-syllable minimal…
ERIC Educational Resources Information Center
Richardson, Tanya; Murray, Jane
2017-01-01
Within English early childhood education, there is emphasis on improving speech and language development as well as a drive for outdoor learning. This paper synthesises both aspects to consider whether or not links exist between the environment and the quality of young children's utterances as part of their speech and language development and if…
The needs of aphasic patients for verbal communication as the element of life quality.
Kulik, Teresa Bernadetta; Koc-Kozłowiec, Barbara; Wrońska, Irena; Rudnicka-Drozak, Ewa
2003-01-01
The human use of language reflects specific properties of the brain; this skill cannot be learned without contact with a speaking human environment. Linguistic communication allows people to acquire knowledge about the surrounding world and, at the same time, enables them to express their thoughts, feelings and needs. People with serious speech disorders, i.e. aphasic patients, therefore suffer not only from problems with communication but above all from a deterioration of their social status, which consequently changes their quality of life. In general, they cannot cope with the tasks facing them in both their personal and professional lives. Speech is defined as the process of communication: the act in which a transmitter sends a verbally structured message (statement) and a receiver perceives the message and understands its contents. The present paper describes an 8-week speech re-education programme carried out with 10 patients with motor aphasia and 10 patients with sensory aphasia. Speech was examined using the clinical-experimental tests developed by A. Luria, whose diagnostic procedure focuses on the qualitative analysis of the structure of the disorder.
A new method to sample stuttering in preschool children.
O'Brian, Sue; Jones, Mark; Pilowsky, Rachel; Onslow, Mark; Packman, Ann; Menzies, Ross
2010-06-01
This study reports a new method for sampling the speech of preschool stuttering children outside the clinic environment. Twenty parents engaged their stuttering children in an everyday play activity in the home with a telephone handset nearby. A remotely located researcher telephoned the parent and recorded the play session with a phone-recording jack attached to a digital audio recorder at the remote location. The parent placed an audio recorder near the child for comparison purposes. Children as young as 2 years complied with the remote method of speech sampling. The quality of the remote recordings was superior to that of the in-home recordings. There was no difference in means or reliability of stutter-count measures made from the remote recordings compared with those made in-home. Advantages of the new method include: (1) cost efficiency of real-time measurement of percent syllables stuttered in naturalistic situations, (2) reduction of bias associated with parent-selected timing of home recordings, (3) standardization of speech sampling procedures, (4) improved parent compliance with sampling procedures, (5) clinician or researcher on-line control of the acoustic and linguistic quality of recordings, and (6) elimination of the need to lend equipment to parents for speech sampling.
NASA Technical Reports Server (NTRS)
1991-01-01
More than 750 NASA, government, contractor, and academic representatives attended the Seventh Annual NASA/Contractors Conference on Quality and Productivity. The panel presentations and keynote speeches, revolving around the theme of total quality leadership, provided a solid base of understanding of the importance, benefits, and principles of total quality management (TQM). The presentations from the conference are summarized.
Duration, Pitch, and Loudness in Kunqu Opera Stage Speech.
Han, Qichao; Sundberg, Johan
2017-03-01
Kunqu is a special type of opera within the Chinese tradition with 600 years of history. In it, stage speech is used for the spoken dialogue. It is performed in the Mandarin of the Ming Dynasty and is a much more dominant part of the play than singing. Stage speech deviates considerably from normal conversational speech with respect to duration, loudness and pitch. This paper compares these properties in stage speech and conversational speech. A famous, highly experienced female singer performed stage speech and read the same lyrics in a conversational speech mode. Clear differences are found. As compared with conversational speech, stage speech had longer word and sentence duration, and word duration was less variable. Average sound level was 16 dB higher. Mean fundamental frequency was also considerably higher and more varied. Within sentences, both loudness and fundamental frequency tended to vary according to a low-high-low pattern. Some of the findings fail to support current opinions regarding the characteristics of stage speech, and in this sense the study demonstrates the relevance of objective measurements in descriptions of vocal styles. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
An Intrinsically Digital Amplification Scheme for Hearing Aids
NASA Astrophysics Data System (ADS)
Blamey, Peter J.; Macfarlane, David S.; Steele, Brenton R.
2005-12-01
Results for linear and wide-dynamic range compression were compared with a new 64-channel digital amplification strategy in three separate studies. The new strategy addresses the requirements of the hearing aid user with efficient computations on an open-platform digital signal processor (DSP). The new amplification strategy is not modeled on prior analog strategies like compression and linear amplification, but uses statistical analysis of the signal to optimize the output dynamic range in each frequency band independently. Using the open-platform DSP processor also provided the opportunity for blind trial comparisons of the different processing schemes in BTE and ITE devices of a high commercial standard. The speech perception scores and questionnaire results show that it is possible to provide improved audibility for sound in many narrow frequency bands while simultaneously improving comfort, speech intelligibility in noise, and sound quality.
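The abstract does not give the algorithm, but one plausible reading of "statistical analysis of the signal to optimize the output dynamic range in each frequency band" is a percentile-steered gain rule applied independently per band. The targets, percentiles, step size, and history length below are pure assumptions for illustration, not the commercial strategy:

```python
import numpy as np

def adapt_band_gain(levels_db, gain_db=0.0, audible=40.0, comfort=85.0,
                    step=0.05):
    """Illustrative per-band rule: nudge gain up when the band's 70th
    percentile output falls below an audibility target, down when its
    90th percentile exceeds a comfort target (comfort rule wins)."""
    gains, history = [], []
    for level in levels_db:
        history.append(level + gain_db)
        if len(history) > 200:          # running window of band levels
            history.pop(0)
        p70, p90 = np.percentile(history, [70, 90])
        if p90 > comfort:
            gain_db -= step
        elif p70 < audible:
            gain_db += step
        gains.append(gain_db)
    return np.array(gains)
```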
The normalities and abnormalities associated with speech in psychometrically-defined schizotypy.
Cohen, Alex S; Auster, Tracey L; McGovern, Jessica E; MacAulay, Rebecca K
2014-12-01
Speech deficits are thought to be an important feature of schizotypy, defined as the personality organization reflecting a putative liability for schizophrenia. There is reason to suspect that these deficits manifest as a function of limited cognitive resources. To evaluate this idea, we examined speech from individuals with psychometrically-defined schizotypy during a low cognitively-demanding task versus a relatively high cognitively-demanding task. A range of objective, computer-based speech measures tapping speech production (silence, number and length of pauses, number and length of utterances), speech variability (global and local intonation and emphasis) and speech content (word fillers, idea density) was employed. Data for control (n=37) and schizotypy (n=39) groups were examined. Results did not confirm our hypotheses. While the cognitive-load task reduced speech expressivity for subjects as a group for most variables, the schizotypy group was not more pathological in speech characteristics compared to the control group. Interestingly, some aspects of speech in schizotypal versus control subjects were healthier under high cognitive load. Moreover, schizotypal subjects performed better, at a trend level, than controls on the cognitively demanding task. These findings hold important implications for our understanding of the neurocognitive architecture associated with the schizophrenia-spectrum. Of particular note is the apparent mismatch between self-reported schizotypal traits and objective performance, and the resilience of speech under cognitive stress in persons with high levels of schizotypy. Copyright © 2014 Elsevier B.V. All rights reserved.
Transferring of speech movements from video to 3D face space.
Pei, Yuru; Zha, Hongbin
2007-01-01
We present a novel method for transferring speech animation recorded in low quality videos to high resolution 3D face models. The basic idea is to synthesize the animated faces by an interpolation based on a small set of 3D key face shapes which span a 3D face space. The 3D key shapes are extracted by an unsupervised learning process in 2D video space to form a set of 2D visemes which are then mapped to the 3D face space. The learning process consists of two main phases: 1) Isomap-based nonlinear dimensionality reduction to embed the video speech movements into a low-dimensional manifold and 2) K-means clustering in the low-dimensional space to extract 2D key viseme frames. Our main contribution is that we use the Isomap-based learning method to extract intrinsic geometry of the speech video space and thus to make it possible to define the 3D key viseme shapes. To do so, we need only to capture a limited number of 3D key face models by using a general 3D scanner. Moreover, we also develop a skull movement recovery method based on simple anatomical structures to enhance 3D realism in local mouth movements. Experimental results show that our method can achieve realistic 3D animation effects with a small number of 3D key face models.
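A minimal sketch of the two learning phases described above (Isomap embedding of the video speech space, then K-means to pick key viseme frames), using scikit-learn and synthetic stand-ins for the per-frame mouth-region features; the neighbor count, embedding dimension, and cluster count are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import Isomap

# Illustrative stand-in for per-frame mouth-region feature vectors
rng = np.random.default_rng(0)
frames = rng.normal(size=(500, 120))   # 500 video frames x 120 features

# Phase 1: nonlinear dimensionality reduction onto a low-dim manifold
embedded = Isomap(n_neighbors=10, n_components=3).fit_transform(frames)

# Phase 2: cluster the manifold and take the frame nearest each centre
km = KMeans(n_clusters=12, n_init=10, random_state=0).fit(embedded)
key_frames = [int(np.argmin(np.linalg.norm(embedded - c, axis=1)))
              for c in km.cluster_centers_]
print(sorted(key_frames))              # indices of 2D key viseme frames
```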
Shao, Xu; Milner, Ben
2005-08-01
This work proposes a method to reconstruct an acoustic speech signal solely from a stream of mel-frequency cepstral coefficients (MFCCs) as may be encountered in a distributed speech recognition (DSR) system. Previous methods for speech reconstruction have required, in addition to the MFCC vectors, fundamental frequency and voicing components. In this work the voicing classification and fundamental frequency are predicted from the MFCC vectors themselves using two maximum a posteriori (MAP) methods. The first method enables fundamental frequency prediction by modeling the joint density of MFCCs and fundamental frequency using a single Gaussian mixture model (GMM). The second scheme uses a set of hidden Markov models (HMMs) to link together a set of state-dependent GMMs, which enables a more localized modeling of the joint density of MFCCs and fundamental frequency. Experimental results on speaker-independent male and female speech show that accurate voicing classification and fundamental frequency prediction are attained when compared to hand-corrected reference fundamental frequency measurements. The use of the predicted fundamental frequency and voicing for speech reconstruction is shown to give very similar speech quality to that obtained using the reference fundamental frequency and voicing.
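The first scheme lends itself to a compact sketch: fit one GMM to joint [MFCC, F0] vectors, then predict F0 from an MFCC vector as the responsibility-weighted conditional mean. This follows the standard joint-GMM estimator construction, not necessarily the paper's exact formulation; dimensions and component counts are illustrative:

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

def fit_joint_gmm(mfcc, f0, n_components=8):
    """Model the joint density of MFCC vectors (N x d) and F0 (N,)
    with a single full-covariance GMM."""
    joint = np.hstack([mfcc, f0[:, None]])
    return GaussianMixture(n_components, covariance_type="full",
                           random_state=0).fit(joint)

def predict_f0(gmm, x):
    """Predict E[f0 | mfcc] from the joint GMM for one MFCC vector x."""
    d = x.shape[0]                       # MFCC dimensionality
    w = np.empty(gmm.n_components)
    cond = np.empty(gmm.n_components)
    for k in range(gmm.n_components):
        mu, S = gmm.means_[k], gmm.covariances_[k]
        # responsibility of component k given the MFCC marginal
        w[k] = gmm.weights_[k] * multivariate_normal.pdf(
            x, mu[:d], S[:d, :d], allow_singular=True)
        # conditional mean of f0 given x under component k
        cond[k] = mu[d] + S[d, :d] @ np.linalg.solve(S[:d, :d], x - mu[:d])
    w /= w.sum()
    return float(w @ cond)
```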
Speech enhancement using the modified phase-opponency model.
Deshmukh, Om D; Espy-Wilson, Carol Y; Carney, Laurel H
2007-06-01
In this paper we present a model called the Modified Phase-Opponency (MPO) model for single-channel speech enhancement when the speech is corrupted by additive noise. The MPO model is based on the auditory PO model, proposed for detection of tones in noise. The PO model includes a physiologically realistic mechanism for processing the information in neural discharge times and exploits the frequency-dependent phase properties of the tuned filters in the auditory periphery by using a cross-auditory-nerve-fiber coincidence detection for extracting temporal cues. The MPO model alters the components of the PO model such that the basic functionality of the PO model is maintained but the properties of the model can be analyzed and modified independently. The MPO-based speech enhancement scheme does not need to estimate the noise characteristics nor does it assume that the noise satisfies any statistical model. The MPO technique leads to the lowest value of the LPC-based objective measures and the highest value of the perceptual evaluation of speech quality measure compared to other methods when the speech signals are corrupted by fluctuating noise. Combining the MPO speech enhancement technique with our aperiodicity, periodicity, and pitch detector further improves its performance.
Simulated learning environments in speech-language pathology: an Australian response.
MacBean, Naomi; Theodoros, Deborah; Davidson, Bronwyn; Hill, Anne E
2013-06-01
The rising demand for health professionals to service the Australian population is placing pressure on traditional approaches to clinical education in the allied health professions. Existing research suggests that simulated learning environments (SLEs) have the potential to increase student placement capacity while providing quality learning experiences with comparable or superior outcomes to traditional methods. This project investigated the current use of SLEs in Australian speech-language pathology curricula, and the potential future applications of SLEs to the clinical education curricula through an extensive consultative process with stakeholders (all 10 Australian universities offering speech-language pathology programs in 2010, Speech Pathology Australia, members of the speech-language pathology profession, and current student body). Current use of SLEs in speech-language pathology education was found to be limited, with additional resources required to further develop SLEs and maintain their use within the curriculum. Perceived benefits included: students' increased clinical skills prior to workforce placement, additional exposure to specialized areas of speech-language pathology practice, inter-professional learning, and richer observational experiences for novice students. Stakeholders perceived SLEs to have considerable potential for clinical learning. A nationally endorsed recommendation for SLE development and curricula integration was prepared.
Acoustical conditions for speech communication in active elementary school classrooms
NASA Astrophysics Data System (ADS)
Sato, Hiroshi; Bradley, John
2005-04-01
Detailed acoustical measurements were made in 34 active elementary school classrooms with typical rectangular room shape in schools near Ottawa, Canada. There was an average of 21 students per classroom. The measurements were made to obtain accurate indications of the acoustical quality of conditions for speech communication during actual teaching activities. Mean speech and noise levels were determined from the distribution of recorded sound levels, and the average speech-to-noise ratio was 11 dBA. Measured mid-frequency reverberation times (RT) during the same occupied conditions varied from 0.3 to 0.6 s, and were a little less than for the unoccupied rooms. RT values were not related to noise levels. Octave band speech and noise levels, useful-to-detrimental ratios, and Speech Transmission Index values were also determined. Key results included: (1) the average vocal effort of teachers was louder than the Pearsons "raised" voice level; (2) teachers increase their voice level to overcome ambient noise; (3) effective speech levels can be enhanced by up to 5 dB by early reflection energy; and (4) student activity is seen to be the dominant noise source, increasing average noise levels by up to 10 dBA during teaching activities. [Work supported by CLLRnet.]
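For reference, a useful-to-detrimental ratio such as U50 can be sketched from a room impulse response and a speech-to-noise ratio. The 50 ms early/late split follows the usual definition; treating the noise term as total speech energy scaled by the SNR is a simplifying assumption of this sketch:

```python
import numpy as np

def useful_to_detrimental(h, fs, snr_db, split_ms=50.0):
    """Illustrative U50: early reflection energy over late energy plus
    noise, in dB. h is a room impulse response sampled at fs Hz."""
    k = int(split_ms / 1000 * fs)
    early = np.sum(h[:k] ** 2)            # useful (direct + early) energy
    late = np.sum(h[k:] ** 2)             # detrimental late energy
    noise = (early + late) / (10 ** (snr_db / 10))  # noise vs. speech energy
    return 10 * np.log10(early / (late + noise))
```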
Afrashtehfar, Kelvin I
2016-06-01
Data sources: Medline, Cochrane Database of Systematic Reviews, Database of Abstracts of Reviews of Effects, Cochrane Central Register of Controlled Trials, Virtual Health Library and Web of Science were systematically searched up to July 2015 without limitations. Scopus, Google Scholar, ClinicalTrials.gov, the ISRCTN registry as well as reference lists of the trials included and relevant reviews were manually searched. Study selection: Randomised (RCTs) and prospective non-randomised clinical trials (non-RCTs) on human patients that compared therapeutic and adverse effects of lingual and labial appliances were considered. One reviewer initially screened titles and subsequently two reviewers independently screened the selected abstracts and full texts. Data extraction and synthesis: The data were extracted independently by the reviewers. Missing or unclear information, ongoing trials and raw data from split-mouth trials were requested from the authors of the trials. The quality of the included trials and potential bias across studies were assessed using Cochrane's risk of bias tool and the Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach. For parallel trials, the mean difference (MD) and the relative risk (RR) were used for continuous (objective speech performance, subjective speech performance, intercanine width, intermolar width and sagittal anchorage loss) and binary outcomes (eating difficulty), respectively. The standardised mean difference (SMD) was chosen to pool, after conversion, the outcome (oral discomfort) that was assessed on both binary and continuous scales. Random-effects meta-analyses were conducted, followed by subgroup and sensitivity analyses. Results: Thirteen papers pertaining to 11 clinical trials (three parallel RCTs, one split-mouth RCT and seven parallel prospective non-RCTs) were included with a total of 407 (34% male/66% female) patients. All trials had at least one bias domain at high risk of bias. Compared with labial appliances, lingual appliances were associated with increased overall oral discomfort, increased speech impediment (measured using auditory analysis), worse speech performance assessed by laypersons, increased eating difficulty and decreased intermolar width. On the other hand, lingual appliances were associated with increased intercanine width and significantly decreased anchorage loss of the maxillary first molar during space closure. However, the quality of all analyses included was judged as very low because of the high risk of bias of the included trials, inconsistency and imprecision. Conclusions: Based on existing trials there is insufficient evidence to make robust recommendations for lingual fixed orthodontic appliances regarding their therapeutic or adverse effects, as the quality of evidence was low.
Speech Intelligibility of Aircrew Mask Communication Configurations in High-Noise Environments
2017-09-28
ARL-TR-8168, September 2017, US Army Research Laboratory. Authors: Kimberly A Pollard and Lamar Garrett. (Only report documentation front matter was recovered; no abstract text.)
Adult-like processing of time-compressed speech by newborns: A NIRS study.
Issard, Cécile; Gervain, Judit
2017-06-01
Humans can adapt to a wide range of variations in the speech signal, maintaining an invariant representation of the linguistic information it contains. Among these, adaptation to rapid or time-compressed speech has been well studied in adults, but the developmental origin of this capacity remains unknown. Does this ability depend on experience with speech (if yes, as heard in utero or as heard postnatally), with sounds in general, or is it experience-independent? Using near-infrared spectroscopy, we show that the newborn brain can discriminate between three different compression rates: normal, i.e. 100% of the original duration, moderately compressed, i.e. 60% of original duration, and highly compressed, i.e. 30% of original duration. Even more interestingly, responses to normal and moderately compressed speech are similar, showing a canonical hemodynamic response in the left temporoparietal, right frontal and right temporal cortex, while responses to highly compressed speech are inverted, showing a decrease in oxyhemoglobin concentration. These results mirror those found in adults, who readily adapt to moderately compressed, but not to highly compressed speech, showing that adaptation to time-compressed speech requires little or no experience with speech, and happens at an auditory, and not at a more abstract linguistic level. Copyright © 2016 The Authors. Published by Elsevier Ltd. All rights reserved.
Audio visual speech source separation via improved context dependent association model
NASA Astrophysics Data System (ADS)
Kazemi, Alireza; Boostani, Reza; Sobhanmanesh, Fariborz
2014-12-01
In this paper, we exploit the non-linear relation between a speech source and its associated lip video as a source of extra information to propose an improved audio-visual speech source separation (AVSS) algorithm. The audio-visual association is modeled using a neural associator which estimates the visual lip parameters from a temporal context of acoustic observation frames. We define an objective function based on mean square error (MSE) measure between estimated and target visual parameters. This function is minimized for estimation of the de-mixing vector/filters to separate the relevant source from linear instantaneous or time-domain convolutive mixtures. We have also proposed a hybrid criterion which uses AV coherency together with kurtosis as a non-Gaussianity measure. Experimental results are presented and compared in terms of visually relevant speech detection accuracy and output signal-to-interference ratio (SIR) of source separation. The suggested audio-visual model significantly improves relevant speech classification accuracy compared to existing GMM-based model and the proposed AVSS algorithm improves the speech separation quality compared to reference ICA- and AVSS-based methods.
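The paper's hybrid criterion couples AV coherency with kurtosis; the kurtosis half alone can be sketched as a one-unit, FastICA-style fixed point on whitened mixtures. This stands in for, and is simpler than, the proposed algorithm (no AV term, instantaneous mixtures only):

```python
import numpy as np

def kurtosis_demix(X, iters=100, seed=0):
    """One-unit, kurtosis-maximizing demixing vector for whitened
    instantaneous mixtures X (channels x samples)."""
    X = X - X.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(np.cov(X))            # whiten the mixtures
    X = E @ np.diag(d ** -0.5) @ E.T @ X
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(iters):
        y = w @ X
        w_new = (X * y ** 3).mean(axis=1) - 3 * w  # kurtosis fixed point
        w = w_new / np.linalg.norm(w_new)
    return w, w @ X   # demixing vector and separated source estimate
```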
Alderete, John; Davies, Monica
2018-04-01
This work describes a methodology of collecting speech errors from audio recordings and investigates how some of its assumptions affect data quality and composition. Speech errors of all types (sound, lexical, syntactic, etc.) were collected by eight data collectors from audio recordings of unscripted English speech. Analysis of these errors showed that: (i) different listeners find different errors in the same audio recordings, but (ii) the frequencies of error patterns are similar across listeners; (iii) errors collected "online" using on-the-spot observational techniques are more likely to be affected by perceptual biases than "offline" errors collected from audio recordings; and (iv) datasets built from audio recordings can be explored and extended in ways that traditional corpus studies cannot.
Zeng, Yin-Ting; Hwu, Wuh-Liang; Torng, Pao-Chuan; Lee, Ni-Chung; Shieh, Jeng-Yi; Lu, Lu; Chien, Yin-Hsiu
2017-05-01
Patients with infantile-onset Pompe disease (IOPD) can be treated by recombinant human acid alpha glucosidase (rhGAA) replacement beginning at birth with excellent survival rates, but they still commonly present with speech disorders. This study investigated the progress of speech disorders in these early-treated patients and ascertained the relationship with treatments. Speech disorders, including hypernasal resonance, articulation disorders, and speech intelligibility, were scored by speech-language pathologists using auditory perception in seven early-treated patients over a period of 6 years. Statistical analysis of the first and last evaluations of the patients was performed with the Wilcoxon signed-rank test. A total of 29 speech samples were analyzed. All the patients suffered from hypernasality, articulation disorder, and impairment in speech intelligibility at the age of 3 years. The conditions were stable, and 2 patients developed normal or near-normal speech during follow-up. Speech therapy and a high dose of rhGAA appeared to improve articulation in 6 of the 7 patients (86%, p = 0.028) by decreasing the omission of consonants, which consequently increased speech intelligibility (p = 0.041). The severity of hypernasality was greatly reduced in only 2 patients (29%, p = 0.131). Speech disorders were common even in early and successfully treated patients with IOPD; however, aggressive speech therapy and high-dose rhGAA could improve their speech disorders. Copyright © 2016 European Paediatric Neurology Society. Published by Elsevier Ltd. All rights reserved.
Effects of Instantaneous Multiband Dynamic Compression on Speech Intelligibility
NASA Astrophysics Data System (ADS)
Herzke, Tobias; Hohmann, Volker
2005-12-01
The recruitment phenomenon, that is, the reduced dynamic range between threshold and uncomfortable level, is attributed to the loss of instantaneous dynamic compression on the basilar membrane. Despite this, hearing aids commonly use slow-acting dynamic compression for its compensation, because this was found to be the most successful strategy in terms of speech quality and intelligibility rehabilitation. Former attempts to use fast-acting compression gave ambiguous results, raising the question as to whether auditory-based recruitment compensation by instantaneous compression is in principle applicable in hearing aids. This study thus investigates instantaneous multiband dynamic compression based on an auditory filterbank. Instantaneous envelope compression is performed in each frequency band of a gammatone filterbank, which provides a combination of time and frequency resolution comparable to the normal healthy cochlea. The gain characteristics used for dynamic compression are deduced from categorical loudness scaling. In speech intelligibility tests, the instantaneous dynamic compression scheme was compared against a linear amplification scheme, which used the same filterbank for frequency analysis, but employed constant gain factors that restored the sound level for medium perceived loudness in each frequency band. In subjective comparisons, five of nine subjects preferred the linear amplification scheme and would not accept the instantaneous dynamic compression in hearing aids. Four of nine subjects did not perceive any quality differences. A sentence intelligibility test in noise (Oldenburg sentence test) showed little to no negative effects of the instantaneous dynamic compression, compared to linear amplification. A word intelligibility test in quiet (one-syllable rhyme test) showed that the subjects benefit from the larger amplification at low levels provided by instantaneous dynamic compression. Further analysis showed that the increase in intelligibility resulting from a gain provided by instantaneous compression is as high as from a gain provided by linear amplification. No negative effects of the distortions introduced by the instantaneous compression scheme in terms of speech recognition are observed.
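A minimal sketch of instantaneous multiband compression, substituting Butterworth bands for the gammatone filterbank and a fixed power-law gain for the characteristics derived from categorical loudness scaling (both substitutions, and all band edges and exponents, are assumptions of this sketch):

```python
import numpy as np
from scipy.signal import butter, hilbert, sosfiltfilt

def instantaneous_compress(x, fs,
                           bands=((100, 500), (500, 2000), (2000, 6000)),
                           exponent=0.6):
    """Split the signal into bands, compress each band's Hilbert
    envelope sample by sample, and resynthesize by summation."""
    out = np.zeros_like(x, dtype=float)
    for lo, hi in bands:
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, x)
        env = np.abs(hilbert(band)) + 1e-12       # instantaneous envelope
        out += band * env ** (exponent - 1.0)     # compressive gain
    return out
```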
Educational Information Quantization for Improving Content Quality in Learning Management Systems
ERIC Educational Resources Information Center
Rybanov, Alexander Aleksandrovich
2014-01-01
The article offers the educational information quantization method for improving content quality in Learning Management Systems. The paper considers questions concerning analysis of quality of quantized presentation of educational information, based on quantitative text parameters: average frequencies of parts of speech, used in the text; formal…
ERIC Educational Resources Information Center
Arcon, Nina; Klein, Perry D.; Dombroski, Jill D.
2017-01-01
Previous research has shown that both dictation and speech-to-text (STT) software can increase the quality of writing for native English speakers. The purpose of this study was to investigate the effect of these modalities on the written composition and cognitive load of elementary school English language learners (ELLs). In a within-subjects…
ERIC Educational Resources Information Center
Bryden, James D.
The purpose of this study was to specify variables which function significantly in the racial identification and speech quality rating of Negro and white speakers by Negro and white listeners. Ninety-one adults served as subjects for the speech task; 86 of these subjects, 43 Negro and 43 white, provided the listener responses. Subjects were chosen…
Sensory Information Processing
1975-12-31
[Technical report; only fragments survive extraction. Recoverable table-of-contents entries:] Synthetic Speech Quality Using Binaural Reverberation (Boll); Section 4: Noise Suppression with Linear Prediction Filtering (Peterson); Section 5: Speech Processing to Reduce Noise and Improve Intelligibility (Callahan); Section 6: Linear Predictive Coding with a Glottal ...; Section 7 ...
Effects and modeling of phonetic and acoustic confusions in accented speech.
Fung, Pascale; Liu, Yi
2005-11-01
Accented speech recognition is more challenging than standard speech recognition due to the effects of phonetic and acoustic confusions. Phonetic confusion in accented speech occurs when an expected phone is pronounced as a different one, which leads to erroneous recognition. Acoustic confusion occurs when the pronounced phone lies acoustically between two baseform models and can be equally recognized as either one. We propose that it is necessary to analyze and model these confusions separately in order to improve accented speech recognition without degrading standard speech recognition. Since low phonetic confusion units in accented speech do not give rise to automatic speech recognition errors, we focus on analyzing and reducing phonetic and acoustic confusability under high phonetic confusion conditions. We propose using a likelihood ratio test to measure phonetic confusion and an asymmetric acoustic distance to measure acoustic confusion. Only accent-specific phonetic units with low acoustic confusion are used in an augmented pronunciation dictionary, while phonetic units with high acoustic confusion are reconstructed using decision tree merging. Experimental results show that our approach is effective and superior to methods modeling phonetic confusion or acoustic confusion alone in accented speech, with a significant 5.7% absolute WER reduction, without degrading standard speech recognition.
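The phonetic-confusion measure lends itself to a short sketch. Below is a minimal, hypothetical Python illustration of a frame-averaged log-likelihood ratio between two phone models; Gaussian mixture models stand in for whatever acoustic models the recognizer actually uses, and the feature front end is an assumption, not taken from the paper.

```python
from sklearn.mixture import GaussianMixture

def log_likelihood_ratio(frames, model_pronounced, model_baseform):
    """Average log-likelihood ratio per frame between two phone models.

    A large positive value means the accented realisation is much
    better explained by the alternative phone model than by the
    expected baseform model, i.e. high phonetic confusability.
    `frames` is an (n_frames, n_features) array of acoustic features.
    """
    return model_pronounced.score(frames) - model_baseform.score(frames)

# Hypothetical usage: fit one GMM per phone on its training frames.
# gmm_a = GaussianMixture(n_components=8).fit(frames_phone_a)
# gmm_b = GaussianMixture(n_components=8).fit(frames_phone_b)
# lr = log_likelihood_ratio(test_frames, gmm_a, gmm_b)
```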
[(Re)habilitation after cochlear implantation].
Diller, G
2009-07-01
In recent years, indications for cochlear implants (CIs) have changed dramatically. The benefits depend on the preconditions of the individual patient as well as on the subsequent (re)habilitation. Therefore, the many variables influencing the hearing and speech perception of a CI user must be kept in mind. As an example, the special situation of children with Turkish as their mother tongue is described. The most convincing argument concerning (re)habilitation is its benefit. Indeed, this benefit represents the final standard of quality and serves as the yardstick for standard assessments of (re)habilitation quality. CI (re)habilitation includes medical, pedagogical, audiological, hearing and speech, and psychological therapeutic aspects.
Feeney, Rachel; Desha, Laura; Khan, Asaduzzaman; Ziviani, Jenny
2017-04-01
The trajectory of health-related quality-of-life (HRQoL) for children aged 4-9 years and its relationship with speech and language difficulties (SaLD) was examined using data from the Longitudinal Study of Australian Children (LSAC). Generalized linear latent and mixed modelling was used to analyse data from three waves of the LSAC across four HRQoL domains (physical, emotional, social and school functioning). Four domains of HRQoL, measured using the Paediatric Quality-of-Life Inventory (PedsQL™), were examined to find the contribution of SaLD while accounting for child-specific factors (e.g. gender, ethnicity, temperament) and family characteristics (social ecological considerations and psychosocial stressors). In multivariable analyses, one measure of SaLD, namely parent concern about receptive language, was negatively associated with all HRQoL domains. Covariates positively associated with all HRQoL domains included child's general health, maternal mental health, parental warmth and primary caregiver's engagement in the labour force. Findings suggest that SaLD are associated with reduced HRQoL. For most LSAC study children, having typical speech/language skills was a protective factor positively associated with HRQoL.
Communication in a noisy environment: Perception of one's own voice and speech enhancement
NASA Astrophysics Data System (ADS)
Le Cocq, Cecile
Workers in noisy industrial environments are often confronted with communication problems. Many workers complain about not being able to communicate easily with their coworkers when they wear hearing protectors. As a consequence, they tend to remove their protectors, which exposes them to the risk of hearing loss. This communication problem is in fact twofold: first, hearing protectors modify the perception of one's own voice; second, they interfere with understanding speech from others. This double problem is examined in this thesis. When wearing hearing protectors, the modification of one's own voice perception is partly due to the occlusion effect produced when an earplug is inserted in the ear canal. This occlusion effect has two main consequences: first, physiological noises at low frequencies are better perceived; second, the perception of one's own voice is modified. In order to better understand this phenomenon, results from the literature are analyzed systematically, and a new method to quantify the occlusion effect is developed. Instead of stimulating the skull with a bone vibrator or asking the subject to speak, as is usually done in the literature, it was decided to excite the buccal cavity with an acoustic wave. The experiment was designed in such a way that the acoustic wave which excites the buccal cavity does not directly excite the external ear or the rest of the body. The measurement of the hearing threshold with open and occluded ear was used to quantify the subjective occlusion effect for an acoustic wave in the buccal cavity. These experimental results, as well as those reported in the literature, have led to a better understanding of the occlusion effect and an evaluation of the role of each internal path from the acoustic source to the inner ear. Speech intelligibility from others is altered both by the high sound levels of noisy industrial environments and by the speech signal attenuation due to hearing protectors. A possible solution to this problem is to denoise the speech signal and transmit it under the hearing protector. Many denoising techniques are available and are often used for denoising speech in telecommunication. In the framework of this thesis, denoising by wavelet thresholding is considered. A first study of "classical" wavelet denoising techniques was conducted in order to evaluate their performance in noisy industrial environments. The tested speech signals were corrupted by industrial noises over a wide range of signal-to-noise ratios. The denoised speech signals were evaluated with four criteria. A large database was obtained and analyzed with a selection algorithm designed for this purpose. This first study identified the influence of the different parameters of the wavelet denoising method on its quality and identified the "classical" method which gave the best performance in terms of denoising quality. It also generated ideas for designing a new thresholding rule suitable for speech wavelet denoising in a noisy industrial environment. In a second study, this new thresholding rule is presented and evaluated. Its performance is better than that of the best "classical" method found in the first study when the signal-to-noise ratio of the speech signal is between -10 dB and 15 dB.
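As background for the two studies, the "classical" baseline can be sketched in a few lines. The following minimal Python sketch (using the PyWavelets package) applies Donoho-style soft thresholding with the universal threshold; it is a generic baseline, not the thesis's new thresholding rule, and the wavelet choice and decomposition depth are assumptions.

```python
import numpy as np
import pywt

def wavelet_denoise(x, wavelet="db8", level=5):
    """Baseline wavelet speech denoising: universal threshold plus
    soft thresholding of the detail coefficients.

    The noise scale is estimated robustly from the finest detail band.
    """
    coeffs = pywt.wavedec(x, wavelet, level=level)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745      # robust noise estimate
    thr = sigma * np.sqrt(2.0 * np.log(len(x)))         # universal threshold
    coeffs[1:] = [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[: len(x)]
```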
Bone, Daniel; Lee, Chi-Chun; Black, Matthew P.; Williams, Marian E.; Lee, Sungbok; Levitt, Pat; Narayanan, Shrikanth
2015-01-01
Purpose The purpose of this study was to examine relationships between prosodic speech cues and autism spectrum disorder (ASD) severity, hypothesizing a mutually interactive relationship between the speech characteristics of the psychologist and the child. The authors objectively quantified acoustic-prosodic cues of the psychologist and of the child with ASD during spontaneous interaction, establishing a methodology for future large-sample analysis. Method Speech acoustic-prosodic features were semiautomatically derived from segments of semistructured interviews (Autism Diagnostic Observation Schedule, ADOS; Lord, Rutter, DiLavore, & Risi, 1999; Lord et al., 2012) with 28 children who had previously been diagnosed with ASD. Prosody was quantified in terms of intonation, volume, rate, and voice quality. Research hypotheses were tested via correlation as well as hierarchical and predictive regression between ADOS severity and prosodic cues. Results Automatically extracted speech features demonstrated prosodic characteristics of dyadic interactions. As rated ASD severity increased, both the psychologist and the child demonstrated effects for turn-end pitch slope, and both spoke with atypical voice quality. The psychologist’s acoustic cues predicted the child’s symptom severity better than did the child’s acoustic cues. Conclusion The psychologist, acting as evaluator and interlocutor, was shown to adjust his or her behavior in predictable ways based on the child’s social-communicative impairments. The results support future study of speech prosody of both interaction partners during spontaneous conversation, while using automatic computational methods that allow for scalable analysis on much larger corpora. PMID:24686340
Luo, Jie; Wu, Jieli; Lv, Kexing; Li, Kaichun; Wu, Jianhui; Wen, Yihui; Li, Xiaoling; Tang, Haocheng; Jiang, Aiyun; Wang, Zhangfeng; Wen, Weiping; Lei, Wenbin
2016-01-01
This study aims to analyze the postsurgical health-related quality of life (HRQOL) and quality of voice (QOV) of patients with laryngeal carcinoma, with the expectation of improving the treatment and HRQOL of these patients. Based on the collection of information on patients with laryngeal carcinoma regarding clinical characteristics (age, TNM stage, with or without laryngeal preservation and/or neck dissection, with or without postoperative irradiation and/or chemotherapy, etc.), QOV using the Voice Handicap Index (VHI) scale, and HRQOL using the EORTC QLQ-C30 and EORTC QLQ-H&N35 scales, the differences in postsurgical HRQOL related to clinical characteristics were analyzed using univariate nonparametric tests, the main factors impacting postsurgical HRQOL were analyzed using regression analyses (generalized linear models), and the correlation between QOV and HRQOL was analyzed using Spearman correlation analysis. A total of 92 patients were enrolled in this study, in whom the EORTC QLQ-C30, EORTC QLQ-H&N35 and VHI scales revealed that the differences in HRQOL were significant among patients with different ages, TNM stages, and treatment modalities; that the main factors impacting postsurgical HRQOL were pain, speech disorder, and dry mouth; and that QOV was significantly correlated with HRQOL. For the patients with laryngeal carcinoma included in our study, quality of life after open surgery was impacted by many factors, predominantly pain, speech disorder, and dry mouth. It is suggested that doctors in China devote more effort to patients' postoperative pain and xerostomia management and speech rehabilitation, in the hope of improving patients' quality of life.
NASA Astrophysics Data System (ADS)
Palaniswamy, Sumithra; Duraisamy, Prakash; Alam, Mohammad Showkat; Yuan, Xiaohui
2012-04-01
Automatic speech processing systems are widely used in everyday life, such as in mobile communication, speech and speaker recognition, and assisting the hearing impaired. In speech communication systems, the quality and intelligibility of speech are of utmost importance for ease and accuracy of information exchange. To obtain a speech signal that is intelligible and more pleasant to listen to, noise reduction is essential. In this paper a new Time Adaptive Discrete Bionic Wavelet Thresholding (TADBWT) scheme is proposed. The proposed technique uses the Daubechies mother wavelet to achieve better enhancement of speech from additive non-stationary noises which occur in real life, such as street noise and factory noise. Due to the integration of a human auditory system model into the wavelet transform, the bionic wavelet transform (BWT) has great potential for speech enhancement and may open a new path in speech processing. In the proposed technique, a discrete BWT is first applied to noisy speech to derive the TADBWT coefficients. Then the adaptive nature of the BWT is captured by introducing a time-varying linear factor which updates the coefficients at each scale over time. This approach shows better performance than existing algorithms at lower input SNR due to modified, level-dependent soft thresholding of the time-adaptive coefficients. The objective and subjective test results confirmed the competency of the TADBWT technique. The effectiveness of the proposed technique was also evaluated for a speaker recognition task under noisy environments. The recognition results show that the TADBWT technique yields better performance when compared to alternate methods, specifically at lower input SNR.
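The time-adaptive idea can be hinted at with a small sketch. The following hypothetical Python fragment varies a soft threshold over time within one scale using a recursive local-energy estimate; the bionic wavelet transform itself (the auditory-model adaptation of the mother wavelet) and the exact TADBWT update rule are not implemented here.

```python
import numpy as np

def time_adaptive_soft_threshold(scale_coeffs, sigma, alpha=0.95):
    """Soft thresholding with a threshold that adapts over time.

    A first-order recursive estimate of local coefficient energy
    lowers the threshold where speech energy is high and keeps it
    near the universal threshold in noise-only stretches. This is a
    sketch of the 'time adaptive' principle, not the published rule.
    """
    n = len(scale_coeffs)
    base = sigma * np.sqrt(2.0 * np.log(n))              # universal threshold
    energy, out = 0.0, np.empty(n)
    for i, c in enumerate(scale_coeffs):
        energy = alpha * energy + (1.0 - alpha) * c * c  # local energy track
        thr = base / (1.0 + np.sqrt(energy) / sigma)     # shrink thr where speech dominates
        out[i] = np.sign(c) * max(abs(c) - thr, 0.0)     # soft thresholding
    return out
```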
Dissecting choral speech: properties of the accompanist critical to stuttering reduction.
Kiefte, Michael; Armson, Joy
2008-01-01
The effects of choral speech and altered auditory feedback (AAF) on stuttering frequency were compared to identify those properties of choral speech that make it a more effective condition for stuttering reduction. Seventeen adults who stutter (AWS) participated in an experiment consisting of special choral speech conditions that were manipulated to selectively eliminate specific differences between choral speech and AAF. Consistent with previous findings, results showed that both choral speech and AAF reduced stuttering compared to solo reading. Although reductions under AAF were substantial, they were less dramatic than those for choral speech. Stuttering reduction for choral speech was highly robust even when the accompanist's voice temporally lagged that of the AWS, when there was no opportunity for dynamic interplay between the AWS and accompanist, and when the accompanist was replaced by the AWS's own voice, all of which approximate specific features of AAF. Choral speech was also highly effective in reducing stuttering across changes in speech rate and for both familiar and unfamiliar passages. We concluded that differences in properties between choral speech and AAF other than those that were manipulated in this experiment must account for differences in stuttering reduction. The reader will be able to (1) describe differences in stuttering reduction associated with altered auditory feedback compared to choral speech conditions and (2) describe differences between delivery of a second voice signal as an altered rendition of the speakers own voice (altered auditory feedback) and alterations in the voice of an accompanist (choral speech).
Advanced carcinoma of the tongue. Management by total glossectomy without laryngectomy.
Effron, M Z; Johnson, J T; Myers, E N; Curtin, H; Beery, Q; Sigler, B
1981-11-01
A major goal of any surgical program for patients with tumors is to cure their cancer. Patients requiring total glossectomy usually are seen initially with far-advanced disease, often after failure of other treatment modalities. As a result, they may be suffering from constant pain as well as impairment of speech and deglutition. The prognosis is poor, and palliative surgery with good rehabilitation of the speaking and swallowing mechanisms becomes a reasonable, albeit limited, objective. Our series does show that properly selected patients can be successfully rehabilitated after total glossectomy without laryngectomy. This successful rehabilitation begins with good patient selection and preoperative preparation. Postoperative rehabilitation requires the interplay of a highly motivated patient and a well-coordinated health care team. The physician, nurse, speech pathologist, dietitian, and social worker all have important roles in ensuring the patient's return to a good quality of life. The surgeon will direct the efforts of the team. To the nurse and the speech pathologist falls much of the bedside job of instructing and motivating the patient. Because such effective rehabilitation has been demonstrated by the success of our patients, we advocate preserving the larynx whenever possible in the patient who must undergo total glossectomy.
Age-related changes in the anticipatory coarticulation in the speech of young children
NASA Astrophysics Data System (ADS)
Parson, Mathew; Lloyd, Amanda; Stoddard, Kelly; Nissen, Shawn L.
2003-10-01
This paper investigates the possible patterns of anticipatory coarticulation in the speech of young children. Speech samples were elicited from three groups of children between 3 and 6 years of age and one comparison group of adults. The utterances were recorded online in a quiet room environment using high-quality microphones and direct analog-to-digital conversion to computer disk. Formant frequency measures (F1, F2, and F3) were extracted from a centralized and unstressed vowel (schwa) spoken prior to two different sets of productions. The first set of productions consisted of the target vowel followed by a series of real words containing an initial CV(C) syllable (voiceless obstruent-monophthongal vowel) in a range of phonetic contexts, while the second set consisted of a series of nonword productions with a relatively constrained phonetic context. An analysis of variance was utilized to determine if the formant frequencies varied systematically as a function of age, gender, and phonetic context. Results will also be discussed in association with spectral moment measures extracted from the obstruent segment immediately following the target vowel. [Work supported by research funding from Brigham Young University.]
Plowman, Emily K; Tabor, Lauren C; Wymer, James; Pattee, Gary
2017-08-01
Speech and swallowing impairments are highly prevalent in individuals with amyotrophic lateral sclerosis (ALS) and contribute to reduced quality of life, malnutrition, aspiration, pneumonia and death. Established practice parameters for bulbar dysfunction in ALS do not currently exist. The aim of this study was to identify current practice patterns for the evaluation of speech and swallowing function within participating Northeast ALS (NEALS) clinics in the United States. A 15-item survey was emailed to all registered NEALS centres. Thirty-eight sites completed the survey. The majority offered Speech-Language Pathology (92%), augmentative and alternative communication (71%), and dietician (92%) health care services. The ALS Functional Rating Scale-Revised and body weight were the only parameters routinely collected in greater than 90% of responding sites. Referral for a modified barium swallow study was routinely utilised in only 27% of sites, and the use of percutaneous gastrostomy tubes in ALS patient care was found to vary considerably. This survey reveals significant variability and inconsistency in the management of bulbar dysfunction in ALS across NEALS sites. We conclude that a great need exists for the development of bulbar practice guidelines in ALS clinical care to accurately detect and monitor bulbar dysfunction.
[Swallowing and Voice Disorders in Cancer Patients].
Tanuma, Akira
2015-07-01
Dysphagia sometimes occurs in patients with head and neck cancer, particularly in those undergoing surgery and radiotherapy for lingual, pharyngeal, and laryngeal cancer. It also occurs in patients with esophageal cancer and brain tumor. Patients who undergo glossectomy usually show impairment of the oral phase of swallowing, whereas those with pharyngeal, laryngeal, and esophageal cancer show impairment of the pharyngeal phase of swallowing. Videofluoroscopic examination of swallowing provides important information necessary for rehabilitation of swallowing in these patients. Appropriate swallowing exercises and compensatory strategies can be decided based on the findings of the evaluation. Palatal augmentation prostheses are sometimes used for rehabilitation in patients undergoing glossectomy. Patients who undergo total laryngectomy or total pharyngolaryngoesophagectomy should receive speech therapy to enable them to use alaryngeal speech methods, including electrolarynx, esophageal speech, or speech via tracheoesophageal puncture. Regaining swallowing function and speech can improve a patient's emotional health and quality of life. Therefore, it is important to manage swallowing and voice disorders appropriately.
McGhee, Hannah; Cornwell, Petrea; Addis, Paula; Jarman, Carly
2006-11-01
The aims of this preliminary study were to explore the suitability for and benefits of commencing dysarthria treatment for people with traumatic brain injury (TBI) while in post-traumatic amnesia (PTA). It was hypothesized that behaviours in PTA do not preclude participation and that dysarthria characteristics would improve post-treatment. The design was a series of comprehensive case analyses. Two participants with severe TBI received dysarthria treatment focused on motor speech deficits until emergence from PTA. A checklist of neurobehavioural sequelae of TBI was rated during therapy, and perceptual and motor speech assessments were administered before and after therapy. Results revealed that certain behaviours affected the quality of therapy but did not preclude its provision. Treatment resulted in physiological improvements in some speech sub-systems for both participants, with varying functional speech outcomes. These findings suggest that dysarthria treatment can begin and provide short-term benefits to speech production during the late stages of PTA post-TBI.
Adults with Specific Language Impairment fail to consolidate speech sounds during sleep.
Earle, F Sayako; Landi, Nicole; Myers, Emily B
2018-02-14
Specific Language Impairment (SLI) is a common learning disability that is associated with poor speech sound representations. These differences in representational quality are thought to impose a burden on spoken language processing. The underlying mechanism accounting for impoverished speech sound representations remains in debate. Previous findings that implicate sleep as important for building speech representations, combined with reports of atypical sleep in SLI, motivate the current investigation into a potential consolidation mechanism as a source of impoverished representations in SLI. In the current study, we trained individuals with SLI on a new (nonnative) set of speech sounds and tracked their perceptual accuracy and neural responses to these sounds over two days. Adults with SLI achieved performance comparable to typical controls during training; however, they demonstrated a distinct lack of overnight gains on the next day. We propose that those with SLI may be impaired in the consolidation of acoustic-phonetic information. Published by Elsevier B.V.
Design and performance of an analysis-by-synthesis class of predictive speech coders
NASA Technical Reports Server (NTRS)
Rose, Richard C.; Barnwell, Thomas P., III
1990-01-01
The performance of a broad class of analysis-by-synthesis linear predictive speech coders is quantified experimentally. The class of coders includes a number of well-known techniques as well as a very large number of speech coders which have not been named or studied. A general formulation for deriving the parametric representation used in all of the coders in the class is presented. A new coder, named the self-excited vocoder, is discussed because of its good performance with low complexity, and because of the insight this coder gives to analysis-by-synthesis coders in general. The results of a study comparing the performances of different members of this class are presented. The study takes the form of a series of formal subjective and objective speech quality tests performed on selected coders. The results of this study lead to some interesting and important observations concerning the controlling parameters for analysis-by-synthesis speech coders.
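The short-term predictor shared by every coder in this class can be computed with the classic autocorrelation method. The sketch below (Levinson-Durbin recursion in plain Python/NumPy) covers only this common analysis stage; the excitation search that distinguishes members of the class, including the self-excited vocoder, is not shown.

```python
import numpy as np

def lpc_coefficients(frame, order=10):
    """Autocorrelation-method LPC via the Levinson-Durbin recursion.

    Returns the prediction polynomial a (with a[0] = 1) and the final
    prediction error energy. `frame` is a windowed speech frame.
    """
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = np.zeros(order + 1)
    a[0], err = 1.0, r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])   # correlation with past coefficients
        k = -acc / err                               # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]          # symmetric coefficient update
        a[i] = k
        err *= 1.0 - k * k                           # shrink residual energy
    return a, err
```

In an analysis-by-synthesis loop, candidate excitations are then passed through the synthesis filter 1/A(z) and scored against the original frame with a perceptually weighted error.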
Searchfield, Grant D; Linford, Tania; Kobayashi, Kei; Crowhen, David; Latzel, Matthias
2018-03-01
To compare preference for and performance of manually selected programmes to an automatic sound classifier, the Phonak AutoSense OS. A single-blind repeated-measures study. Participants were fitted with Phonak Virto V90 ITE aids; preferences for different listening programmes were compared across four different sound scenarios (speech in: quiet, noise, loud noise and a car). Following a 4-week trial, preferences were reassessed and each user's preferred programme was compared to the automatic classifier for sound quality and hearing in noise (HINT test) using a 12-loudspeaker array. Twenty-five participants with symmetrical moderate-severe sensorineural hearing loss. Participants' preferences of manual programme for scenarios varied considerably between and within sessions. A HINT Speech Reception Threshold (SRT) advantage was observed for the automatic classifier over participants' manual selections for speech in quiet, loud noise and car noise. Sound quality ratings were similar for both manual and automatic selections. The use of a sound classifier is a viable alternative to manual programme selection.
NASA Astrophysics Data System (ADS)
Přibil, Jiří; Přibilová, Anna; Frollo, Ivan
2017-12-01
The paper focuses on two methods for evaluating the success of speech signal enhancement when the signal is recorded in an open-air magnetic resonance imager during phonation for 3D human vocal tract modeling. The first approach enables a comparison based on statistical analysis by ANOVA and hypothesis tests. The second method is based on classification by Gaussian mixture models (GMM). The experiments confirmed that the proposed ANOVA and GMM classifiers for automatic evaluation of speech quality are functional and produce results fully comparable with the standard evaluation based on the listening test method.
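The GMM-based evaluation can be illustrated with a minimal sketch: one mixture model per class, with a test utterance assigned to the class whose model explains its features best. The feature front end (e.g. MFCCs) and the mixture size below are assumptions, not the paper's exact configuration.

```python
from sklearn.mixture import GaussianMixture

def train_models(feats_noisy, feats_enhanced, n_components=8):
    """Fit one GMM per class on (n_frames, n_features) feature arrays."""
    gm_noisy = GaussianMixture(n_components=n_components).fit(feats_noisy)
    gm_enh = GaussianMixture(n_components=n_components).fit(feats_enhanced)
    return gm_noisy, gm_enh

def classify(feats, gm_noisy, gm_enh):
    """Assign an utterance to the class with the higher mean log-likelihood."""
    return "enhanced" if gm_enh.score(feats) > gm_noisy.score(feats) else "noisy"
```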
Effect of deep brain stimulation on different speech subsystems in patients with multiple sclerosis.
Pützer, Manfred; Barry, William John; Moringlane, Jean Richard
2007-11-01
The effect of deep brain stimulation on articulation and phonation subsystems in seven patients with multiple sclerosis (MS) was examined. Production parameters in fast syllable-repetitions were defined and measured, and the phonation quality during vowel productions was analyzed. Speech material was recorded for patients (with and without stimulation) and for a group of healthy control speakers. With stimulation, the precision of glottal and supraglottal articulatory gestures is reduced, whereas phonation has a greater tendency to be hyperfunctional in comparison with the healthy control data. Different effects on the two speech subsystems are induced by electrical stimulation of the thalamus in patients with MS.
Why talk with children matters: clinical implications of infant- and child-directed speech research.
Ratner, Nan Bernstein
2013-11-01
This article reviews basic features of infant- or child-directed speech, with particular attention to those aspects of the register that have been shown to impact profiles of child language development. It then discusses concerns that arise when describing adult input to children with language delay or disorder, or children at risk for depressed language skills. The article concludes with some recommendations for parent counseling in such cases, as well as methods that speech-language pathologists can use to improve the quality and quantity of language input to language-learning children.
Speech Communication Anxiety: An Impediment to Academic Achievement in the University Classroom.
ERIC Educational Resources Information Center
Boohar, Richard K.; Seiler, William J.
1982-01-01
The achievement levels of college students taking a bioethics course who demonstrated high and low degrees of speech anxiety were studied. Students with high speech anxiety interacted less with instructors and did not achieve as well as other students. Strategies instructors can use to help students are suggested. (Authors/PP)
Nebraska Speech, Debate, and Drama Manuals.
ERIC Educational Resources Information Center
Nebraska School Activities Association, Lincoln.
Prepared and designed to provide general information in the administration of speech activities in the Nebraska schools, this manual offers rules and regulations for speech events, high school debate, and one act plays. The section on speech events includes information about general regulations, the scope of competition, district contests, the…
Control of interior surface materials for speech privacy in high-speed train cabins.
Jang, H S; Lim, H; Jeon, J Y
2017-05-01
The effect of interior materials with various absorption coefficients on speech privacy was investigated in a 1:10 scale model of one high-speed train cabin geometry. The speech transmission index (STI) and privacy distance (r_P) were measured in the train cabin to quantify speech privacy. Measurement cases were selected for the ceiling, sidewall, and front and back walls and were classified as high-, medium- and low-absorption coefficient cases. Interior materials with high absorption coefficients yielded a low r_P, and the ceiling had the largest impact on both the STI and r_P among the interior elements. Combinations of the three cases were measured, and the maximum reduction in r_P by the absorptive surfaces was 2.4 m, which exceeds the space between two rows of chairs in the high-speed train. Additionally, the contribution of the interior elements to speech privacy was analyzed using recorded impulse responses and a multiple regression model for r_P using the equivalent absorption area. The analysis confirmed that the ceiling was the most important interior element for improving speech privacy. These results can be used to find the relative decrease in r_P in the acoustic design of interior materials to improve speech privacy in train cabins. © 2016 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
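The STI underlying the privacy distance is derived from modulation transfer functions, which can be computed from the measured impulse responses via Schroeder's relation. The sketch below shows that core step only; octave-band filtering, the modulation-frequency grid, and the full STI weighting of IEC 60268-16 are omitted.

```python
import numpy as np

def mtf_from_impulse_response(ir, fs, f_mod):
    """Modulation transfer function m(f_mod) from a room impulse
    response, via Schroeder's relation:
        m(F) = |sum h^2(t) exp(-j 2 pi F t)| / sum h^2(t).
    Reduced m at the listener implies a lower STI and hence a
    shorter privacy distance.
    """
    t = np.arange(len(ir)) / fs
    e = ir ** 2                                   # squared impulse response
    num = np.abs(np.sum(e * np.exp(-2j * np.pi * f_mod * t)))
    return num / np.sum(e)
```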
Input and language development in bilingually developing children.
Hoff, Erika; Core, Cynthia
2013-11-01
Language skills in young bilingual children are highly varied as a result of the variability in their language experiences, making it difficult for speech-language pathologists to differentiate language disorder from language difference in bilingual children. Understanding the sources of variability in bilingual contexts and the resulting variability in children's skills will help improve language assessment practices by speech-language pathologists. In this article, we review literature on bilingual first language development for children under 5 years of age. We describe the rate of development in single and total language growth, we describe effects of quantity of input and quality of input on growth, and we describe effects of family composition on language input and language growth in bilingual children. We provide recommendations for language assessment of young bilingual children and consider implications for optimizing children's dual language development.
Effects of intelligibility on working memory demand for speech perception.
Francis, Alexander L; Nusbaum, Howard C
2009-08-01
Understanding low-intelligibility speech is effortful. In three experiments, we examined the effects of intelligibility on working memory (WM) demands imposed by perception of synthetic speech. In all three experiments, a primary speeded word recognition task was paired with a secondary WM-load task designed to vary the availability of WM capacity during speech perception. Speech intelligibility was varied either by training listeners to use available acoustic cues in a more diagnostic manner (as in Experiment 1) or by providing listeners with more informative acoustic cues (i.e., better speech quality, as in Experiments 2 and 3). In the first experiment, training significantly improved intelligibility and recognition speed; increasing WM load significantly slowed recognition. A significant interaction between training and load indicated that the benefit of training on recognition speed was observed only under low memory load. In subsequent experiments, listeners received no training; intelligibility was manipulated by changing synthesizers. Improving intelligibility without training improved recognition accuracy, and increasing memory load still decreased it, but more intelligible speech did not produce more efficient use of available WM capacity. This suggests that perceptual learning modifies the way available capacity is used, perhaps by increasing the use of more phonetically informative features and/or by decreasing use of less informative ones.
Talker identification across source mechanisms: experiments with laryngeal and electrolarynx speech.
Perrachione, Tyler K; Stepp, Cara E; Hillman, Robert E; Wong, Patrick C M
2014-10-01
The purpose of this study was to determine listeners' ability to learn talker identity from speech produced with an electrolarynx, explore source and filter differentiation in talker identification, and describe acoustic-phonetic changes associated with electrolarynx use. Healthy adult control listeners learned to identify talkers from speech recordings produced using talkers' normal laryngeal vocal source or an electrolarynx. Listeners' abilities to identify talkers from the trained vocal source (Experiment 1) and generalize this knowledge to the untrained source (Experiment 2) were assessed. Acoustic-phonetic measurements of spectral differences between source mechanisms were performed. Additional listeners attempted to match recordings from different source mechanisms to a single talker (Experiment 3). Listeners successfully learned talker identity from electrolarynx speech but less accurately than from laryngeal speech. Listeners were unable to generalize talker identity to the untrained source mechanism. Electrolarynx use resulted in vowels with higher F1 frequencies compared with laryngeal speech. Listeners matched recordings from different sources to a single talker better than chance. Electrolarynx speech, although lacking individual differences in voice quality, nevertheless conveys sufficient indexical information related to the vocal filter and articulation for listeners to identify individual talkers. Psychologically, perception of talker identity arises from a "gestalt" of the vocal source and filter.
Brammer, Anthony J; Yu, Gongqiang; Bernstein, Eric R; Cherniack, Martin G; Peterson, Donald R; Tufts, Jennifer B
2014-08-01
An adaptive, delayless, subband feed-forward control structure is employed to improve the speech signal-to-noise ratio (SNR) in the communication channel of a circumaural headset/hearing protector (HPD) from 90 Hz to 11.3 kHz, and to provide active noise control (ANC) from 50 to 800 Hz to complement the passive attenuation of the HPD. The task involves optimizing the speech SNR for each communication channel subband, subject to limiting the maximum sound level at the ear, maintaining a speech SNR preferred by users, and reducing large inter-band gain differences to improve speech quality. The performance of a proof-of-concept device has been evaluated in a pseudo-diffuse sound field when worn by human subjects under conditions of environmental noise and speech that do not pose a risk to hearing, and by simulation for other conditions. For the environmental noises employed in this study, subband speech SNR control combined with subband ANC produced greater improvement in word scores than subband ANC alone, and improved the consistency of word scores across subjects. The simulation employed a subject-specific linear model, and predicted that word scores are maintained in excess of 90% for sound levels outside the HPD of up to ∼115 dBA.
Talker identification across source mechanisms: Experiments with laryngeal and electrolarynx speech
Perrachione, Tyler K.; Stepp, Cara E.; Hillman, Robert E.; Wong, Patrick C.M.
2015-01-01
Purpose To determine listeners' ability to learn talker identity from speech produced with an electrolarynx, explore source and filter differentiation in talker identification, and describe acoustic-phonetic changes associated with electrolarynx use. Method Healthy adult control listeners learned to identify talkers from speech recordings produced using talkers' normal laryngeal vocal source or an electrolarynx. Listeners' abilities to identify talkers from the trained vocal source (Experiment 1) and generalize this knowledge to the untrained source (Experiment 2) were assessed. Acoustic-phonetic measurements of spectral differences between source mechanisms were performed. Additional listeners attempted to match recordings from different source mechanisms to a single talker (Experiment 3). Results Listeners successfully learned talker identity from electrolarynx speech, but less accurately than from laryngeal speech. Listeners were unable to generalize talker identity to the untrained source mechanism. Electrolarynx use resulted in vowels with higher F1 frequencies compared to laryngeal speech. Listeners matched recordings from different sources to a single talker better than chance. Conclusions Electrolarynx speech, though lacking individual differences in voice quality, nevertheless conveys sufficient indexical information related to the vocal filter and articulation for listeners to identify individual talkers. Psychologically, perception of talker identity arises from a “gestalt” of the vocal source and filter. PMID:24801962
NASA Astrophysics Data System (ADS)
Pishravian, Arash; Aghabozorgi Sahaf, Masoud Reza
2012-12-01
In this paper, speech-music separation using Blind Source Separation is discussed. The separating algorithm is based on mutual information minimization, where the natural gradient algorithm is used for the minimization. In order to do that, score function estimation from samples of the observation signals (a combination of speech and music) is needed. The accuracy and speed of this estimation affect the quality of the separated signals and the processing time of the algorithm. Score function estimation in the presented algorithm is based on a Gaussian-mixture-based kernel density estimation method. Experimental results of the presented algorithm on speech-music separation, compared to a separating algorithm based on the Minimum Mean Square Error estimator, indicate that it achieves better performance and shorter processing time.
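For concreteness, the natural-gradient update can be sketched as follows. The fragment below uses a fixed tanh score function, a common textbook choice, whereas the paper estimates the score function with a Gaussian-mixture kernel density estimator; that estimator is not reproduced here.

```python
import numpy as np

def natural_gradient_separation(X, iters=500, lr=1e-3):
    """Natural-gradient ICA for a 2-channel speech/music mixture.

    X is (n_sources, n_samples). The update
        W <- W + lr * (I - phi(Y) Y^T / N) W
    performs mutual-information minimisation; phi is the score
    function, fixed here to tanh instead of being estimated.
    """
    n, N = X.shape
    W = np.eye(n)
    for _ in range(iters):
        Y = W @ X
        phi = np.tanh(Y)                           # surrogate score function
        W += lr * (np.eye(n) - phi @ Y.T / N) @ W  # natural-gradient step
    return W @ X, W
```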
[Research on Barrier-free Home Environment System Based on Speech Recognition].
Zhu, Husheng; Yu, Hongliu; Shi, Ping; Fang, Youfang; Jian, Zhuo
2015-10-01
The number of people with physical disabilities is increasing year by year, and the trend of population aging is more and more serious. In order to improve quality of life, a control system of an accessible home environment for patients with serious disabilities was developed to control home electrical devices with the patient's voice. The control system includes a central control platform, a speech recognition module, a terminal operation module, etc. The system combines speech recognition control technology and wireless information transmission technology with embedded mobile computing technology, and interconnects the lamps, electronic locks, alarms, TV and other electrical devices in the home environment into a whole system through wireless network nodes. The experimental results showed that the speech recognition success rate was more than 84% in the home environment.
Speech Rhythms and Multiplexed Oscillatory Sensory Coding in the Human Brain
Gross, Joachim; Hoogenboom, Nienke; Thut, Gregor; Schyns, Philippe; Panzeri, Stefano; Belin, Pascal; Garrod, Simon
2013-01-01
Cortical oscillations are likely candidates for segmentation and coding of continuous speech. Here, we monitored continuous speech processing with magnetoencephalography (MEG) to unravel the principles of speech segmentation and coding. We demonstrate that speech entrains the phase of low-frequency (delta, theta) and the amplitude of high-frequency (gamma) oscillations in the auditory cortex. Phase entrainment is stronger in the right and amplitude entrainment is stronger in the left auditory cortex. Furthermore, edges in the speech envelope phase reset auditory cortex oscillations thereby enhancing their entrainment to speech. This mechanism adapts to the changing physical features of the speech envelope and enables efficient, stimulus-specific speech sampling. Finally, we show that within the auditory cortex, coupling between delta, theta, and gamma oscillations increases following speech edges. Importantly, all couplings (i.e., brain-speech and also within the cortex) attenuate for backward-presented speech, suggesting top-down control. We conclude that segmentation and coding of speech relies on a nested hierarchy of entrained cortical oscillations. PMID:24391472
NASA Astrophysics Data System (ADS)
Samardzic, Nikolina
The effectiveness of in-vehicle speech communication can be a good indicator of the perception of the overall vehicle quality and customer satisfaction. Currently available speech intelligibility metrics do not account for essential parameters needed for a complete and accurate evaluation of in-vehicle speech intelligibility. These include the directivity and the distance of the talker with respect to the listener, binaural listening, the hearing profile of the listener, vocal effort, and multisensory hearing. In the first part of this research, the effectiveness of in-vehicle application of these metrics is investigated in a series of studies to reveal their shortcomings, including a wide range of scores resulting from each of the metrics for a given measurement configuration and vehicle operating condition. In addition, the nature of a possible correlation between the scores obtained from each metric is unknown, and the metrics have not been compared in the literature against the subjective perception of speech intelligibility using, for example, the same speech material. As a result, in the second part of this research, an alternative method for speech intelligibility evaluation is proposed for use in the automotive industry, utilizing a virtual reality driving environment for ultimately setting targets, including the associated statistical variability, for future in-vehicle speech intelligibility evaluation. The Speech Intelligibility Index (SII) was evaluated at the sentence Speech Reception Threshold (sSRT) for various listening situations and hearing profiles using acoustic perception jury testing and a variety of talker and listener configurations and background noise. In addition, the effect of individual sources and transfer paths of sound in an operating vehicle on the vehicle interior sound, specifically their effect on speech intelligibility, was quantified in the framework of the newly developed speech intelligibility evaluation method. Lastly, as an example of the significance of evaluating speech intelligibility in an applicable listening environment, it was found that jury test participants required, on average, an approximately 3 dB increase in the sound pressure level of the speech material while driving and listening, compared to just listening, for equivalent speech intelligibility performance on the same listening task.
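As a reference point for the metrics discussed, the core of the SII computation is a band-importance-weighted audibility sum. The sketch below is deliberately reduced: the standard's band tables, spread of masking, and level-distortion corrections (ANSI S3.5) are omitted, and the band levels are assumed to be given.

```python
import numpy as np

def sii_sketch(speech_db, noise_db, band_importance):
    """Simplified Speech Intelligibility Index.

    Per-band audibility is the speech-to-noise level difference
    mapped onto [0, 1] over a 30 dB range (from -15 to +15 dB),
    then weighted by the band-importance function.
    """
    speech_db, noise_db = np.asarray(speech_db), np.asarray(noise_db)
    audibility = np.clip((speech_db - noise_db + 15.0) / 30.0, 0.0, 1.0)
    return float(np.dot(band_importance, audibility))
```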
Parveen, Sabiha; Goberman, Alexander M
2017-04-01
Quality-of-life (QoL) consists of health, psychological well-being and communication-related domains. Due to the heterogeneous nature of Parkinson disease (PD), it is important to examine the effects of different domains, such as motor and cognitive performance or motor and speech performance, among the same set of individuals. Existing studies report mixed findings due to the use of different QoL measures and a lack of general consensus regarding QoL components. The present study examined self and proxy ratings for 20 individuals with PD on the Voice Handicap Index (VHI) and PDQ-39 mobility scale to determine effects on speech- and motor-related QoL, respectively. There was a good level of agreement between self and proxy ratings for the PDQ-39 mobility ratings alone. In addition, no overall group differences were found for self and proxy ratings of VHI and PDQ-39 mobility, indicating similar perceptions by individuals with PD and their communication partners of the speech- and motor-related changes associated with PD. Further, no significant correlations between speech- and motor-related QoL were found, suggesting these domains are independent of each other. The present study indicates the need to consider both self and proxy reports to understand the impact of PD on a person's overall functioning.
Tsanas, Athanasios; Zañartu, Matías; Little, Max A.; Fox, Cynthia; Ramig, Lorraine O.; Clifford, Gari D.
2014-01-01
There has been consistent interest among speech signal processing researchers in the accurate estimation of the fundamental frequency (F0) of speech signals. This study examines ten F0 estimation algorithms (some well-established and some proposed more recently) to determine which of these algorithms is, on average, better able to estimate F0 in the sustained vowel /a/. Moreover, a robust method for adaptively weighting the estimates of individual F0 estimation algorithms based on quality and performance measures is proposed, using an adaptive Kalman filter (KF) framework. The accuracy of the algorithms is validated using (a) a database of 117 synthetic realistic phonations obtained using a sophisticated physiological model of speech production and (b) a database of 65 recordings of human phonations where the glottal cycles are calculated from electroglottograph signals. On average, the sawtooth waveform inspired pitch estimator and the nearly defect-free algorithms provided the best individual F0 estimates, and the proposed KF approach resulted in a ∼16% improvement in accuracy over the best single F0 estimation algorithm. These findings may be useful in speech signal processing applications where sustained vowels are used to assess vocal quality, when very accurate F0 estimation is required. PMID:24815269
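The adaptive-weighting idea can be illustrated with a scalar Kalman filter that fuses the per-frame estimates of several trackers. In the sketch below the measurement variances are fixed per algorithm, whereas the paper adapts them from quality and performance measures; the process-noise value is an assumption.

```python
import numpy as np

def fuse_f0_estimates(estimates, meas_vars, q=4.0):
    """Fuse F0 tracks with a scalar Kalman filter (random-walk model).

    estimates: (n_frames, n_algorithms) array of F0 candidates in Hz.
    meas_vars: per-algorithm measurement noise variances (fixed here).
    Each frame applies one predict step and sequential updates, one
    per algorithm, so lower-variance trackers pull the estimate harder.
    """
    x, p = float(np.mean(estimates[0])), 100.0    # initial state and variance
    fused = []
    for frame in estimates:
        p += q                                    # predict: random walk in F0
        for z, r in zip(frame, meas_vars):        # sequential measurement updates
            k = p / (p + r)
            x += k * (z - x)
            p *= 1.0 - k
        fused.append(x)
    return np.array(fused)
```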
ERIC Educational Resources Information Center
Gould, Ronald
In a wide-ranging speech on the question of educational quality in Great Britain, the author observes that the achievement of higher quality "demands the will to act in a hundred ways, even to the point of sacrifice, on the part of teachers, managers, governors, local authorities, the government, parents, newspapers, radio and T.V., and the…
Brainstem Encoding of Aided Speech in Hearing Aid Users with Cochlear Dead Region(s).
Hassaan, Mohammad Ramadan; Ibraheem, Ola Abdallah; Galhom, Dalia Helal
2016-07-01
Neural encoding of speech begins in the cochlea, where the signal is analyzed as a whole and broken down into its sinusoidal components; this representation has to be conserved up to the higher auditory centers. Some of these components target dead regions of the cochlea, causing little or no excitation. Measuring the aided speech-evoked auditory brainstem response elicited by speech stimuli with different spectral maxima can give insight into the brainstem encoding of aided speech with spectral maxima at these dead regions. This research aims to study the impact of cochlear dead regions on speech processing at the brainstem level after a long period of hearing aid use. The study comprised 30 ears without dead regions and 46 ears with dead regions at low, mid, or high frequencies. For all ears, we measured the aided speech-evoked auditory brainstem response using speech stimuli of low, mid, and high spectral maxima. The aided speech-evoked auditory brainstem response was producible in all subjects. Responses evoked by stimuli with spectral maxima at dead regions had longer latencies and smaller amplitudes when compared with the control group or the responses to other stimuli. The presence of cochlear dead regions affects brainstem encoding of speech with spectral maxima corresponding to these regions. Brainstem neuroplasticity and the extrinsic redundancy of speech can minimize the impact of dead regions in chronic hearing aid users.
Sound frequency affects speech emotion perception: results from congenital amusia
Lolli, Sydney L.; Lewenstein, Ari D.; Basurto, Julian; Winnik, Sean; Loui, Psyche
2015-01-01
Congenital amusics, or “tone-deaf” individuals, show difficulty in perceiving and producing small pitch differences. While amusia has marked effects on music perception, its impact on speech perception is less clear. Here we test the hypothesis that individual differences in pitch perception affect judgment of emotion in speech, by applying low-pass filters to spoken statements of emotional speech. A norming study was first conducted on Mechanical Turk to ensure that the intended emotions from the Macquarie Battery for Evaluation of Prosody were reliably identifiable by US English speakers. The most reliably identified emotional speech samples were used in Experiment 1, in which subjects performed a psychophysical pitch discrimination task, and an emotion identification task under low-pass and unfiltered speech conditions. Results showed a significant correlation between pitch-discrimination threshold and emotion identification accuracy for low-pass filtered speech, with amusics (defined here as those with a pitch discrimination threshold >16 Hz) performing worse than controls. This relationship with pitch discrimination was not seen in unfiltered speech conditions. Given the dissociation between low-pass filtered and unfiltered speech conditions, we inferred that amusics may be compensating for poorer pitch perception by using speech cues that are filtered out in this manipulation. To assess this potential compensation, Experiment 2 was conducted using high-pass filtered speech samples intended to isolate non-pitch cues. No significant correlation was found between pitch discrimination and emotion identification accuracy for high-pass filtered speech. Results from these experiments suggest an influence of low frequency information in identifying emotional content of speech. PMID:26441718
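The low-pass manipulation at the heart of these experiments is straightforward to reproduce. A minimal sketch follows, using a zero-phase Butterworth filter; the 500 Hz cutoff is an illustrative assumption, not necessarily the study's exact setting.

```python
from scipy.signal import butter, sosfiltfilt

def lowpass_speech(x, fs, cutoff_hz=500.0, order=4):
    """Low-pass filter speech to strip most segmental detail while
    preserving low-frequency pitch cues, as in filtered-speech
    emotion-identification tasks."""
    sos = butter(order, cutoff_hz, btype="low", fs=fs, output="sos")
    return sosfiltfilt(sos, x)                    # zero-phase filtering
```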
Theodoros, Deborah; Aldridge, Danielle; Hill, Anne J; Russell, Trevor
2018-06-19
Communication and swallowing disorders are highly prevalent in people with Parkinson's disease (PD). Maintenance of functional communication and swallowing over time is challenging for the person with PD and their families and may lead to social isolation and reduced quality of life if not addressed. Speech and language therapists (SLTs) face the conundrum of providing sustainable and flexible services to meet the changing needs of people with PD. Motor, cognitive and psychological issues associated with PD, medication regimens and dependency on others often impede attendance at a centre-based service. The access difficulties experienced by people with PD require a disruptive service approach to meet their needs. Technology-enabled management, using information and telecommunications technologies to provide services at a distance, has the potential to improve access and enhance the quality of SLT services to people with PD. To report the status and scope of the evidence for the use of technology in the management of the communication and swallowing disorders associated with PD, studies were retrieved from four major databases (PubMed, CINAHL, EMBASE and Medline via Web of Science). Data relating to the types of studies, level of evidence, context, nature of the management undertaken, participant perspectives and the types of technologies involved were extracted for the review. A total of 17 studies were included in the review, 15 of which related to the management of communication and swallowing disorders in PD, with two studies devoted to participant perspectives. The majority of the studies reported on the treatment of the speech disorder in PD using Lee Silverman Voice Treatment (LSVT LOUD®). Synchronous and asynchronous technologies were used in the studies, with a predominance of the former. There was a paucity of research on the management of cognitive-communication and swallowing disorders. Research evidence supporting technology-enabled management of the communication and swallowing disorders in PD is limited and predominantly low in quality. The treatment of the speech disorder online is the most developed aspect of the technology-enabled management pathway. Future research needs to address technology-enabled management of cognitive-communication and swallowing disorders and the use of a more diverse range of technologies and management approaches to optimize SLT service delivery to people with PD. © 2018 Royal College of Speech and Language Therapists.
Neural encoding of the speech envelope by children with developmental dyslexia.
Power, Alan J; Colling, Lincoln J; Mead, Natasha; Barnes, Lisa; Goswami, Usha
2016-09-01
Developmental dyslexia is consistently associated with difficulties in processing phonology (linguistic sound structure) across languages. One view is that dyslexia is characterised by a cognitive impairment in the "phonological representation" of word forms, which arises long before the child presents with a reading problem. Here we investigate a possible neural basis for developmental phonological impairments. We assess the neural quality of speech encoding in children with dyslexia by measuring the accuracy of low-frequency speech envelope encoding using EEG. We tested children with dyslexia and chronological age-matched (CA) and reading-level matched (RL) younger children. Participants listened to semantically-unpredictable sentences in a word report task. The sentences were noise-vocoded to increase reliance on envelope cues. Envelope reconstruction for envelopes between 0 and 10 Hz showed that the children with dyslexia had significantly poorer speech encoding in the 0-2 Hz band compared to both CA and RL controls. These data suggest that impaired neural encoding of low frequency speech envelopes, related to speech prosody, may underpin the phonological deficit that causes dyslexia across languages. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.
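The reconstruction target in such studies is the low-frequency speech envelope, which can be extracted as sketched below; the linear EEG decoder that maps brain responses back onto this envelope is not shown. The filter order and the Hilbert-based envelope are assumptions in this illustration.

```python
import numpy as np
from scipy.signal import hilbert, butter, sosfiltfilt

def envelope_0_2hz(x, fs):
    """Extract the 0-2 Hz speech envelope band implicated in the
    dyslexia result: Hilbert envelope followed by a low-pass filter."""
    env = np.abs(hilbert(x))                      # broadband amplitude envelope
    sos = butter(4, 2.0, btype="low", fs=fs, output="sos")
    return sosfiltfilt(sos, env)

def reconstruction_accuracy(true_env, decoded_env):
    """Pearson correlation between actual and decoded envelopes."""
    return float(np.corrcoef(true_env, decoded_env)[0, 1])
```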
Kim, Min-Beom; Chung, Won-Ho; Choi, Jeesun; Hong, Sung Hwa; Cho, Yang-Sun; Park, Gyuseok; Lee, Sangmin
2014-06-01
The objective was to evaluate speech perception improvement through Bluetooth-implemented hearing aids in hearing-impaired adults. Thirty subjects with bilateral symmetric moderate sensorineural hearing loss participated in this study. A Bluetooth-implemented hearing aid was fitted unilaterally in all study subjects. Objective speech recognition scores and subjective satisfaction were measured with a Bluetooth-implemented hearing aid replacing the acoustic connection from either a cellular phone or a loudspeaker system. In each system, participants were assigned to 4 conditions: wireless speech signal transmission into the hearing aid (wireless mode) in a quiet or noisy environment, and conventional speech signal transmission using the external microphone of the hearing aid (conventional mode) in a quiet or noisy environment. Participants also completed questionnaires to investigate subjective satisfaction. In both the cellular phone and loudspeaker situations, participants showed improvements in sentence and word recognition scores with the wireless mode compared to the conventional mode, in both quiet and noise conditions (P < .001). Participants also reported subjective improvements, including better sound quality, less noise interference, and better accuracy and naturalness, when using the wireless mode (P < .001). Bluetooth-implemented hearing aids helped to improve subjective and objective speech recognition performance in quiet and noisy environments during the use of electronic audio devices.
Voice recognition through phonetic features with Punjabi utterances
NASA Astrophysics Data System (ADS)
Kaur, Jasdeep; Juglan, K. C.; Sharma, Vishal; Upadhyay, R. K.
2017-07-01
This paper deals with perception and disorders of speech in view of the Punjabi language. Given the importance of voice identification, various parameters of speaker identification have been studied. The speech material was recorded with a tape recorder in the speakers' normal and disguised modes of utterance. From the recorded material, utterances free from noise were selected for auditory and acoustic spectrographic analysis. The comparison of normal and disguised speech of seven subjects is reported. The fundamental frequency (F0) at similar places, plosive duration at certain phonemes, amplitude ratio (A1:A2), etc., were compared in normal and disguised speech. It was found that the formant frequency of normal and disguised speech remains almost similar only if it is compared at the position of the same vowel quality and quantity. If the vowel is more closed or more open in the disguised utterance, the formant frequency changes in comparison to the normal utterance. The amplitude ratio (A1:A2) is found to be speaker dependent and remains unchanged in the disguised utterance, although the value may shift if cross-sectioning is not done at the same location.
Expressed parental concern regarding childhood stuttering and the Test of Childhood Stuttering.
Tumanova, Victoria; Choi, Dahye; Conture, Edward G; Walden, Tedra A
The purpose of the present study was to determine whether the Test of Childhood Stuttering observational rating scales (TOCS; Gillam et al., 2009) (1) differed between parents who did versus did not express concern (independent from the TOCS) about their child's speech fluency; (2) correlated with children's frequency of stuttering measured during a child-examiner conversation; and (3) correlated with the length and complexity of children's utterances, as indexed by mean length of utterance (MLU). Participants were 183 young children ages 3;0-5;11. Ninety-one had parents who reported concern about their child's stuttering (65 boys, 26 girls) and 92 had parents who reported no such concern (50 boys, 42 girls). Participants' conversational speech during a child-examiner conversation was analyzed for (a) frequency of occurrence of stuttered and non-stuttered disfluencies and (b) MLU. Besides expressing concern or lack thereof about their child's speech fluency, parents completed the TOCS observational rating scales documenting how often they observe different disfluency types in their children's speech, as well as disfluency-related consequences. There were three main findings. First, parents who expressed concern (independently from the TOCS) about their child's stuttering reported significantly higher scores on the TOCS Speech Fluency and Disfluency-Related Consequences rating scales. Second, children whose parents rated them higher on the TOCS Speech Fluency rating scale produced more stuttered disfluencies during a child-examiner conversation. Third, children with higher scores on the TOCS Disfluency-Related Consequences rating scale had shorter MLU during the child-examiner conversation, across age and level of language ability. Findings support the use of the TOCS observational rating scales as one documentable, objective means to determine parental perception of and concern about their child's stuttering. Findings also support the notion that parents are reasonably accurate, if not reliable, judges of the quantity and quality (i.e., stuttered vs. non-stuttered) of their child's speech disfluencies. Lastly, findings that some children may decrease their verbal output in an attempt to minimize instances of stuttering - as indexed by relatively low MLU and high TOCS Disfluency-Related Consequences scores - provide strong support for sampling young children's speech and language across various situations to obtain the most representative index possible of the child's MLU and associated instances of stuttering. Copyright © 2018 Elsevier Inc. All rights reserved.
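MLU itself is straightforward to compute from transcripts. A minimal sketch follows, counting words as a rough proxy for the morpheme counts typically used in clinical practice (an assumption; clinical MLU is morpheme-based).

```python
import re

def mlu_in_words(utterances):
    """Mean length of utterance in words (a rough proxy for morphemes)."""
    lengths = [len(re.findall(r"[A-Za-z']+", u)) for u in utterances]
    lengths = [n for n in lengths if n > 0]   # ignore empty utterances
    return sum(lengths) / len(lengths) if lengths else 0.0

sample = ["the doggy runs", "I want that", "no", "mommy go work now"]
print(f"MLU = {mlu_in_words(sample):.2f}")    # (3 + 3 + 1 + 4) / 4 = 2.75
```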
Dziegielewski, Peter T; Teknos, Theodoros N; Durmus, Kasim; Old, Matthew; Agrawal, Amit; Kakarala, Kiran; Marcinow, Anna; Ozer, Enver
2013-11-01
Because treatment for oropharyngeal squamous cell carcinoma (OPSCC), especially in patients of older age, is associated with decreased patient quality of life (QOL) after surgery, demonstration of a less QOL-impairing treatment technique would improve patient satisfaction substantially. To determine swallowing, speech, and QOL outcomes following transoral robotic surgery (TORS) for OPSCC. This prospective cohort study of 81 patients with previously untreated OPSCC was conducted at a tertiary care academic comprehensive cancer center. Primary surgical resection via TORS and neck dissection as indicated. Patients were asked to complete the Head and Neck Cancer Inventory (HNCI) preoperatively and at 3 weeks as well as 3, 6, and 12 months postoperatively. Swallowing ability was assessed by independence from a gastrostomy tube (G-tube). Clinicopathologic and follow-up data were also collected. Mean follow-up time was 22.7 months. The HNCI response rates at 3 weeks and 3, 6, and 12 months were 79%, 60%, 63%, and 67% respectively. There were overall declines in speech, eating, aesthetic, social, and overall QOL domains in the early postoperative periods. However, at 1 year post TORS, scores for aesthetic, social, and overall QOL remained high. Radiation therapy was negatively correlated with multiple QOL domains (P < .05 for all comparisons), while age older than 55 years correlated with lower speech and aesthetic scores (P < .05 for both). Human papillomavirus status did not correlate with any QOL domain. G-tube rates at 6 and 12 months were 24% and 9%, respectively. Greater extent of TORS (>1 oropharyngeal site resected) and age older than 55 years predicted the need for a G-tube at any point after TORS (P < .05 for both). Patients with OPSCC treated with TORS maintain a high QOL at 1 year after surgery. Adjuvant treatment and older age tend to decrease QOL. Patients meeting these criteria should be counseled appropriately.
JND measurements of the speech formants parameters and its implication in the LPC pole quantization
NASA Astrophysics Data System (ADS)
Orgad, Yaakov
1988-08-01
The inherent sensitivity of auditory perception is explicitly used with the objective of designing an efficient speech encoder. Speech can be modelled by a filter representing the vocal tract shape, driven by an excitation signal representing glottal air flow. This work concentrates on the filter encoding problem, assuming that excitation signal encoding is optimal. Linear predictive coding (LPC) techniques were used to model a short speech segment by an all-pole filter, each pole being directly related to the speech formants. Measurements were made of the auditory just noticeable difference (JND) corresponding to the natural speech formants, with the LPC filter poles as the best candidates to represent the speech spectral envelope. The JND is the maximum precision required in speech quantization; it was defined as the shift in one pole parameter of a single frame of a speech segment necessary to induce subjective perception of the distortion with 0.75 probability. The average JND in LPC filter poles in natural speech was found to increase with increasing pole bandwidth and, to a lesser extent, frequency. The JND measurements showed a large spread of the residuals around the average values, indicating that inter-formant coupling and, perhaps, other not yet fully understood factors were not taken into account at this stage of the research. A future treatment should consider these factors. The average JNDs obtained in this work were used to design pole quantization tables for speech coding and provided a better bit rate than the standard reflection-coefficient quantizer; a 30-bits-per-frame pole quantizer yielded speech quality similar to that obtained with a standard 41-bits-per-frame reflection coefficient quantizer. Owing to the complexity of the numerical root extraction system, the practical implementation of the pole quantization approach remains to be proved.
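The pole-based view of the LPC filter can be illustrated with a short sketch: estimate autocorrelation-method LPC coefficients, take the polynomial roots, and convert each complex pole to a formant-like frequency and bandwidth. This is the generic textbook procedure, not the paper's encoder; the frame, model order and sampling rate below are illustrative.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc(frame, order):
    """Autocorrelation-method LPC: returns polynomial coeffs [1, -a1, ..., -ap]."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = solve_toeplitz((r[:order], r[:order]), r[1:order + 1])
    return np.concatenate(([1.0], -a))

def poles_to_formants(a, fs):
    """Complex LPC poles -> formant-like frequencies and bandwidths in Hz."""
    roots = [z for z in np.roots(a) if z.imag > 0]   # one of each conjugate pair
    freqs = np.array([np.angle(z) * fs / (2 * np.pi) for z in roots])
    bws = np.array([-np.log(np.abs(z)) * fs / np.pi for z in roots])
    idx = np.argsort(freqs)
    return freqs[idx], bws[idx]

fs = 8000
rng = np.random.default_rng(0)
t = np.arange(0, 0.03, 1 / fs)
# toy two-resonance frame with a little noise for numerical conditioning
frame = np.sin(2 * np.pi * 700 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)
frame = (frame + 0.01 * rng.standard_normal(len(t))) * np.hamming(len(t))
freqs, bws = poles_to_formants(lpc(frame, 8), fs)
print(np.round(freqs), np.round(bws))
```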
An acoustic feature-based similarity scoring system for speech rehabilitation assistance.
Syauqy, Dahnial; Wu, Chao-Min; Setyawati, Onny
2016-08-01
The purpose of this study was to develop a tool to assist speech therapy and rehabilitation, focused on automatic scoring based on the comparison of a patient's speech with normal speech on several aspects, including pitch, vowels, voiced-unvoiced segments, strident fricatives and sound intensity. Pitch estimation employed a cepstrum-based algorithm for its robustness; vowel classification used a multilayer perceptron (MLP) to classify vowels from pitch and formants; and strident fricative detection was based on the major peak spectral intensity, its location and the existence of pitch in the segment. To evaluate the performance of the system, this study analyzed eight patients' speech recordings (four males, four females; ages 4-58 years), which had been recorded in a previous study in cooperation with Taipei Veterans General Hospital and Taoyuan General Hospital. Experiments on the pitch algorithm showed that the cepstrum method had a 5.3% gross pitch error over a total of 2086 frames. For the vowel classification algorithm, the MLP method provided 93% accuracy for men, 87% for women and 84% for children. In total, 156 of the tool's grading results (81%) were consistent with 192 audio and visual observations made by four experienced respondents. Implications for Rehabilitation: Difficulties in communication may limit the ability of a person to transfer and exchange information. The fact that speech is one of the primary means of communication has encouraged the need for speech diagnosis and rehabilitation. Advances in computer-assisted speech therapy (CAST) improve the quality and time efficiency of the diagnosis and treatment of speech disorders. The present study developed a tool to assist speech therapy and rehabilitation that provides a simple interface, letting assessment be done even by the patient alone, without particular knowledge of speech processing, while also providing deeper analysis of the speech that can be useful for the speech therapist.
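Cepstrum-based pitch estimation of the kind the scoring tool relies on can be sketched in a few lines: take the real cepstrum of a windowed frame and pick the peak quefrency within a plausible voicing range. The window length, search range and test signal below are assumptions for illustration.

```python
import numpy as np

def cepstral_pitch(frame, fs, fmin=50.0, fmax=400.0):
    """Estimate F0 from the peak of the real cepstrum in the voiced range."""
    frame = frame * np.hamming(len(frame))
    spectrum = np.fft.rfft(frame, n=2 * len(frame))
    cepstrum = np.fft.irfft(np.log(np.abs(spectrum) + 1e-12))
    qmin, qmax = int(fs / fmax), int(fs / fmin)      # quefrency search range
    peak = qmin + np.argmax(cepstrum[qmin:qmax])
    return fs / peak

fs = 16000
t = np.arange(0, 0.04, 1 / fs)
voiced = np.sign(np.sin(2 * np.pi * 120 * t))        # crude 120 Hz pulse-train stand-in
print(f"F0 ~ {cepstral_pitch(voiced, fs):.1f} Hz")   # expect roughly 120 Hz
```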
Song and speech: examining the link between singing talent and speech imitation ability.
Christiner, Markus; Reiterer, Susanne M
2013-01-01
In previous research on speech imitation, musicality and the ability to sing were isolated as the strongest indicators of good pronunciation skills in foreign languages. We therefore wanted to take a closer look at the nature of the ability to sing, which shares a common ground with the ability to imitate speech. This study focuses on whether good singing performance predicts good speech imitation. Forty-one singers of different levels of proficiency were selected for the study, and their ability to sing, their ability to imitate speech, their musical talent and their working memory were tested. Results indicated that singing performance is a better indicator of the ability to imitate speech than the playing of a musical instrument. A multiple regression revealed that 64% of the speech imitation score variance could be explained by working memory together with educational background and singing performance. A second multiple regression showed that 66% of the speech imitation variance for completely unintelligible and unfamiliar language stimuli (Hindi) could be explained by working memory together with a singer's sense of rhythm and quality of voice. This supports the idea that both vocal behaviors have a common grounding in terms of vocal and motor flexibility, ontogenetic and phylogenetic development, neural orchestration and auditory memory, with singing fitting better into the category of "speech" on the productive level and "music" on the acoustic level. As a result, good singers benefit from vocal and motor flexibility, productively and cognitively, in three ways. (1) Motor flexibility and the ability to sing improve language and musical function. (2) Good singers retain a certain plasticity and are open to new and unusual sound combinations during adulthood, both perceptually and productively. (3) The ability to sing improves the span of auditory working memory.
Signal Processing Methods for Removing the Effects of Whole Body Vibration upon Speech
NASA Technical Reports Server (NTRS)
Bitner, Rachel M.; Begault, Durand R.
2014-01-01
Humans may be exposed to whole-body vibration in environments where clear speech communications are crucial, particularly during the launch phases of space flight and in high-performance aircraft. Prior research has shown that high levels of vibration cause a decrease in speech intelligibility. However, the effects of whole-body vibration upon speech are not well understood, and no attempt has been made to restore speech distorted by whole-body vibration. In this paper, a model for speech under whole-body vibration is proposed and a method to remove its effect is described. The method described reduces the perceptual effects of vibration, yields higher automatic speech recognition (ASR) accuracy scores, and may significantly improve intelligibility. Possible applications include incorporation within communication systems to improve radio communications in environments such as spaceflight, aviation, or off-road vehicle operations.
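The abstract does not specify the restoration model, so the sketch below is only one plausible reading, not the authors' method: adaptive cancellation that predicts the vibration component of the microphone signal from a separate vibration reference (e.g., an accelerometer) and subtracts it. All signals, parameters and the coupling filter are synthetic assumptions.

```python
import numpy as np

def nlms_cancel(mic, ref, order=32, mu=0.5, eps=1e-8):
    """Cancel the component of `mic` predictable from the reference `ref` (NLMS)."""
    w = np.zeros(order)
    out = np.zeros_like(mic)
    for n in range(order, len(mic)):
        x = ref[n - order:n][::-1]
        e = mic[n] - w @ x                  # residual = speech estimate
        w += mu * e * x / (x @ x + eps)     # normalized LMS weight update
        out[n] = e
    return out

# synthetic demo: vibration leaks into the speech microphone through a short filter
rng = np.random.default_rng(1)
speech = 0.1 * rng.standard_normal(8000)
vib = np.sin(2 * np.pi * 30 * np.arange(8000) / 1000)     # 30 Hz vibration reference
mic = speech + np.convolve(vib, [0.8, 0.3], mode="same")  # coupled vibration + speech
clean = nlms_cancel(mic, vib)
```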
Suppression of competing speech through entrainment of cortical oscillations
D'Zmura, Michael; Srinivasan, Ramesh
2013-01-01
People are highly skilled at attending to one speaker in the presence of competitors, but the neural mechanisms supporting this remain unclear. Recent studies have argued that the auditory system enhances the gain of a speech stream relative to competitors by entraining (or “phase-locking”) to the rhythmic structure in its acoustic envelope, thus ensuring that syllables arrive during periods of high neuronal excitability. We hypothesized that such a mechanism could also suppress a competing speech stream by ensuring that syllables arrive during periods of low neuronal excitability. To test this, we analyzed high-density EEG recorded from human adults while they attended to one of two competing, naturalistic speech streams. By calculating the cross-correlation between the EEG channels and the speech envelopes, we found evidence of entrainment to the attended speech's acoustic envelope as well as weaker yet significant entrainment to the unattended speech's envelope. An independent component analysis (ICA) decomposition of the data revealed sources in the posterior temporal cortices that displayed robust correlations to both the attended and unattended envelopes. Critically, in these components the signs of the correlations when attended were opposite those when unattended, consistent with the hypothesized entrainment-based suppressive mechanism. PMID:23515789
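The core measurement here, cross-correlating an EEG channel with a speech envelope over a range of lags, can be sketched as follows. The sampling rate, lag window and the synthetic "EEG" are assumptions for illustration only.

```python
import numpy as np
from scipy.signal import correlate, correlation_lags

def envelope_xcorr(eeg_ch, envelope, fs, max_lag_s=0.5):
    """Normalized cross-correlation between one EEG channel and a speech envelope."""
    x = (eeg_ch - eeg_ch.mean()) / eeg_ch.std()
    y = (envelope - envelope.mean()) / envelope.std()
    c = correlate(x, y, mode="full") / len(x)
    lags = correlation_lags(len(x), len(y), mode="full") / fs
    keep = np.abs(lags) <= max_lag_s
    return lags[keep], c[keep]

fs = 128
rng = np.random.default_rng(2)
env = rng.standard_normal(fs * 30)
eeg = np.roll(env, int(0.1 * fs)) + rng.standard_normal(fs * 30)  # tracks env at ~100 ms delay
lags, c = envelope_xcorr(eeg, env, fs)
print(f"peak at {lags[np.argmax(np.abs(c))] * 1000:.0f} ms")      # expect about +100 ms
```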
TARSITANO, A.; VIETTI, M.V.; CIPRIANI, R.; MARCHETTI, C.
2013-01-01
SUMMARY The aim of the present study is to assess functional outcomes after hemiglossectomy and microvascular reconstruction. Twenty-six patients underwent primary tongue microvascular reconstruction after hemiglossectomy. Twelve patients were reconstructed using a free radial forearm flap and 14 with an anterolateral thigh flap. Speech intelligibility, swallowing capacity and quality of life scores were assessed. Factors such as tumour extension, surgical resection and adjuvant radiotherapy appeared to be fundamental to predict post-treatment functional outcomes. The data obtained in the present study indicate that swallowing capacity after hemiglossectomy is better when an anterolateral thigh flap is used. No significant differences were seen for speech intelligibility or quality of life between free radial forearm flap and anterolateral thigh flap. PMID:24376292
Kingyon, J; Behroozmand, R; Kelley, R; Oya, H; Kawasaki, H; Narayanan, N S; Greenlee, J D W
2015-10-01
The neural basis of human speech is unclear. Intracranial electrophysiological recordings have revealed that high-gamma band oscillations (70-150 Hz) are observed in the frontal lobe during speech production and in the temporal lobe during speech perception. Here, we tested the hypothesis that the frontal and temporal brain regions had high-gamma coherence during speech. We recorded electrocorticography (ECoG) from the frontal and temporal cortices of five humans who underwent surgery for medically intractable epilepsy, and studied coherence between the frontal and temporal cortex during vocalization and playback of vocalization. We report two novel results. First, we observed high-gamma band as well as theta (4-8 Hz) coherence between frontal and temporal lobes. Second, both high-gamma and theta coherence were stronger when subjects were actively vocalizing as compared to playback of the same vocalizations. These findings provide evidence that coupling between sensory-motor networks measured by high-gamma coherence plays a key role in feedback-based monitoring and control of vocal output for human vocalization. Copyright © 2015 IBRO. Published by Elsevier Ltd. All rights reserved.
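Magnitude-squared coherence between two recording sites, averaged within theta and high-gamma bands, can be estimated with standard spectral tools. The sketch below uses synthetic signals with a shared component; it is not the study's ECoG pipeline, and the sampling rate and segment length are assumptions.

```python
import numpy as np
from scipy.signal import coherence

fs = 1000
rng = np.random.default_rng(2)
shared = rng.standard_normal(fs * 10)                 # common driving signal
frontal = shared + 0.5 * rng.standard_normal(fs * 10)
temporal = shared + 0.5 * rng.standard_normal(fs * 10)

f, cxy = coherence(frontal, temporal, fs=fs, nperseg=512)
for lo, hi, name in [(4, 8, "theta"), (70, 150, "high-gamma")]:
    band = (f >= lo) & (f <= hi)
    print(f"{name}: mean coherence = {cxy[band].mean():.2f}")
```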
Oropharyngeal dysphagia: surveying practice patterns of the speech-language pathologist.
Martino, Rosemary; Pron, Gaylene; Diamant, Nicholas E
2004-01-01
The present study was designed to obtain a comprehensive view of the dysphagia assessment practice patterns of speech-language pathologists and their opinion on the importance of these practices using survey methods and taking into consideration clinician, patient, and practice-setting variables. A self-administered mail questionnaire was developed following established methodology to maximize response rates. Eight dysphagia experts independently rated the new survey for content validity. Test-retest reliability was assessed with a random sample of 23 participants. The survey was sent to 50 speech-language pathologists randomly selected from the Canadian professional association database of members who practice in dysphagia. Surveys were mailed according to the Dillman Total Design Method and included an incentive offer. High survey (64%) and item response (95%) rates were achieved and clinicians were reliable reporters of their practice behaviors (ICC>0.60). Of all the clinical assessment items, 36% were reported with high (>80%) utilization and 24% with low (<20%) utilization, the former pertaining to tongue motion and vocal quality after food/fluid intake and the latter to testing of oral sensation without food. One-third (33%) of instrumental assessment items were highly utilized and included assessment of bolus movement and laryngeal response to bolus misdirection. Overall, clinician experience and teaching institutions influenced greater utilization. Opinions of importance were similar to utilization behaviors (r = 0.947, p = 0.01). Of all patients referred for dysphagia assessment, full clinical assessments were administered to 71% of patients but instrumental assessments to only 36%. A hierarchical model of practice behavior is proposed to explain this pattern of progressively decreasing item utilization.
Bohm, Lauren A; Nelson, Marc E; Driver, Lynn E; Green, Glenn E
2010-12-01
To determine the importance of prelinguistic babbling by studying patterns of speech and language development after cricotracheal resection in aphonic children. Retrospective review of seven previously aphonic children who underwent cricotracheal resection by our pediatric thoracic airway team. The analyzed variables include age, sex, comorbidity, grade of stenosis, length of resected trachea, and communication methods. Data regarding the children's pre- and postsurgical communication methods, along with their utilization of speech therapy services, were obtained via speech-language pathology evaluations, clinical observations, and a standardized telephone survey supplemented by parental documentation. Postsurgical voice quality was assessed using the Pediatric Voice Outcomes Survey. All seven subjects underwent tracheostomy prior to 2 months of age when corrected for prematurity. The subjects remained aphonic for the entire duration of cannulation. Following cricotracheal resection, they experienced an initial delay in speech acquisition. Vegetative functions were the first laryngeal sounds to emerge. Initially, the children were only able to produce these sounds reflexively, but they subsequently gained voluntary control over these laryngeal functions. All subjects underwent an identifiable stage of canonical babbling that often occurred concomitantly with vocalizations. This was followed by the emergence of true speech. The initial delay in speech acquisition observed following decannulation, along with the presence of a postsurgical canonical stage in all study subjects, supports the hypothesis that babbling is necessary for speech and language development. Furthermore, the presence of babbling is universally evident regardless of the age at which speech develops. Finally, there is no demonstrable correlation between preoperative sign language and rate of speech development. Copyright © 2010 The American Laryngological, Rhinological, and Otological Society, Inc.
Vinney, Lisa A; Grade, John D; Connor, Nadine P
2012-01-01
The manner in which a communication disorder affects health-related quality of life (QOL) in children is not known. Unfortunately, collection of quality of life data via traditional paper measures is labor intensive and has several other limitations, which hinder the investigation of quality of life in children. There is currently insufficient research on the use of electronic devices to collect pediatric patient-reported outcomes in order to address such limitations. Thus, we used a cross-over design to compare responses to a pediatric health quality of life instrument (PedsQL 4.0) delivered using a handheld electronic device with those from a traditional paper form. Respondents were children with (n=9) and without (n=10) a speech or voice disorder. For the paper versus the electronic format, we examined time to completion, number of incomplete or inaccurate question responses, intra-rater reliability, ease of use, and child and parent preference. There were no significant differences between children's scores, time to complete the measure, or ratings related to ease of answering questions. The percentage of children who made answering errors or omissions with paper and pencil was significantly greater than the percentage who made such errors using the device. This preliminary study demonstrated that using an electronic device to collect QOL or patient-reported outcome (PRO) data from children is more efficient than, and just as feasible, reliable, and acceptable as, using paper forms. The development of hardware and software applications for the collection of QOL and/or PRO data in children with speech disorders is likely warranted. The reader will be able to understand: (1) the potential benefits of using electronic data capture via handheld devices for collecting pediatric patient-reported outcomes; (2) that the Pediatric Quality of Life Inventory 4.0 is a measure of perceived general health quality that has distinguished between healthy children and those with chronic health conditions; (3) that past research in communication disorders indicates that voice and speech disorders may impact quality of life in children; (4) that, based on preliminary data, electronic collection of patient-reported outcomes in children with and without speech/voice disorders is more efficient and equally feasible, reliable, and acceptable compared with paper forms. Copyright © 2011 Elsevier Inc. All rights reserved.
Motif Discovery in Speech: Application to Monitoring Alzheimer's Disease.
Garrard, Peter; Nemes, Vanda; Nikolic, Dragana; Barney, Anna
2017-01-01
Perseveration - repetition of words, phrases or questions in speech - is commonly described in Alzheimer's disease (AD). Measuring perseveration is difficult, but may index cognitive performance, aiding diagnosis and disease monitoring. Continuous recording of speech would produce a large quantity of data requiring painstaking manual analysis, and risk violating patients' and others' privacy. A secure record and an automated approach to analysis are required. To record bone-conducted acoustic energy fluctuations from a subject's vocal apparatus using an accelerometer, to describe the recording and analysis stages in detail, and demonstrate that the approach is feasible in AD. Speech-related vibration was captured by an accelerometer, affixed above the temporomandibular joint. Healthy subjects read a script with embedded repetitions. Features were extracted from recorded signals and combined using Principal Component Analysis to obtain a one-dimensional representation of the feature vector. Motif discovery techniques were used to detect repeated segments. The equipment was tested in AD patients to determine device acceptability and recording quality. Comparison with the known location of embedded motifs suggests that, with appropriate parameter tuning, the motif discovery method can detect repetitions. The device was acceptable to patients and produced adequate signal quality in their home environments. We established that continuously recording bone-conducted speech and detecting perseverative patterns were both possible. In future studies we plan to associate the frequency of verbal repetitions with stage, progression and type of dementia. It is possible that the method could contribute to the assessment of disease-modifying treatments. Copyright © Bentham Science Publishers. For any queries, please email epub@benthamscience.org.
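The analysis chain described, multi-feature frames reduced to one dimension by PCA followed by motif discovery, can be sketched naively: z-normalize all windows of length m and search for the closest non-overlapping pair (a brute-force stand-in for efficient matrix-profile methods, not the authors' implementation). The feature matrix and window length below are placeholders.

```python
import numpy as np
from sklearn.decomposition import PCA

def motif_pair(series, m):
    """Find the two most similar non-overlapping windows of length m (naive search)."""
    w = np.lib.stride_tricks.sliding_window_view(series, m)
    w = (w - w.mean(axis=1, keepdims=True)) / (w.std(axis=1, keepdims=True) + 1e-12)
    best = (np.inf, 0, 0)
    for i in range(len(w)):
        for j in range(i + m, len(w)):       # enforce non-overlap
            d = np.linalg.norm(w[i] - w[j])
            if d < best[0]:
                best = (d, i, j)
    return best

rng = np.random.default_rng(3)
features = rng.standard_normal((400, 12))    # placeholder acoustic feature frames
features[50:80] = features[300:330]          # plant a repeated segment ("perseveration")
profile_1d = PCA(n_components=1).fit_transform(features).ravel()
print(motif_pair(profile_1d, 30))            # should recover indices near 50 and 300
```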
Cued Speech Transliteration: Effects of Speaking Rate and Lag Time on Production Accuracy
Tessler, Morgan P.
2016-01-01
Many deaf and hard-of-hearing children rely on interpreters to access classroom communication. Although the exact level of access provided by interpreters in these settings is unknown, it is likely to depend heavily on interpreter accuracy (portion of message correctly produced by the interpreter) and the factors that govern interpreter accuracy. In this study, the accuracy of 12 Cued Speech (CS) transliterators with varying degrees of experience was examined at three different speaking rates (slow, normal, fast). Accuracy was measured with a high-resolution, objective metric in order to facilitate quantitative analyses of the effect of each factor on accuracy. Results showed that speaking rate had a large negative effect on accuracy, caused primarily by an increase in omitted cues, whereas the effect of lag time on accuracy, also negative, was quite small and explained just 3% of the variance. Increased experience level was generally associated with increased accuracy; however, high levels of experience did not guarantee high levels of accuracy. Finally, the overall accuracy of the 12 transliterators, 54% on average across all three factors, was low enough to raise serious concerns about the quality of CS transliteration services that (at least some) children receive in educational settings. PMID:27221370
Xie, Zilong; Reetzke, Rachel; Chandrasekaran, Bharath
2018-05-24
Increasing visual perceptual load can reduce pre-attentive auditory cortical activity to sounds, a reflection of the limited and shared attentional resources for sensory processing across modalities. Here, we demonstrate that modulating visual perceptual load can impact the early sensory encoding of speech sounds, and that the impact of visual load is highly dependent on the predictability of the incoming speech stream. Participants (n = 20, 9 females) performed a visual search task of high (target similar to distractors) and low (target dissimilar to distractors) perceptual load, while early auditory electrophysiological responses were recorded to native speech sounds. Speech sounds were presented either in a 'repetitive context' or in a less predictable 'variable context'. Independent of auditory stimulus context, pre-attentive auditory cortical activity was reduced during high visual load, relative to low visual load. We applied a data-driven machine learning approach to decode speech sounds from the early auditory electrophysiological responses. Decoding performance was found to be poorer under conditions of high (relative to low) visual load when the incoming acoustic stream was predictable. When the auditory stimulus context was less predictable, decoding performance was substantially greater for the high (relative to low) visual load conditions. Our results provide support for shared attentional resources between visual and auditory modalities that substantially influence the early sensory encoding of speech signals in a context-dependent manner. Copyright © 2018 IBRO. Published by Elsevier Ltd. All rights reserved.
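EEG decoding of this kind is typically a cross-validated classifier over flattened channel-by-time epochs. A minimal sketch with synthetic epochs follows; the classifier choice, data shapes and effect size are assumptions, not the authors' pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(4)
n_trials, n_channels, n_times = 200, 32, 100
epochs = rng.standard_normal((n_trials, n_channels, n_times))  # placeholder epochs
labels = rng.integers(0, 2, n_trials)                          # two speech-sound classes
epochs[labels == 1, :, 40:60] += 0.3                           # small evoked difference

X = epochs.reshape(n_trials, -1)                               # trials x (channels*times)
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
print(f"decoding accuracy = {cross_val_score(clf, X, labels, cv=5).mean():.2f}")
```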
A multi-band environment-adaptive approach to noise suppression for cochlear implants.
Saki, Fatemeh; Mirzahasanloo, Taher; Kehtarnavaz, Nasser
2014-01-01
This paper presents an improved environment-adaptive noise suppression solution for the cochlear implant speech-processing pipeline. The improvement is achieved by using a multi-band data-driven approach in place of a previously developed single-band data-driven approach. Seven commonly encountered noisy environments (street, car, restaurant, mall, bus, pub and train) are considered to quantify the improvement. The results obtained indicate an improvement of about 10% in speech quality measures.
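A generic multi-band suppression scheme, shown here as an illustration rather than the paper's data-driven method, splits an STFT into bands and applies a Wiener-style gain per band and frame, with the noise spectrum estimated from a leading noise-only segment (an assumption of this demo).

```python
import numpy as np
from scipy.signal import stft, istft

def multiband_suppress(noisy, fs, n_bands=8, floor=0.1):
    """Per-band Wiener-style gains estimated from a leading noise-only segment."""
    f, t, Z = stft(noisy, fs=fs, nperseg=256)
    noise_psd = np.mean(np.abs(Z[:, :10]) ** 2, axis=1)   # first ~10 frames = noise only
    edges = np.linspace(0, len(f), n_bands + 1, dtype=int)
    gain = np.ones(Z.shape)
    for b in range(n_bands):
        sl = slice(edges[b], edges[b + 1])
        snr = np.mean(np.abs(Z[sl]) ** 2, axis=0) / noise_psd[sl].mean() - 1.0
        snr = np.maximum(snr, 0.0)
        gain[sl] = np.maximum(snr / (snr + 1.0), floor)    # one gain per band and frame
    _, clean = istft(Z * gain, fs=fs, nperseg=256)
    return clean

fs = 16000
rng = np.random.default_rng(5)
noise = 0.3 * rng.standard_normal(fs)
tone = np.r_[np.zeros(fs // 2), 0.5 * np.sin(2 * np.pi * 500 * np.arange(fs // 2) / fs)]
clean = multiband_suppress(noise + tone, fs)   # target signal enters halfway through
```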
2016-07-01
Fragmentary record (extraction gaps): the study aims included collecting hearing threshold data, collecting pre- and post-operative speech perception data, collecting music appraisal and pitch data, and administering training with music of varying complexities. Improvement was observed from the first to the last lesson, and the subject expressed appreciation for the training. Post-operative speech, localization, and music data are shown (Figure 2), and quality-of-life and functional questionnaire data are also being collected.
ERIC Educational Resources Information Center
Stinson, Michael; Eisenberg, Sandy; Horn, Christy; Larson, Judy; Levitt, Harry; Stuckless, Ross
This report describes and discusses several applications of new computer-based technologies which enable postsecondary students with deafness or hearing impairments to read the text of the language being spoken by the instructor and fellow students virtually in real time. Two current speech-to-text options are described: (1) steno-based systems in…
ERIC Educational Resources Information Center
Braarud, Hanne Cecilie; Stormark, Kjell Morten
2008-01-01
The purpose of this study was to examine 32 mothers' sensitivity to social contingency during face-to-face interaction with their two- to four-month-old infants in a closed circuit TV set-up. Prosodic qualities and vocal sounds in mother's infant-directed (ID) speech during sequences of live interaction were compared to sequences where expressive…
Hidden Markov models in automatic speech recognition
NASA Astrophysics Data System (ADS)
Wrzoskowicz, Adam
1993-11-01
This article describes a method for constructing an automatic speech recognition (ASR) system based on hidden Markov models (HMMs). It introduces the basic concepts of HMM theory and the application of these models to the analysis and recognition of speech signals, and provides algorithms for training the ASR system and recognizing signals on the basis of distinct stochastic models of selected speech sound classes. The specific components of the system and the procedures used to model and recognize speech are described, along with problems associated with the choice of optimal signal detection and parameterization characteristics and their effect on system performance. Different options for the choice of speech signal segments, and their consequences for the ASR process, are presented. Special attention is given to the use of lexical, syntactic, and semantic information to improve the quality and efficiency of the system. The article also describes an ASR system developed by the Speech Acoustics Laboratory of the IBPT PAS, discusses the results of experiments on the effect of noise on the performance of that system, and describes methods of constructing HMMs designed to operate in a noisy environment. Finally, it describes a language for human-robot communication, defined as a complex multilevel network built from HMM models of speech sounds and geared towards Polish inflection, with mandatory lexical and syntactic rules added for its communications vocabulary.
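The recognition step of an HMM-based system reduces to finding the most likely state sequence, usually via the Viterbi algorithm. Below is a minimal log-domain sketch on a toy two-state model; all probabilities are invented for illustration.

```python
import numpy as np

def viterbi(log_A, log_B, log_pi):
    """Most likely state path for an HMM given log transition, emission, prior."""
    T, N = log_B.shape
    delta = log_pi + log_B[0]                      # best score ending in each state
    psi = np.zeros((T, N), dtype=int)              # best-predecessor table
    for t in range(1, T):
        scores = delta[:, None] + log_A            # scores[i, j]: from state i to j
        psi[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_B[t]
    path = [int(delta.argmax())]                   # backtrack from the best end state
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t][path[-1]]))
    return path[::-1]

# toy 2-state example with 3 observation frames
log_A = np.log([[0.9, 0.1], [0.2, 0.8]])
log_pi = np.log([0.7, 0.3])
log_B = np.log([[0.8, 0.2], [0.6, 0.4], [0.1, 0.9]])   # P(obs_t | state)
print(viterbi(log_A, log_B, log_pi))
```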
Schoeppe, Franziska; Sommer, Wieland H; Haack, Mareike; Havel, Miriam; Rheinwald, Marika; Wechtenbruch, Juliane; Fischer, Martin R; Meinel, Felix G; Sabel, Bastian O; Sommer, Nora N
2018-01-01
To compare free text reports (FTR) and structured reports (SR) of videofluoroscopic swallowing studies (VFSS) and evaluate the satisfaction of referring otolaryngologists and speech therapists. Both standard FTR and SR of 26 patients with VFSS were acquired. A dedicated template focusing on oropharyngeal phases was created for SR using online software with clickable decision-trees and concomitant generation of semantically structured reports. All reports were evaluated regarding overall quality and content, information extraction and clinical decision support (10-point Likert scale: 0 = I completely disagree, 10 = I completely agree). Two otorhinolaryngologists and two speech therapists evaluated FTR and SR. SR received better ratings than FTR on all items. SR were perceived to contain more details on the swallowing phases (median rating: 10 vs. 5; P < 0.001) and on penetration and aspiration (10 vs. 5; P < 0.001), and facilitated information extraction compared to FTR (10 vs. 4; P < 0.001). Overall quality was rated significantly higher for SR than FTR (P < 0.001). SR of VFSS provide more detailed information and facilitate information extraction. SR better assist in clinical decision-making, might enhance the quality of the report and, thus, are recommended for the evaluation of VFSS. • Structured reports on videofluoroscopic exams of deglutition lead to improved report quality. • Information extraction is facilitated when using structured reports based on decision trees. • Template-based reports add more value to clinical decision-making than free text reports. • Structured reports receive better ratings by speech therapists and otolaryngologists. • Structured reports on videofluoroscopic exams may improve the comparability between exams.
Building phonetic categories: an argument for the role of sleep
Earle, F. Sayako; Myers, Emily B.
2014-01-01
The current review provides specific predictions for the role of sleep-mediated memory consolidation in the formation of new speech sound representations. Specifically, this discussion will highlight selected literature on the different ideas concerning category representation in speech, followed by a broad overview of memory consolidation and how it relates to human behavior, as relevant to speech/perceptual learning. In combining behavioral and physiological accounts from animal models with insights from the human consolidation literature on auditory skill/word learning, we are in the early stages of understanding how the transfer of experiential information between brain structures during sleep manifests in changes to online perception. Arriving at the conclusion that this process is crucial in perceptual learning and the formation of novel categories, further speculation yields the adjacent claim that the habitual disruption in this process leads to impoverished quality in the representation of speech sounds. PMID:25477828
The character of scientists in the Nobel Prize speeches.
Condit, Celeste M
2018-05-01
This essay describes the ethos (i.e. the character projected to specific audiences) of the 25 Nobel Lectures in Physics, Chemistry, and Physiology or Medicine given in 2013-2015 and the 15 Presentation Speeches given at the Nobel Banquets between 2011 and 2015. A thematically focused qualitative analysis grounded in theories of epideictic discourse indicates the Nobel speakers demonstrated a range of strategies for and degrees of success in negotiating the tensions created by the implicit demands of ceremonial speeches, the scientific emphasis on didactic style and research content, and the different potential audiences (scientific experts and interested publics). Relatively few speeches explicitly displayed goodwill toward humanity instead of primarily toward the scientific community. Some speakers emphasized qualities of goodness in line with social values shared by broad audiences, but some reinforced stereotypes of scientists as anti-social. Speakers were variable in their ability to bridge the substantial gaps in resources for shared good sense.
A highly penetrant form of childhood apraxia of speech due to deletion of 16p11.2
Fedorenko, Evelina; Morgan, Angela; Murray, Elizabeth; Cardinaux, Annie; Mei, Cristina; Tager-Flusberg, Helen; Fisher, Simon E; Kanwisher, Nancy
2016-01-01
Individuals with heterozygous 16p11.2 deletions reportedly suffer from a variety of difficulties with speech and language. Indeed, recent copy-number variant screens of children with childhood apraxia of speech (CAS), a specific and rare motor speech disorder, have identified three unrelated individuals with 16p11.2 deletions. However, the nature and prevalence of speech and language disorders in general, and CAS in particular, is unknown for individuals with 16p11.2 deletions. Here we took a genotype-first approach, conducting detailed and systematic characterization of speech abilities in a group of 11 unrelated children ascertained on the basis of 16p11.2 deletions. To obtain the most precise and replicable phenotyping, we included tasks that are highly diagnostic for CAS, and we tested children under the age of 18 years, an age group where CAS has been best characterized. Two individuals were largely nonverbal, preventing detailed speech analysis, whereas the remaining nine met the standard accepted diagnostic criteria for CAS. These results link 16p11.2 deletions to a highly penetrant form of CAS. Our findings underline the need for further precise characterization of speech and language profiles in larger groups of affected individuals, which will also enhance our understanding of how genetic pathways contribute to human communication disorders. PMID:26173965
Use of speech-to-text technology for documentation by healthcare providers.
Ajami, Sima
2016-01-01
Medical records are a critical component of a patient's treatment. However, documentation of patient-related information is considered a secondary activity in the provision of healthcare services, often leading to incomplete medical records and patient data of low quality. Advances in information technology (IT) in the health system and registration of information in electronic health records (EHR) using speech-to-text conversion software have facilitated service delivery. This narrative review is based on a literature search of libraries, books, conference proceedings, the Science Direct, PubMed, ProQuest, Springer and SID (Scientific Information Database) databases, and search engines such as Yahoo and Google. I used the following keywords and their combinations: speech recognition, automatic report documentation, voice to text software, healthcare, information, and voice recognition. Due to lack of knowledge of other languages, I searched all texts in English or Persian with no time limits. Of a total of 70 articles, only 42 were selected. Speech-to-text conversion technology offers opportunities to improve the documentation process of medical records, reduce the cost and time of recording information, enhance the quality of documentation, improve the quality of services provided to patients, and support healthcare providers in legal matters. Healthcare providers should recognize the impact of this technology on service delivery.
Analysis and enhancement of country singing
NASA Astrophysics Data System (ADS)
Lee, Matthew; Smith, Mark J. T.
2003-04-01
The study of human singing has focused extensively on the analysis of voice characteristics. At the same time, a substantial body of work has aimed at modeling and synthesizing the human voice. The work on which we report brings together some key analysis and synthesis principles to create a new model for digitally improving the perceived quality of an average singing voice. The model presented employs an analysis-by-synthesis overlap-add (ABS-OLA) sinusoidal model, which in the past has been used for the analysis and synthesis of speech, in combination with a spectral model of the vocal tract. The ABS-OLA sinusoidal model for speech has been shown to be a flexible, accurate, and computationally efficient representation capable of producing a natural-sounding singing voice [E. B. George and M. J. T. Smith, Trans. Speech Audio Processing 5, 389-406 (1997)]. A spectral model infused in the ABS-OLA uses generalized Gaussian functions to provide a simple framework enabling precise modification of spectral characteristics while maintaining the quality and naturalness of the original voice. Furthermore, it is shown that the parameters of the new ABS-OLA can accommodate pitch corrections and vocal quality enhancements while preserving naturalness and singer identity. Examples of enhanced country singing will be presented.
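The overlap-add half of a sinusoidal model is easy to illustrate: each frame is a sum of windowed sinusoids added into the output at the frame hop. The sketch below is a bare-bones OLA synthesizer, not the ABS-OLA model itself; the frame contents and rates are invented.

```python
import numpy as np

def sinusoidal_ola(frames, fs, hop):
    """Overlap-add synthesis from per-frame sinusoid tracks (freq_hz, amp, phase)."""
    out = np.zeros(hop * (len(frames) + 1))
    win = np.hanning(2 * hop)                  # 50%-overlap synthesis window
    t = np.arange(2 * hop) / fs
    for i, partials in enumerate(frames):
        seg = sum(a * np.cos(2 * np.pi * f * t + p) for f, a, p in partials)
        out[i * hop: i * hop + 2 * hop] += win * seg
    return out

fs, hop = 16000, 160
# two frames: a partial gliding 220 -> 230 Hz plus its second harmonic
frames = [[(220.0, 0.5, 0.0), (440.0, 0.25, 0.0)],
          [(230.0, 0.5, 0.0), (460.0, 0.25, 0.0)]]
y = sinusoidal_ola(frames, fs, hop)
```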
Brainstem Encoding of Aided Speech in Hearing Aid Users with Cochlear Dead Region(s)
Hassaan, Mohammad Ramadan; Ibraheem, Ola Abdallah; Galhom, Dalia Helal
2016-01-01
Introduction Neural encoding of speech begins with the analysis of the signal as a whole, broken down into its sinusoidal components in the cochlea; this information has to be conserved up to the higher auditory centers. Some of these components target dead regions of the cochlea, causing little or no excitation. Measuring the aided speech-evoked auditory brainstem response elicited by speech stimuli with different spectral maxima can give insight into the brainstem encoding of aided speech with spectral maxima at these dead regions. Objective This research aims to study the impact of dead regions of the cochlea on speech processing at the brainstem level after a long period of hearing aid use. Methods This study comprised 30 ears without dead regions and 46 ears with dead regions at low, mid, or high frequencies. For all ears, we measured the aided speech-evoked auditory brainstem response using speech stimuli of low, mid, and high spectral maxima. Results The aided speech-evoked auditory brainstem response could be recorded in all subjects. Responses evoked by stimuli with spectral maxima at dead regions had longer latencies and smaller amplitudes when compared with the control group or with the responses to other stimuli. Conclusion The presence of cochlear dead regions affects brainstem encoding of speech with spectral maxima at these regions. Brainstem neuroplasticity and the extrinsic redundancy of speech can minimize the impact of dead regions in chronic hearing aid users. PMID:27413404
Rosemann, Stephanie; Gießing, Carsten; Özyurt, Jale; Carroll, Rebecca; Puschmann, Sebastian; Thiel, Christiane M.
2017-01-01
Noise-vocoded speech is commonly used to simulate the sensation after cochlear implantation, as it consists of spectrally degraded speech. High individual variability exists in learning to understand both noise-vocoded speech and speech perceived through a cochlear implant (CI). This variability is partly ascribed to differing cognitive abilities such as working memory, verbal skills and attention. Although clinically highly relevant, no consensus has yet been reached about which cognitive factors predict the intelligibility of speech in noise-vocoded situations in healthy subjects or in patients after cochlear implantation. We aimed to establish a test battery that can be used to predict speech understanding in patients prior to receiving a CI. Young and old healthy listeners completed a noise-vocoded speech test in addition to cognitive tests tapping verbal memory, working memory, lexicon and retrieval skills, as well as cognitive flexibility and attention. Partial least squares analysis revealed that six variables were important for significantly predicting vocoded-speech performance. These were the ability to perceive visually degraded speech, tested by the Text Reception Threshold; vocabulary size, assessed with the Multiple Choice Word Test; working memory, gauged with the Operation Span Test; verbal learning and recall, from the Verbal Learning and Retention Test; and task-switching abilities, tested by the Comprehensive Trail-Making Test. Thus, these cognitive abilities explain individual differences in noise-vocoded speech understanding and should be considered when aiming to predict hearing-aid outcome. PMID:28638329
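Noise vocoding itself is a simple channel operation: band-pass the speech, extract each band's envelope, and use it to modulate band-limited noise. A minimal sketch follows; the band edges, filter orders, envelope cutoff and test signal are typical illustrative values, not those of the study.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def noise_vocode(speech, fs, n_bands=4, env_cut=30.0):
    """Replace each band's fine structure with envelope-modulated white noise."""
    rng = np.random.default_rng(0)
    edges = np.logspace(np.log10(100), np.log10(6000), n_bands + 1)
    env_sos = butter(2, env_cut / (fs / 2), btype="low", output="sos")
    noise = rng.standard_normal(len(speech))
    out = np.zeros_like(speech, dtype=float)
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band", output="sos")
        band = sosfiltfilt(sos, speech)
        env = sosfiltfilt(env_sos, np.abs(band))       # band envelope
        out += sosfiltfilt(sos, noise) * np.maximum(env, 0)
    return out

fs = 16000
t = np.arange(0, 0.5, 1 / fs)
speech = np.sin(2 * np.pi * 150 * t) * (1 + np.sin(2 * np.pi * 4 * t))  # crude stand-in
vocoded = noise_vocode(speech, fs)
```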
Nixon, C W; Morris, L J; McCavitt, A R; McKinley, R L; Anderson, T R; McDaniel, M P; Yeager, D G
1998-07-01
Female-produced speech, although more intelligible than male speech in some noise spectra, may be more vulnerable to degradation by high levels of some military aircraft cockpit noises. The acoustic features of female speech are higher in frequency, lower in power, and appear more susceptible than male speech to masking by some of these military noises. Current military aircraft voice communication systems were optimized for the male voice and may not adequately accommodate the female voice in these high-level noises. This applied study investigated the intelligibility of female and male speech produced in the noise spectra of four military aircraft cockpits at levels ranging from 95 dB to 115 dB. The experimental subjects used standard flight helmets and headsets, noise-canceling microphones, and military aircraft voice communications systems during the measurements. The intelligibility of female speech was lower than that of male speech for all experimental conditions; however, differences were small and insignificant except at the highest levels of the cockpit noises. Intelligibility for both genders varied with aircraft noise spectrum and level. Speech intelligibility for both genders was acceptable during the normal cruise noises of all four aircraft, but improvements are required at the higher noise levels created during maximum aircraft operating conditions. The intelligibility of female speech was unacceptable at the highest measured noise level of 115 dB and may constitute a problem for other military aviators. The intelligibility degradation due to noise can be neutralized by use of an available, improved noise-canceling microphone, by the application of current active noise reduction technology to personal communication equipment, and by the development of a voice communications system that accommodates the speech produced by both female and male aviators.
Luo, Xin; Ashmore, Krista B
2014-06-01
Context-dependent pitch perception helps listeners recognize tones produced by speakers with different fundamental frequencies (f0s). The role of language experience in tone normalization remains unclear. In this cross-language study of tone normalization, native Mandarin and English listeners were asked to recognize Mandarin Tone 1 (high-flat) and Tone 2 (mid-rising) with a preceding Mandarin sentence. To further test whether context-dependent pitch perception is speech-specific or domain-general, both language groups were asked to identify non-speech flat and rising pitch contours with a preceding non-speech flat pitch contour. Results showed that both Mandarin and English listeners made more rising responses with non-speech than with speech stimuli, due to differences in spectral complexity and listening task between the two stimulus types. English listeners made more rising responses than Mandarin listeners with both speech and non-speech stimuli. Contrastive context effects (more rising responses in the high-f0 context than in the low-f0 context) were found with both speech and non-speech stimuli for Mandarin listeners, but not for English listeners. English listeners' lack of tone experience may have caused more rising responses and limited use of context f0 cues. These results suggest that context-dependent pitch perception in tone normalization is domain-general, but influenced by long-term language experience.
Härkönen, Kati; Kivekäs, Ilkka; Kotti, Voitto; Sivonen, Ville; Vasama, Juha-Pekka
2017-10-01
The objective of the present study is to evaluate the effect of hybrid cochlear implantation (hCI) on quality of life (QoL), quality of hearing (QoH), and working performance in adult patients, and to compare the long-term results of patients with hCI to those of patients with conventional unilateral cochlear implantation (CI), bilateral CI, and single-sided deafness (SSD) with CI. Sound localization accuracy and a speech-in-noise test were also compared between these groups. Eight patients with high-frequency sensorineural hearing loss of unknown etiology were selected for the study. Patients with hCI had better long-term speech perception in noise than uni- or bilateral CI patients, but the difference was not statistically significant. Sound localization accuracy was equal in the hCI, bilateral CI, and SSD patients. QoH was statistically significantly better in bilateral CI patients than in the others. In hCI patients, residual hearing was preserved in all patients after surgery. During the 3.6-year follow-up, the mean hearing threshold at 125-500 Hz decreased on average by 15 dB HL in the implanted ear. QoL and working performance improved significantly in all CI patients. Hearing outcomes with hCI are comparable to the results of bilateral CI or CI with SSD, but hearing in noise and sound localization are statistically significantly better than with unilateral CI. Interestingly, the impact of CI on QoL, QoH, and working performance was similar in all groups.
Siegel, Ellin B; Maddox, Laura L; Ogletree, Billy T; Westling, David L
2010-01-01
Speech-language pathologists (SLPs) in school settings were surveyed with an instrument created from the National Joint Committee for the Communication Needs of Persons with Severe Disabilities' quality indicators self-assessment tool. Participants valued practice indicators of quality communication assessment and intervention to a higher degree than they actually practiced them. These findings suggest that SLPs may not provide best-practice services to individuals with severe disabilities. Suggestions for enhancing the in-service training and intervention practices of SLPs and team members who work with individuals with severe disabilities are provided. The reader will be able to: (1) understand the value of using the NJC quality indicators to guide SLP practices with individuals with severe disabilities in schools; (2) recognize that research indicates that SLPs working with individuals with severe disabilities in schools may not provide best-practice services to the extent that they value these practices; (3) discuss possible strategies to increase the quality of services provided to individuals with severe disabilities in schools.
Auditory brainstem response to complex sounds predicts self-reported speech-in-noise performance.
Anderson, Samira; Parbery-Clark, Alexandra; White-Schwoch, Travis; Kraus, Nina
2013-02-01
To compare the ability of the auditory brainstem response to complex sounds (cABR) to predict subjective ratings of speech understanding in noise on the Speech, Spatial, and Qualities of Hearing Scale (SSQ; Gatehouse & Noble, 2004) relative to the predictive ability of the Quick Speech-in-Noise test (QuickSIN; Killion, Niquette, Gudmundsen, Revit, & Banerjee, 2004) and pure-tone hearing thresholds. Participants included 111 middle- to older-aged adults (age range: 45-78 years) with audiometric configurations ranging from normal hearing levels to moderate sensorineural hearing loss. In addition to audiometric testing, the authors used the QuickSIN, the SSQ, and the cABR. Multiple linear regression analysis indicated that including brainstem variables in a model with the QuickSIN, hearing thresholds, and age accounted for 30% of the variance in the Speech subtest of the SSQ, compared with significantly less variance (19%) when brainstem variables were not included. The results demonstrate the cABR's efficacy for predicting self-reported speech-in-noise perception difficulties. The fact that the cABR predicts more variance in self-reported speech-in-noise (SIN) perception than either the QuickSIN or hearing thresholds indicates that the cABR provides additional insight into an individual's ability to hear in background noise. In addition, the findings underscore the link between the cABR and hearing in noise.
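The nested-model comparison reported here (variance explained with versus without brainstem variables) has a direct computational analogue: fit the base and extended regressions and compare R². The sketch below runs on synthetic data with invented effect sizes, purely to show the procedure, not to reproduce the study's numbers.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(5)
n = 111
quicksin = rng.standard_normal(n)
thresholds = rng.standard_normal(n)
age = rng.standard_normal(n)
cabr = rng.standard_normal((n, 3))                       # hypothetical brainstem measures
ssq_speech = 0.4 * quicksin + 0.3 * cabr[:, 0] + rng.standard_normal(n)

base = np.column_stack([quicksin, thresholds, age])      # model without brainstem variables
full = np.column_stack([base, cabr])                     # model with brainstem variables
r2_base = LinearRegression().fit(base, ssq_speech).score(base, ssq_speech)
r2_full = LinearRegression().fit(full, ssq_speech).score(full, ssq_speech)
print(f"delta R^2 from adding brainstem variables = {r2_full - r2_base:.2f}")
```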
Lower Vocal Tract Morphologic Adjustments Are Relevant for Voice Timbre in Singing
Mainka, Alexander; Poznyakovskiy, Anton; Platzek, Ivan; Fleischer, Mario; Sundberg, Johan; Mürbe, Dirk
2015-01-01
The vocal tract shape is crucial to voice production. Its lower part seems particularly relevant for voice timbre. This study analyzes the detailed morphology of parts of the epilaryngeal tube and the hypopharynx for the sustained German vowels /a/, /e/, /i/, /o/, and /u/ by thirteen male singer subjects who were at the beginning of their academic singing studies. Analysis was based on two different phonatory conditions: a natural, speech-like phonation and a singing phonation, like in classical singing. 3D models of the vocal tract were derived from magnetic resonance imaging and compared with long-term average spectrum analysis of audio recordings from the same subjects. Comparison of singing to the speech-like phonation, which served as reference, showed significant adjustments of the lower vocal tract: an average lowering of the larynx by 8 mm and an increase of the hypopharyngeal cross-sectional area (+21.9%) and volume (+16.8%). Changes in the analyzed epilaryngeal portion of the vocal tract were not significant. Consequently, lower larynx-to-hypopharynx area and volume ratios were found in singing compared to the speech-like phonation. All evaluated measures of the lower vocal tract varied significantly with vowel quality. Acoustically, an increase of high frequency energy in singing correlated with a wider hypopharyngeal area. The findings offer an explanation how classical male singers might succeed in producing a voice timbre with increased high frequency energy, creating a singer's formant cluster. PMID:26186691
Henshaw, Helen; Ferguson, Melanie A.
2013-01-01
Background Auditory training involves active listening to auditory stimuli and aims to improve performance in auditory tasks. As such, auditory training is a potential intervention for the management of people with hearing loss. Objective This systematic review (PROSPERO 2011: CRD42011001406) evaluated the published evidence-base for the efficacy of individual computer-based auditory training to improve speech intelligibility, cognition and communication abilities in adults with hearing loss, with or without hearing aids or cochlear implants. Methods A systematic search of eight databases and key journals identified 229 articles published since 1996, 13 of which met the inclusion criteria. Data were independently extracted and reviewed by the two authors. Study quality was assessed using ten pre-defined scientific and intervention-specific measures. Results Auditory training resulted in improved performance for trained tasks in 9/10 articles that reported on-task outcomes. Although significant generalisation of learning was shown to untrained measures of speech intelligibility (11/13 articles), cognition (1/1 articles) and self-reported hearing abilities (1/2 articles), improvements were small and not robust. Where reported, compliance with computer-based auditory training was high, and retention of learning was shown at post-training follow-ups. Published evidence was of very-low to moderate study quality. Conclusions Our findings demonstrate that published evidence for the efficacy of individual computer-based auditory training for adults with hearing loss is not robust and therefore cannot be reliably used to guide intervention at this time. We identify a need for high-quality evidence to further examine the efficacy of computer-based auditory training for people with hearing loss. PMID:23675431
Cream, Angela; O'Brian, Sue; Jones, Mark; Block, Susan; Harrison, Elisabeth; Lincoln, Michelle; Hewat, Sally; Packman, Ann; Menzies, Ross; Onslow, Mark
2010-08-01
In this study, the authors investigated the efficacy of video self-modeling (VSM) following speech restructuring treatment to improve the maintenance of treatment effects. The design was an open-plan, parallel-group, randomized controlled trial. Participants were 89 adults and adolescents who undertook intensive speech restructuring treatment. Post treatment, participants were randomly assigned to 2 trial arms: standard maintenance and standard maintenance plus VSM. Participants in the latter arm viewed stutter-free videos of themselves each day for 1 month. The addition of VSM did not improve speech outcomes, as measured by percent syllables stuttered, at either 1 or 6 months postrandomization. However, at the latter assessment, self-rating of worst stuttering severity by the VSM group was 10% better than that of the control group, and satisfaction with speech fluency was 20% better. Quality of life was also better for the VSM group, which was mildly to moderately impaired compared with moderate impairment in the control group. VSM intervention after treatment was associated with improvements in self-reported outcomes. The clinical implications of this finding are discussed.
Speech waveform perturbation analysis: a perceptual-acoustical comparison of seven measures.
Askenfelt, A G; Hammarberg, B
1986-03-01
The performance of seven acoustic measures of cycle-to-cycle variations (perturbations) in the speech waveform was compared. All measures were calculated automatically and applied on running speech. Three of the measures refer to the frequency of occurrence and severity of waveform perturbations in special selected parts of the speech, identified by means of the rate of change in the fundamental frequency. Three other measures refer to statistical properties of the distribution of the relative frequency differences between adjacent pitch periods. One perturbation measure refers to the percentage of consecutive pitch period differences with alternating signs. The acoustic measures were tested on tape recorded speech samples from 41 voice patients, before and after successful therapy. Scattergrams of acoustic waveform perturbation data versus an average of perceived deviant voice qualities, as rated by voice clinicians, are presented. The perturbation measures were compared with regard to the acoustic-perceptual correlation and their ability to discriminate between normal and pathological voice status. The standard deviation of the distribution of the relative frequency differences was suggested as the most useful acoustic measure of waveform perturbations for clinical applications.
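A minimal sketch of two of the perturbation measures described above: the standard deviation of relative frequency differences between adjacent pitch periods, and the percentage of consecutive period differences with alternating signs. It assumes pitch periods have already been extracted; the synthetic input and function names are illustrative:

    import numpy as np

    def perturbation_stats(periods):
        """periods: pitch-period durations in seconds, in utterance order."""
        f0 = 1.0 / np.asarray(periods, dtype=float)
        rel_diff = np.diff(f0) / f0[:-1]             # relative F0 difference, adjacent periods
        sd = rel_diff.std(ddof=1)                    # the measure suggested for clinical use
        signs = np.sign(np.diff(f0))
        pct_alternating = 100.0 * np.mean(signs[1:] * signs[:-1] < 0)
        return sd, pct_alternating

    rng = np.random.default_rng(1)
    periods = 1.0 / (120.0 + rng.normal(0, 2, 200))  # ~120 Hz voice with mild jitter
    print(perturbation_stats(periods))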
A Dynamically Focusing Cochlear Implant Strategy Can Improve Vowel Identification in Noise.
Arenberg, Julie G; Parkinson, Wendy S; Litvak, Leonid; Chen, Chen; Kreft, Heather A; Oxenham, Andrew J
2018-03-09
The standard, monopolar (MP) electrode configuration used in commercially available cochlear implants (CI) creates a broad electrical field, which can lead to unwanted channel interactions. Use of more focused configurations, such as tripolar and phased array, has led to mixed results for improving speech understanding. The purpose of the present study was to assess the efficacy of a physiologically inspired configuration called dynamic focusing, using focused tripolar stimulation at low levels and less focused stimulation at high levels. Dynamic focusing may better mimic cochlear excitation patterns in normal acoustic hearing, while reducing the current levels necessary to achieve sufficient loudness at high levels. Twenty postlingually deafened adult CI users participated in the study. Speech perception was assessed in quiet and in a four-talker babble background noise. Speech stimuli were closed-set spondees in noise, and medial vowels at 50 and 60 dB SPL in quiet and in noise. The signal to noise ratio was adjusted individually such that performance was between 40 and 60% correct with the MP strategy. Subjects were fitted with three experimental strategies matched for pulse duration, pulse rate, filter settings, and loudness on a channel-by-channel basis. The strategies included 14 channels programmed in MP, fixed partial tripolar (σ = 0.8), and dynamic partial tripolar (σ at 0.8 at threshold and 0.5 at the most comfortable level). Fifteen minutes of listening experience was provided with each strategy before testing. Sound quality ratings were also obtained. Speech perception performance for vowel identification in quiet at 50 and 60 dB SPL and for spondees in noise was similar for the three tested strategies. However, performance on vowel identification in noise was significantly better for listeners using the dynamic focusing strategy. Sound quality ratings were similar for the three strategies. Some subjects obtained more benefit than others, with some individual differences explained by the relation between loudness growth and the rate of change from focused to broader stimulation. These initial results suggest that further exploration of dynamic focusing is warranted. Specifically, optimizing such strategies on an individual basis may lead to improvements in speech perception for more adult listeners and improve how CIs are tailored. Some listeners may also need a longer period of time to acclimate to a new program. This is an open-access article distributed under the terms of the Creative Commons Attribution-Non Commercial-No Derivatives License 4.0 (CC BY-NC-ND), where it is permissible to download and share the work provided it is properly cited. The work cannot be changed in any way or used commercially without permission from the journal.
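The abstract specifies σ = 0.8 at threshold and 0.5 at the most comfortable level but not the shape of the transition between them; the sketch below assumes a simple linear interpolation, with illustrative current-level units:

    def dynamic_sigma(level, t_level, m_level, sigma_t=0.8, sigma_m=0.5):
        """Focusing coefficient for a level between threshold (T) and comfort (M)."""
        if level <= t_level:
            return sigma_t
        if level >= m_level:
            return sigma_m
        frac = (level - t_level) / (m_level - t_level)
        return sigma_t + frac * (sigma_m - sigma_t)    # broader stimulation as level rises

    print(dynamic_sigma(150, t_level=100, m_level=200))  # midway -> 0.65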
Segmenting words from natural speech: subsegmental variation in segmental cues.
Rytting, C Anton; Brew, Chris; Fosler-Lussier, Eric
2010-06-01
Most computational models of word segmentation are trained and tested on transcripts of speech, rather than the speech itself, and assume that speech is converted into a sequence of symbols prior to word segmentation. We present a way of representing speech corpora that avoids this assumption, and preserves acoustic variation present in speech. We use this new representation to re-evaluate a key computational model of word segmentation. One finding is that high levels of phonetic variability degrade the model's performance. While robustness to phonetic variability may be intrinsically valuable, this finding needs to be complemented by parallel studies of the actual abilities of children to segment phonetically variable speech.
Peng, Jianxin; Yan, Nanjie; Wang, Dan
2015-01-01
The present study investigated Chinese speech intelligibility in 28 classrooms from nine different elementary schools in Guangzhou, China. The subjective Chinese speech intelligibility in the classrooms was evaluated with children in grades 2, 4, and 6 (7 to 12 years old). Acoustical measurements were also performed in these classrooms. Subjective Chinese speech intelligibility scores and objective speech intelligibility parameters, such as speech transmission index (STI), were obtained at each listening position for all tests. The relationship between subjective Chinese speech intelligibility scores and STI was revealed and analyzed. The effects of age on Chinese speech intelligibility scores were compared. Results indicate high correlations between subjective Chinese speech intelligibility scores and STI for grades 2, 4, and 6 children. Chinese speech intelligibility scores increase with increase of age under the same STI condition. The differences in scores among different age groups decrease as STI increases. To achieve 95% Chinese speech intelligibility scores, the STIs required for grades 2, 4, and 6 children are 0.75, 0.69, and 0.63, respectively.
ERIC Educational Resources Information Center
Howe, Heather; Barnett, David
2013-01-01
This consultation description reports parent and teacher problem solving for a preschool child with no typical speech directed to teachers or peers, and, by parent report, normal speech at home. This child's initial pattern of speech was similar to selective mutism, a low-incidence disorder often first detected during the preschool years, but…
Labby, Alex; Mace, Jess C; Buncke, Michelle; MacArthur, Carol J
2016-09-01
To evaluate quality-of-life changes after bilateral pressure equalization tube placement with or without adenoidectomy for the treatment of chronic otitis media with effusion or recurrent acute otitis media in a pediatric Down syndrome population compared to controls. Prospective case-control observational study. The OM Outcome Survey (OMO-22) was administered to both patients with Down syndrome and controls before bilateral tube placement with or without adenoidectomy and at an average of 6-7 months postoperatively. Thirty-one patients with Down syndrome and 34 controls were recruited. Both pre-operative and post-operative between-group and within-group score comparisons were conducted for the Physical, Hearing/Balance, Speech, Emotional, and Social domains of the OMO-22. Both groups experienced improvement of mean symptom scores post-operatively. Patients with Down syndrome reported significant post-operative improvement in mean Physical and Hearing domain item scores while control patients reported significant improvement in Physical, Hearing, and Emotional domain item scores. All four symptom scores in the Speech domain, both pre-operatively and post-operatively, were significantly worse for Down syndrome patients compared to controls (p ≤ 0.008). Surgical placement of pressure equalizing tubes results in significant quality of life improvements in patients with Down syndrome and controls. Problems related to speech and balance are reported at a higher rate and persist despite intervention in the Down syndrome population. It is possible that longer follow up periods and/or more sensitive tools are required to measure speech improvements in the Down syndrome population after pressure equalizing tube placement ± adenoidectomy. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Sinai, A; Crone, N E; Wied, H M; Franaszczuk, P J; Miglioretti, D; Boatman-Reich, D
2009-01-01
We compared intracranial recordings of auditory event-related responses with electrocortical stimulation mapping (ESM) to determine their functional relationship. Intracranial recordings and ESM were performed, using speech and tones, in adult epilepsy patients with subdural electrodes implanted over lateral left cortex. Evoked N1 responses and induced spectral power changes were obtained by trial averaging and time-frequency analysis. ESM impaired perception and comprehension of speech, not tones, at electrode sites in the posterior temporal lobe. There was high spatial concordance between ESM sites critical for speech perception and the largest spectral power (100% concordance) and N1 (83%) responses to speech. N1 responses showed good sensitivity (0.75) and specificity (0.82), but poor positive predictive value (0.32). Conversely, increased high-frequency power (>60Hz) showed high specificity (0.98), but poorer sensitivity (0.67) and positive predictive value (0.67). Stimulus-related differences were observed in the spatial-temporal patterns of event-related responses. Intracranial auditory event-related responses to speech were associated with cortical sites critical for auditory perception and comprehension of speech. These results suggest that the distribution and magnitude of intracranial auditory event-related responses to speech reflect the functional significance of the underlying cortical regions and may be useful for pre-surgical functional mapping.
Intracranial mapping of auditory perception: Event-related responses and electrocortical stimulation
Sinai, A.; Crone, N.E.; Wied, H.M.; Franaszczuk, P.J.; Miglioretti, D.; Boatman-Reich, D.
2010-01-01
Objective We compared intracranial recordings of auditory event-related responses with electrocortical stimulation mapping (ESM) to determine their functional relationship. Methods Intracranial recordings and ESM were performed, using speech and tones, in adult epilepsy patients with subdural electrodes implanted over lateral left cortex. Evoked N1 responses and induced spectral power changes were obtained by trial averaging and time-frequency analysis. Results ESM impaired perception and comprehension of speech, not tones, at electrode sites in the posterior temporal lobe. There was high spatial concordance between ESM sites critical for speech perception and the largest spectral power (100% concordance) and N1 (83%) responses to speech. N1 responses showed good sensitivity (0.75) and specificity (0.82), but poor positive predictive value (0.32). Conversely, increased high-frequency power (>60 Hz) showed high specificity (0.98), but poorer sensitivity (0.67) and positive predictive value (0.67). Stimulus-related differences were observed in the spatial-temporal patterns of event-related responses. Conclusions Intracranial auditory event-related responses to speech were associated with cortical sites critical for auditory perception and comprehension of speech. Significance These results suggest that the distribution and magnitude of intracranial auditory event-related responses to speech reflect the functional significance of the underlying cortical regions and may be useful for pre-surgical functional mapping. PMID:19070540
ERIC Educational Resources Information Center
Lam, Christa; Kitamura, Christine
2010-01-01
Purpose: This study examined a mother's speech style and interactive behaviors with her twin sons: 1 with bilateral hearing impairment (HI) and the other with normal hearing (NH). Method: The mother was video-recorded interacting with her twin sons when the boys were 12.5 and 22 months of age. Mean F0, F0 range, duration, and F1/F2 vowel space of…
ERIC Educational Resources Information Center
Wray, Denise; Flexer, Carol
2010-01-01
A collaborative team of faculty from The University of Akron (UA) in Akron, Ohio, and Kent State University (KSU) in Kent, Ohio, were awarded a federal grant from the U.S. Department of Education to develop a specialty area in the graduate speech-language pathology (SLP) programs of UA and KSU that would train a total of 32 SLP students (trainees)…
Perceptual, auditory and acoustic vocal analysis of speech and singing in choir conductors.
Rehder, Maria Inês Beltrati Cornacchioni; Behlau, Mara
2008-01-01
The voice of choir conductors. To evaluate the vocal quality of choir conductors based on the production of a sustained vowel during singing and when speaking, in order to observe auditory and acoustic differences. Participants of this study were 100 choir conductors, with an equal distribution between genders. Participants were asked to produce the sustained vowel "é" using a singing and speaking voice. Speech samples were analyzed based on auditory-perceptive and acoustic parameters. The auditory-perceptive analysis was carried out by two speech-language pathologists, specialists in this field of knowledge. The acoustic analysis was carried out with the support of the computer software Doctor Speech (Tiger Electronics, SRD, USA, version 4.0), using the Real Analysis module. The auditory-perceptive analysis of the vocal quality indicated that most conductors have adapted voices, presenting more alterations in their speaking voice. The acoustic analysis indicated different values between genders and between the different production modalities. The fundamental frequency was higher in the singing voice, as well as the values for the first formant; the second formant presented lower values in the singing voice, with statistically significant results only for women. The voice of choir conductors is adapted, presenting fewer deviations in the singing voice when compared to the speaking voice. Productions differ based on the voice modality, singing or speaking.
Haapanen, M L; Rintala, A E
1993-01-01
The quality of speech was compared in 124 young adults with isolated cleft palate. Forty-seven subjects were excluded because of the presence of factors that might have biased the evaluation of the success rate of the two operations studied, leaving 77 subjects who had undergone primary palatoplasty for analysis. One-stage closure of the soft and hard palate was done for 43 patients by the mucoperiosteal palatal V to Y pushback technique (Veau-Wardill-Kilner, group V), and 34 underwent the Cronin modification (group C). Their speech was tape recorded, analysed by three qualified listeners, and hypernasality assessed by four published hypernasality indexes. More subjects in group C achieved normal resonance than in group V, who had higher hypernasality index scores than group C. The groups managed pressure consonants similarly. Only a few patients had weak plosives, audible nasal air emission, or compensatory articulation. Similar numbers of secondary operations were done for both groups. However, group V would actually have required secondary surgery more frequently than group C.
Speech perception and production in severe environments
NASA Astrophysics Data System (ADS)
Pisoni, David B.
1990-09-01
The goal was to acquire new knowledge about speech perception and production in severe environments such as high masking noise, increased cognitive load or sustained attentional demands. Changes were examined in speech production under these adverse conditions through acoustic analysis techniques. One set of studies focused on the effects of noise on speech production. The experiments in this group were designed to generate a database of speech obtained in noise and in quiet. A second set of experiments was designed to examine the effects of cognitive load on the acoustic-phonetic properties of speech. Talkers were required to carry out a demanding perceptual motor task while they read lists of test words. A final set of experiments explored the effects of vocal fatigue on the acoustic-phonetic properties of speech. Both cognitive load and vocal fatigue are present in many applications where speech recognition technology is used, yet their influence on speech production is poorly understood.
Air traffic controllers' long-term speech-in-noise training effects: A control group study.
Zaballos, Maria T P; Plasencia, Daniel P; González, María L Z; de Miguel, Angel R; Macías, Ángel R
2016-01-01
Speech perception in noise relies on the capacity of the auditory system to process complex sounds using sensory and cognitive skills. The possibility that these can be trained during adulthood is of special interest in auditory disorders, where speech-in-noise perception becomes compromised. Air traffic controllers (ATC) are constantly exposed to radio communication, a situation that seems to produce auditory learning. The objective of this study was to quantify this effect. Nineteen ATCs and 19 normal-hearing individuals underwent a speech-in-noise test with three signal-to-noise ratios: +5, 0 and -5 dB. Noise and speech were presented through two different loudspeakers in azimuth position. Speech tokens were presented at 65 dB SPL, while white noise files were at 60, 65 and 70 dB, respectively. Air traffic controllers outperformed the control group in all conditions (p < 0.05, ANOVA and Mann-Whitney U tests). Group differences were largest in the most difficult condition, SNR = -5 dB. However, no correlation between experience and performance was found for any of the conditions tested. The reason might be that ceiling performance is achieved much faster than the minimum experience time recorded, 5 years, although intrinsic cognitive abilities cannot be disregarded. ATCs demonstrated an enhanced ability to hear speech in challenging listening environments. This study provides evidence that long-term auditory training is useful in achieving better speech-in-noise understanding even in adverse conditions, although good cognitive qualities are likely to be a basic requirement for this training to be effective.
Song and speech: examining the link between singing talent and speech imitation ability
Christiner, Markus; Reiterer, Susanne M.
2013-01-01
In previous research on speech imitation, musicality, and an ability to sing were isolated as the strongest indicators of good pronunciation skills in foreign languages. We, therefore, wanted to take a closer look at the nature of the ability to sing, which shares a common ground with the ability to imitate speech. This study focuses on whether good singing performance predicts good speech imitation. Forty-one singers of different levels of proficiency were selected for the study and their ability to sing, to imitate speech, their musical talent and working memory were tested. Results indicated that singing performance is a better indicator of the ability to imitate speech than the playing of a musical instrument. A multiple regression revealed that 64% of the speech imitation score variance could be explained by working memory together with educational background and singing performance. A second multiple regression showed that 66% of the speech imitation variance of completely unintelligible and unfamiliar language stimuli (Hindi) could be explained by working memory together with a singer's sense of rhythm and quality of voice. This supports the idea that both vocal behaviors have a common grounding in terms of vocal and motor flexibility, ontogenetic and phylogenetic development, neural orchestration and auditory memory with singing fitting better into the category of “speech” on the productive level and “music” on the acoustic level. As a result, good singers benefit from vocal and motor flexibility, productively and cognitively, in three ways. (1) Motor flexibility and the ability to sing improve language and musical function. (2) Good singers retain a certain plasticity and are open to new and unusual sound combinations during adulthood both perceptually and productively. (3) The ability to sing improves the memory span of the auditory working memory. PMID:24319438
Davidow, Jason H.; Bothe, Anne K.; Ye, Jun
2011-01-01
The most common way to induce fluency using rhythm requires persons who stutter to speak one syllable or one word to each beat of a metronome, but stuttering can also be eliminated when the stimulus is of a particular duration (e.g., 1 s). The present study examined stuttering frequency, speech production changes, and speech naturalness during rhythmic speech that alternated 1 s of reading with 1 s of silence. A repeated-measures design was used to compare data obtained during a control reading condition and during rhythmic reading in 10 persons who stutter (PWS) and 10 normally fluent controls. Ratings for speech naturalness were also gathered from naïve listeners. Results showed that mean vowel duration increased significantly, and the percentage of short phonated intervals decreased significantly, for both groups from the control to the experimental condition. Mean phonated interval length increased significantly for the fluent controls. Mean speech naturalness ratings during the experimental condition were approximately 7 on a 1–9 scale (1 = highly natural; 9 = highly unnatural), and these ratings were significantly correlated with vowel duration and phonated intervals for PWS. The findings indicate that PWS may be altering vocal fold vibration duration to obtain fluency during this rhythmic speech style, and that vocal fold vibration duration may have an impact on speech naturalness during rhythmic speech. Future investigations should examine speech production changes and speech naturalness during variations of this rhythmic condition. Educational Objectives The reader will be able to: (1) describe changes (from a control reading condition) in speech production variables when alternating between 1 s of reading and 1 s of silence, (2) describe which rhythmic conditions have been found to sound and feel the most natural, (3) describe methodological issues for studies about alterations in speech production variables during fluency-inducing conditions, and (4) describe which fluency-inducing conditions have been shown to involve a reduction in short phonated intervals. PMID:21664528
Pérez Zaballos, María Teresa; Ramos de Miguel, Ángel; Pérez Plasencia, Daniel; Zaballos González, María Luisa; Ramos Macías, Ángel
2015-12-01
To evaluate 1) if air traffic controllers (ATC) perform better than non-air traffic controllers in an open-set speech-in-noise test because of their experience with radio communications, and 2) if high-frequency information (>8000 Hz) substantially improves speech-in-noise perception across populations. The control group comprised 28 normal-hearing subjects, and the target group comprised 48 ATCs aged between 19 and 55 years who were native Spanish speakers. The hearing-in-noise abilities of the two groups were characterized under two signal conditions: 1) speech tokens and white noise sampled at 44.1 kHz (unfiltered condition) and 2) speech tokens plus white noise, each passed through a 4th order Butterworth filter with 70 and 8000 Hz low and high cutoffs (filtered condition). These tests were performed at signal-to-noise ratios of +5, 0, and -5 dB SNR. The ATCs outperformed the control group in all conditions. The differences were statistically significant in all cases, and the largest difference was observed under the most difficult conditions (-5 dB SNR). Overall, scores were higher when high-frequency components were not suppressed for both groups, although statistically significant differences were not observed for the control group at 0 dB SNR. The results indicate that ATCs are more capable of identifying speech in noise. This may be due to the effect of their training. On the other hand, performance seems to decrease when the high frequency components of speech are removed, regardless of training.
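A sketch of the filtered stimulus condition as described: a 4th-order Butterworth band-pass with 70 and 8000 Hz cutoffs applied to 44.1 kHz material. The signal here is placeholder noise, not the study's speech tokens:

    import numpy as np
    from scipy.signal import butter, lfilter

    fs = 44100                                    # sampling rate used above
    b, a = butter(4, [70, 8000], btype="bandpass", fs=fs)

    rng = np.random.default_rng(0)
    token_plus_noise = rng.standard_normal(fs)    # stand-in for speech + white noise, 1 s
    filtered = lfilter(b, a, token_plus_noise)    # the "filtered" condition (>8 kHz removed)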
Choi, Ji Eun; Moon, Il Joon; Kim, Eun Yeon; Park, Hee-Sung; Kim, Byung Kil; Chung, Won-Ho; Cho, Yang-Sun; Brown, Carolyn J; Hong, Sung Hwa
The aim of this study was to compare binaural performance on an auditory localization task and a speech-perception-in-babble measure between children who use a cochlear implant (CI) in one ear and a hearing aid (HA) in the other (bimodal fitting) and those who use bilateral CIs. Thirteen children (mean age ± SD = 10 ± 2.9 years) with bilateral CIs and 19 children with bimodal fitting were recruited to participate. Sound localization was assessed using a 13-loudspeaker array in a quiet sound-treated booth. Speakers were placed in an arc from -90° azimuth to +90° azimuth (15° intervals) in the horizontal plane. To assess the accuracy of sound location identification, we calculated the absolute error in degrees between the target speaker and the response speaker during each trial. The mean absolute error was computed by dividing the sum of absolute errors by the total number of trials. We also calculated the hemifield identification score to reflect the accuracy of right/left discrimination. Speech-in-babble perception was also measured in the sound field using target speech presented from the front speaker. Eight-talker babble was presented in the following four listening conditions: from the front speaker (0°), from one of the two side speakers (+90° or -90°), and from both side speakers (±90°). The speech, spatial, and quality questionnaire was administered. When the two groups of children were directly compared with each other, there was no significant difference in localization accuracy or hemifield identification score under the binaural condition. Performance on the speech perception test was also similar under most babble conditions. However, when the babble was from the first device side (CI side for children with bimodal stimulation or first CI side for children with bilateral CIs), speech understanding in babble by bilateral CI users was significantly better than that by bimodal listeners. Speech, spatial, and quality scores were comparable between the two groups. Overall, binaural performance was similar between children who are fit with two CIs (CI + CI) and those who use bimodal stimulation (HA + CI) in most conditions. However, the bilateral CI group showed better speech perception than the bimodal group when babble was from the first device side (first CI side for bilateral CI users or CI side for bimodal listeners). Therefore, if bimodal performance is significantly below the mean bilateral CI performance on speech perception in babble, these results suggest that transitioning the child from bimodal stimulation to bilateral CIs should be considered.
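A minimal sketch of the localization scoring described above, using toy trials on the ±90° arc: the mean absolute error is the sum of absolute target-response differences divided by the number of trials, and the hemifield score checks right/left agreement:

    import numpy as np

    speakers = np.arange(-90, 91, 15)             # the 13-loudspeaker arc (for reference)
    target = np.array([-90, -30, 15, 45, 90])     # presented azimuths (toy trials, degrees)
    response = np.array([-75, -30, 15, 30, 90])   # listener responses
    mae = np.abs(target - response).mean()        # sum of |errors| / number of trials
    hemifield = 100 * np.mean(np.sign(target) == np.sign(response))
    print(f"MAE = {mae:.1f} deg; hemifield score = {hemifield:.0f}%")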
An Australian survey of parent involvement in intervention for childhood speech sound disorders.
Sugden, Eleanor; Baker, Elise; Munro, Natalie; Williams, A Lynn; Trivette, Carol M
2017-08-17
To investigate how speech-language pathologists (SLPs) report involving parents in intervention for phonology-based speech sound disorders (SSDs), and to describe the home practice that they recommend. Further aims were to describe the training SLPs report providing to parents, to explore SLPs' beliefs and motivations for involving parents in intervention, and to determine whether SLPs' characteristics are associated with their self-reported practice. An online survey of 288 SLPs working with SSD in Australia was conducted. The majority of SLPs (96.4%) reported involving parents in intervention, most commonly in providing home practice. On average, these tasks were recommended to be completed five times per week for 10 min. SLPs reported training parents using a range of training methods, most commonly providing opportunities for parents to observe the SLP conduct the intervention. SLPs' place of work and years of experience were associated with how they involved and trained parents in intervention. Most (95.8%) SLPs agreed or strongly agreed that family involvement is essential for intervention to be effective. Parent involvement and home practice appear to be intricately linked within intervention for phonology-based SSDs in Australia. More high-quality research is needed to understand how to best involve parents within clinical practice.
Rowbotham, Samantha; Wardy, April J; Lloyd, Donna M; Wearden, Alison; Holler, Judith
2014-01-01
Effective pain communication is essential if adequate treatment and support are to be provided. Pain communication is often multimodal, with sufferers utilising speech, nonverbal behaviours (such as facial expressions), and co-speech gestures (bodily movements, primarily of the hands and arms that accompany speech and can convey semantic information) to communicate their experience. Research suggests that the production of nonverbal pain behaviours is positively associated with pain intensity, but it is not known whether this is also the case for speech and co-speech gestures. The present study explored whether increased pain intensity is associated with greater speech and gesture production during face-to-face communication about acute, experimental pain. Participants (N = 26) were exposed to experimentally elicited pressure pain to the fingernail bed at high and low intensities and took part in video-recorded semi-structured interviews. Despite rating more intense pain as more difficult to communicate (t(25) = 2.21, p = .037), participants produced significantly longer verbal pain descriptions and more co-speech gestures in the high intensity pain condition (Words: t(25) = 3.57, p = .001; Gestures: t(25) = 3.66, p = .001). This suggests that spoken and gestural communication about pain is enhanced when pain is more intense. Thus, in addition to conveying detailed semantic information about pain, speech and co-speech gestures may provide a cue to pain intensity, with implications for the treatment and support received by pain sufferers. Future work should consider whether these findings are applicable within the context of clinical interactions about pain.
Bartos, Anthony L; Cipr, Tomas; Nelson, Douglas J; Schwarz, Petr; Banowetz, John; Jerabek, Ladislav
2018-04-01
A method is presented in which conventional speech algorithms are applied, with no modifications, to improve their performance in extremely noisy environments. It has been demonstrated that, for eigen-channel algorithms, pre-training multiple speaker identification (SID) models at a lattice of signal-to-noise-ratio (SNR) levels and then performing SID using the appropriate SNR dependent model was successful in mitigating noise at all SNR levels. In those tests, it was found that SID performance was optimized when the SNR of the testing and training data were close or identical. In this current effort multiple i-vector algorithms were used, greatly improving both processing throughput and equal error rate classification accuracy. Using identical approaches in the same noisy environment, performance of SID, language identification, gender identification, and diarization were significantly improved. A critical factor in this improvement is speech activity detection (SAD) that performs reliably in extremely noisy environments, where the speech itself is barely audible. To optimize SAD operation at all SNR levels, two algorithms were employed. The first maximized detection probability at low levels (-10 dB ≤ SNR < +10 dB) using just the voiced speech envelope, and the second exploited features extracted from the original speech to improve overall accuracy at higher quality levels (SNR ≥ +10 dB).
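A sketch of the SNR-lattice idea described above: pre-train one model per SNR level, then score test audio with the model whose training SNR is nearest the estimated test SNR. The model objects and the SNR estimate are placeholders, not the authors' system:

    def pick_model(models_by_snr, estimated_snr_db):
        """Select the pre-trained model whose training SNR is nearest the estimate."""
        nearest = min(models_by_snr, key=lambda snr: abs(snr - estimated_snr_db))
        return models_by_snr[nearest]

    # Hypothetical lattice of SID models trained at fixed SNR levels (dB).
    models = {-10: "sid_minus10", 0: "sid_0", 10: "sid_10", 20: "sid_20"}
    print(pick_model(models, estimated_snr_db=3.2))   # -> sid_0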
Accurate visible speech synthesis based on concatenating variable length motion capture data.
Ma, Jiyong; Cole, Ron; Pellom, Bryan; Ward, Wayne; Wise, Barbara
2006-01-01
We present a novel approach to synthesizing accurate visible speech based on searching and concatenating optimal variable-length units in a large corpus of motion capture data. Based on a set of visual prototypes selected on a source face and a corresponding set designated for a target face, we propose a machine learning technique to automatically map the facial motions observed on the source face to the target face. In order to model the long distance coarticulation effects in visible speech, a large-scale corpus that covers the most common syllables in English was collected, annotated and analyzed. For any input text, a search algorithm to locate the optimal sequences of concatenated units for synthesis is described. A new algorithm to adapt lip motions from a generic 3D face model to a specific 3D face model is also proposed. A complete, end-to-end visible speech animation system is implemented based on the approach. This system is currently used in more than 60 kindergarten through third grade classrooms to teach students to read using a lifelike conversational animated agent. To evaluate the quality of the visible speech produced by the animation system, both subjective evaluation and objective evaluation are conducted. The evaluation results show that the proposed approach is accurate and powerful for visible speech synthesis.
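A toy sketch of the optimal-unit search idea: dynamic programming over candidate units, one candidate list per position, minimizing a target cost plus a join (concatenation) cost. Real systems search phone- or syllable-sized motion-capture units with perceptually motivated costs; the costs below are arbitrary:

    def best_sequence(candidates, target_cost, join_cost):
        """candidates: list of candidate-unit lists, one list per position."""
        # best[c] = (cumulative cost, path) for sequences ending in unit c
        best = {c: (target_cost(0, c), [c]) for c in candidates[0]}
        for pos in range(1, len(candidates)):
            new_best = {}
            for c in candidates[pos]:
                cost, path = min((b_cost + join_cost(p, c), b_path)
                                 for p, (b_cost, b_path) in best.items())
                new_best[c] = (cost + target_cost(pos, c), path + [c])
            best = new_best
        return min(best.values())

    cands = [["a_long", "a_short"], ["b_long", "b_short"]]
    cost, path = best_sequence(
        cands,
        target_cost=lambda pos, u: 0.1,                  # toy fit-to-context cost
        join_cost=lambda p, c: 0.0 if p.endswith("long") == c.endswith("long") else 1.0)
    print(cost, path)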
Flippin, Michelle; Reszka, Stephanie; Watson, Linda R
2010-05-01
The Picture Exchange Communication System (PECS) is a popular communication-training program for young children with autism spectrum disorders (ASD). This meta-analysis reviews the current empirical evidence for PECS in affecting communication and speech outcomes for children with ASD. A systematic review of the literature on PECS written between 1994 and June 2009 was conducted. Quality of scientific rigor was assessed and used as an inclusion criterion in computation of effect sizes. Effect sizes were aggregated separately for single-subject and group studies for communication and speech outcomes. Eight single-subject experiments (18 participants) and 3 group studies (95 PECS participants, 65 in other intervention/control) were included. Results indicated that PECS is a promising but not yet established evidence-based intervention for facilitating communication in children with ASD ages 1-11 years. Small to moderate gains in communication were demonstrated following training. Gains in speech were small to negative. This meta-analysis synthesizes gains in communication and relative lack of gains made in speech across the PECS literature for children with ASD. Concerns about maintenance and generalization are identified. Emerging evidence of potential preintervention child characteristics is discussed. Phase IV was identified as a possibly influential program characteristic for speech outcomes.
Vatakis, Argiro; Maragos, Petros; Rodomagoulakis, Isidoros; Spence, Charles
2012-01-01
We investigated how the physical differences associated with the articulation of speech affect the temporal aspects of audiovisual speech perception. Video clips of consonants and vowels uttered by three different speakers were presented. The video clips were analyzed using an auditory-visual signal saliency model in order to compare signal saliency and behavioral data. Participants made temporal order judgments (TOJs) regarding which speech-stream (auditory or visual) had been presented first. The sensitivity of participants' TOJs and the point of subjective simultaneity (PSS) were analyzed as a function of the place, manner of articulation, and voicing for consonants, and the height/backness of the tongue and lip-roundedness for vowels. We expected that in the case of the place of articulation and roundedness, where the visual-speech signal is more salient, temporal perception of speech would be modulated by the visual-speech signal. No such effect was expected for the manner of articulation or height. The results demonstrate that for place and manner of articulation, participants' temporal percept was affected (although not always significantly) by highly-salient speech-signals with the visual-signals requiring smaller visual-leads at the PSS. This was not the case when height was evaluated. These findings suggest that in the case of audiovisual speech perception, a highly salient visual-speech signal may lead to higher probabilities regarding the identity of the auditory-signal that modulate the temporal window of multisensory integration of the speech-stimulus. PMID:23060756
Automated analysis of free speech predicts psychosis onset in high-risk youths
Bedi, Gillinder; Carrillo, Facundo; Cecchi, Guillermo A; Slezak, Diego Fernández; Sigman, Mariano; Mota, Natália B; Ribeiro, Sidarta; Javitt, Daniel C; Copelli, Mauro; Corcoran, Cheryl M
2015-01-01
Background/Objectives: Psychiatry lacks the objective clinical tests routinely used in other specializations. Novel computerized methods to characterize complex behaviors such as speech could be used to identify and predict psychiatric illness in individuals. AIMS: In this proof-of-principle study, our aim was to test automated speech analyses combined with Machine Learning to predict later psychosis onset in youths at clinical high-risk (CHR) for psychosis. Methods: Thirty-four CHR youths (11 females) had baseline interviews and were assessed quarterly for up to 2.5 years; five transitioned to psychosis. Using automated analysis, transcripts of interviews were evaluated for semantic and syntactic features predicting later psychosis onset. Speech features were fed into a convex hull classification algorithm with leave-one-subject-out cross-validation to assess their predictive value for psychosis outcome. The canonical correlation between the speech features and prodromal symptom ratings was computed. Results: Derived speech features included a Latent Semantic Analysis measure of semantic coherence and two syntactic markers of speech complexity: maximum phrase length and use of determiners (e.g., which). These speech features predicted later psychosis development with 100% accuracy, outperforming classification from clinical interviews. Speech features were significantly correlated with prodromal symptoms. Conclusions: Findings support the utility of automated speech analysis to measure subtle, clinically relevant mental state changes in emergent psychosis. Recent developments in computer science, including natural language processing, could provide the foundation for future development of objective clinical tests for psychiatry. PMID:27336038
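A minimal sketch of the semantic-coherence feature described above: cosine similarity between vector representations of consecutive sentences, with a drop flagging derailed speech. TF-IDF plus truncated SVD stands in here for the Latent Semantic Analysis space; the sentences are invented:

    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import TruncatedSVD

    sentences = ["I went to the store this morning.",
                 "I bought bread and milk at the store.",
                 "The satellites are reading my thoughts."]
    X = TfidfVectorizer().fit_transform(sentences)
    Z = TruncatedSVD(n_components=2, random_state=0).fit_transform(X)
    Z = Z / np.linalg.norm(Z, axis=1, keepdims=True)       # unit vectors
    coherence = [float(Z[i] @ Z[i + 1]) for i in range(len(Z) - 1)]
    print(coherence)   # first-order coherence drops at the derailed sentence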
[Improving speech comprehension using a new cochlear implant speech processor].
Müller-Deile, J; Kortmann, T; Hoppe, U; Hessel, H; Morsnowski, A
2009-06-01
The aim of this multicenter clinical field study was to assess the benefits of the new Freedom 24 sound processor for cochlear implant (CI) users implanted with the Nucleus 24 cochlear implant system. The study included 48 postlingually profoundly deaf experienced CI users who demonstrated speech comprehension performance with their current speech processor on the Oldenburg sentence test (OLSA) in quiet conditions of at least 80% correct scores and who were able to perform adaptive speech threshold testing using the OLSA in noisy conditions. Following baseline measures of speech comprehension performance with their current speech processor, subjects were upgraded to the Freedom 24 speech processor. After a take-home trial period of at least 2 weeks, subject performance was evaluated by measuring the speech reception threshold with the Freiburg multisyllabic word test and speech intelligibility with the Freiburg monosyllabic word test at 50 dB and 70 dB in the sound field. The results demonstrated highly significant benefits for speech comprehension with the new speech processor. Significant benefits for speech comprehension were also demonstrated with the new speech processor when tested in competing background noise. In contrast, use of the Abbreviated Profile of Hearing Aid Benefit (APHAB) did not prove to be a suitably sensitive assessment tool for comparative subjective self-assessment of hearing benefits with each processor. Use of the preprocessing algorithm known as adaptive dynamic range optimization (ADRO) in the Freedom 24 led to additional improvements over the standard upgrade map for speech comprehension in quiet and showed equivalent performance in noise. Through use of the preprocessing beam-forming algorithm BEAM, subjects demonstrated a highly significant improved signal-to-noise ratio for speech comprehension thresholds (i.e., signal-to-noise ratio for 50% speech comprehension scores) when tested with an adaptive procedure using the Oldenburg sentences in the clinical setting S(0)N(CI), with the speech signal at 0 degrees and noise lateral to the CI at 90 degrees. With the convincing findings from our evaluations of this multicenter study cohort, a trial with the Freedom 24 sound processor for all suitable CI users is recommended. For evaluating the benefits of a new processor, the comparative assessment paradigm used in our study design would be considered ideal for use with individual patients.
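The adaptive threshold procedure mentioned above converges on the SNR for 50% comprehension; the sketch below shows the generic 1-up/1-down logic (harder after a correct response, easier after a miss), not the OLSA's actual adaptive rule or step sizes:

    import random

    def srt_track(respond, start_snr=0.0, step=2.0, trials=30):
        """1-up/1-down track converging near the 50%-correct SNR."""
        snr, track = start_snr, []
        for _ in range(trials):
            track.append(snr)
            snr += -step if respond(snr) else step   # correct -> harder; miss -> easier
        return sum(track[-10:]) / 10.0               # average late trials as the SRT

    random.seed(1)
    psychometric = lambda snr: random.random() < 1 / (1 + 10 ** (-snr / 4))
    print(f"estimated SRT: {srt_track(psychometric):.1f} dB SNR")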
Brouwer, Susanne; Van Engen, Kristin J; Calandruccio, Lauren; Bradlow, Ann R
2012-02-01
This study examined whether speech-on-speech masking is sensitive to variation in the degree of similarity between the target and the masker speech. Three experiments investigated whether speech-in-speech recognition varies across different background speech languages (English vs Dutch) for both English and Dutch targets, as well as across variation in the semantic content of the background speech (meaningful vs semantically anomalous sentences), and across variation in listener status vis-à-vis the target and masker languages (native, non-native, or unfamiliar). The results showed that the more similar the target speech is to the masker speech (e.g., same vs different language, same vs different levels of semantic content), the greater the interference on speech recognition accuracy. Moreover, the listener's knowledge of the target and the background language modulate the size of the release from masking. These factors had an especially strong effect on masking effectiveness in highly unfavorable listening conditions. Overall this research provided evidence that the degree of target-masker similarity plays a significant role in speech-in-speech recognition. The results also give insight into how listeners assign their resources differently depending on whether they are listening to their first or second language. © 2012 Acoustical Society of America
Brouwer, Susanne; Van Engen, Kristin J.; Calandruccio, Lauren; Bradlow, Ann R.
2012-01-01
This study examined whether speech-on-speech masking is sensitive to variation in the degree of similarity between the target and the masker speech. Three experiments investigated whether speech-in-speech recognition varies across different background speech languages (English vs Dutch) for both English and Dutch targets, as well as across variation in the semantic content of the background speech (meaningful vs semantically anomalous sentences), and across variation in listener status vis-à-vis the target and masker languages (native, non-native, or unfamiliar). The results showed that the more similar the target speech is to the masker speech (e.g., same vs different language, same vs different levels of semantic content), the greater the interference on speech recognition accuracy. Moreover, the listener's knowledge of the target and the background language modulate the size of the release from masking. These factors had an especially strong effect on masking effectiveness in highly unfavorable listening conditions. Overall this research provided evidence that the degree of target-masker similarity plays a significant role in speech-in-speech recognition. The results also give insight into how listeners assign their resources differently depending on whether they are listening to their first or second language. PMID:22352516
Rusz, Jan; Tykalová, Tereza; Klempíř, Jiří; Čmejla, Roman; Růžička, Evžen
2016-04-01
Although speech disorders represent an early and common manifestation of Parkinson's disease (PD), little is known about their progression and relationship to dopaminergic replacement therapy. The aim of the current study was to examine longitudinal motor speech changes after the initiation of pharmacotherapy in PD. Fifteen newly-diagnosed, untreated PD patients and ten healthy controls of comparable age were investigated. PD patients were tested before the introduction of antiparkinsonian therapy and then twice within the following 6 years. Quantitative acoustic analyses of seven key speech dimensions of hypokinetic dysarthria were performed. At baseline, PD patients showed significantly altered speech including imprecise consonants, monopitch, inappropriate silences, decreased quality of voice, slow alternating motion rates, imprecise vowels and monoloudness. At follow-up assessment, preservation or slight improvement of speech performance was objectively observed in two-thirds of PD patients within the first 3-6 years of dopaminergic treatment, primarily associated with the improvement of stop consonant articulation. The extent of speech improvement correlated with L-dopa equivalent dose (r = 0.66, p = 0.008) as well as with reduction in principal motor manifestations based on the Unified Parkinson's Disease Rating Scale (r = -0.61, p = 0.02), particularly reflecting treatment-related changes in bradykinesia but not in rigidity, tremor, or axial motor manifestations. While speech disorders are frequently present in drug-naive PD patients, they tend to improve or remain relatively stable after the initiation of dopaminergic treatment and appear to be related to the dopaminergic responsiveness of bradykinesia.
Screening for Speech and Language Delay in Children 5 Years Old and Younger: A Systematic Review.
Wallace, Ina F; Berkman, Nancy D; Watson, Linda R; Coyne-Beasley, Tamera; Wood, Charles T; Cullen, Katherine; Lohr, Kathleen N
2015-08-01
No recommendation exists for or against routine use of brief, formal screening instruments in primary care to detect speech and language delay in children through 5 years of age. This review aimed to update the evidence on screening and treating children for speech and language since the 2006 US Preventive Services Task Force systematic review. Medline, the Cochrane Library, PsycInfo, Cumulative Index to Nursing and Allied Health Literature, ClinicalTrials.gov, and reference lists. We included studies reporting diagnostic accuracy of screening tools and randomized controlled trials reporting benefits and harms of treatment of speech and language. Two independent reviewers extracted data, checked accuracy, and assigned quality ratings using predefined criteria. We found no evidence for the impact of screening on speech and language outcomes. In 23 studies evaluating the accuracy of screening tools, sensitivity ranged between 50% and 94%, and specificity ranged between 45% and 96%. Twelve treatment studies improved various outcomes in language, articulation, and stuttering; little evidence emerged for interventions improving other outcomes or for adverse effects of treatment. Risk factors associated with speech and language delay were male gender, family history, and low parental education. A limitation of this review is the lack of well-designed, well-conducted studies addressing whether screening for speech and language delay or disorders improves outcomes. Several screening tools can accurately identify children for diagnostic evaluations and interventions, but evidence is inadequate regarding applicability in primary care settings. Some treatments for young children identified with speech and language delays and disorders may be effective. Copyright © 2015 by the American Academy of Pediatrics.
ERIC Educational Resources Information Center
New Mexico State Univ., Las Cruces. New Mexico Environmental Inst.
Comments, speeches, and questions delivered at the Quality of Life Symposium are compiled in these proceedings. As an exploratory session, the conference objectives were to (1) become better informed about New Mexico--its resource base, the economy, social and cultural base, and the environment; and (2) to evaluate and discuss the role of New…
SAM: speech-aware applications in medicine to support structured data entry.
Wormek, A. K.; Ingenerf, J.; Orthner, H. F.
1997-01-01
In the last two years, improvement in speech recognition technology has directed the medical community's interest to porting and using such innovations in clinical systems. The acceptance of speech recognition systems in clinical domains increases with recognition speed, large medical vocabulary, high accuracy, continuous speech recognition, and speaker independence. Although some commercial speech engines approach these requirements, the greatest benefit can be achieved in adapting a speech recognizer to a specific medical application. The goals of our work are first, to develop a speech-aware core component which is able to establish connections to speech recognition engines of different vendors. This is realized in SAM. Second, with applications based on SAM we want to support the physician in his/her routine clinical care activities. Within the STAMP project (STAndardized Multimedia report generator in Pathology), we extend SAM by combining a structured data entry approach with speech recognition technology. Another speech-aware application in the field of Diabetes care is connected to a terminology server. The server delivers a controlled vocabulary which can be used for speech recognition. PMID:9357730
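The paper does not publish its interface, but the vendor-abstraction idea behind the core component can be sketched as a minimal adapter layer; all class and method names here are hypothetical:

    from abc import ABC, abstractmethod

    class SpeechEngine(ABC):
        """Vendor-neutral interface, in the spirit of SAM's core component."""
        @abstractmethod
        def transcribe(self, audio: bytes, vocabulary: list[str]) -> str: ...

    class VendorAEngine(SpeechEngine):            # hypothetical vendor adapter
        def transcribe(self, audio, vocabulary):
            return "simulated transcript"         # a real adapter would call the vendor SDK

    def dictate(engine: SpeechEngine, audio: bytes, vocabulary: list[str]) -> str:
        # Application code depends only on the interface, not on any vendor.
        return engine.transcribe(audio, vocabulary)

    print(dictate(VendorAEngine(), b"", ["diabetes", "insulin"]))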
Is there an effect of dysphonic teachers' voices on children's processing of spoken language?
Rogerson, Jemma; Dodd, Barbara
2005-03-01
There is a vast body of literature on the causes, prevalence, implications, and issues of vocal dysfunction in teachers. However, the educational effect of teacher vocal impairment is largely unknown. The purpose of this study was to investigate the effect of impaired voice quality on children's processing of spoken language. One hundred and seven children (age range, 9.2 to 10.6, mean 9.8, SD 3.76 months) listened to three video passages, one read in a control voice, one in a mild dysphonic voice, and one in a severe dysphonic voice. After each video passage, children were asked to answer six questions, with multiple-choice answers. The results indicated that children's perceptions of speech across the three voice qualities differed, regardless of gender, IQ, and school attended. Performance in the control voice passages was better than performance in the mild and severe dysphonic voice passages. No difference was found between performance in the mild and severe dysphonic voice passages, highlighting that any form of vocal impairment is detrimental to children's speech processing and is therefore likely to have a negative educational effect. These findings, in light of the high rate of vocal dysfunction in teachers, further support the implementation of specific voice care education for those in the teaching profession.
Microphone directionality, pre-emphasis filter, and wind noise in cochlear implants.
Chung, King; McKibben, Nicholas
2011-10-01
Wind noise can be a nuisance or a debilitating masker for cochlear implant users in outdoor environments. Previous studies indicated that wind noise at the microphone/hearing aid output had high levels of low-frequency energy and the amount of noise generated is related to the microphone directionality. Currently, cochlear implants only offer either directional microphones or omnidirectional microphones for users at-large. As all cochlear implants utilize pre-emphasis filters to reduce low-frequency energy before the signal is encoded, effective wind noise reduction algorithms for hearing aids might not be applicable for cochlear implants. The purposes of this study were to investigate the effect of microphone directionality on speech recognition and perceived sound quality of cochlear implant users in wind noise and to derive effective wind noise reduction strategies for cochlear implants. A repeated-measure design was used to examine the effects of spectral and temporal masking created by wind noise recorded through directional and omnidirectional microphones and the effects of pre-emphasis filters on cochlear implant performance. A digital hearing aid was programmed to have linear amplification and relatively flat in-situ frequency responses for the directional and omnidirectional modes. The hearing aid output was then recorded from 0 to 360° at flow velocities of 4.5 and 13.5 m/sec in a quiet wind tunnel. Sixteen postlingually deafened adult cochlear implant listeners who reported to be able to communicate on the phone with friends and family without text messages participated in the study. Cochlear implant users listened to speech in wind noise recorded at locations that the directional and omnidirectional microphones yielded the lowest noise levels. Cochlear implant listeners repeated the sentences and rated the sound quality of the testing materials. Spectral and temporal characteristics of flow noise, as well as speech and/or noise characteristics before and after the pre-emphasis filter, were analyzed. Correlation coefficients between speech recognition scores and crest factors of wind noise before and after pre-emphasis filtering were also calculated. Listeners obtained higher scores using the omnidirectional than the directional microphone mode at 13.5 m/sec, but they obtained similar speech recognition scores for the two microphone modes at 4.5 m/sec. Higher correlation coefficients were obtained between speech recognition scores and crest factors of wind noise after pre-emphasis filtering rather than before filtering. Cochlear implant users would benefit from both directional and omnidirectional microphones to reduce far-field background noise and near-field wind noise. Automatic microphone switching algorithms can be more effective if the incoming signal were analyzed after pre-emphasis filters for microphone switching decisions. American Academy of Audiology.
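A sketch of the crest factor (peak-to-RMS ratio) whose post-filter values correlated with speech recognition above, combined with a simple first-order pre-emphasis; the 0.97 coefficient is a typical textbook value, not any specific implant's filter:

    import numpy as np

    def crest_factor_db(x):
        """Peak-to-RMS ratio in dB."""
        x = np.asarray(x, dtype=float)
        return 20 * np.log10(np.abs(x).max() / np.sqrt(np.mean(x ** 2)))

    rng = np.random.default_rng(0)
    noise = rng.standard_normal(44100)                         # stand-in for recorded wind noise
    pre = noise - 0.97 * np.concatenate(([0.0], noise[:-1]))   # first-order pre-emphasis
    print(f"before: {crest_factor_db(noise):.1f} dB, after: {crest_factor_db(pre):.1f} dB")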
Speech and Swallowing in Parkinson’s Disease
Tjaden, Kris
2009-01-01
Dysarthria and dysphagia occur frequently in Parkinson's disease (PD). Reduced speech intelligibility is a significant functional limitation of dysarthria, and in the case of PD it is likely related to articulatory and phonatory impairment. Prosodically based treatments show the most promise for addressing these deficits as well as for maximizing speech intelligibility. Communication-oriented strategies also may help to enhance mutual understanding between a speaker and listener. Dysphagia in PD can result in serious health issues, including aspiration pneumonia, malnutrition, and dehydration. Early identification of swallowing abnormalities is critical so as to minimize the impact of dysphagia on health status and quality of life. Feeding modifications, compensatory strategies, and therapeutic swallowing techniques all have a role in the management of dysphagia in PD. PMID:19946386
Real-time speech encoding based on Code-Excited Linear Prediction (CELP)
NASA Technical Reports Server (NTRS)
Leblanc, Wilfrid P.; Mahmoud, S. A.
1988-01-01
This paper reports on ongoing work toward the development of a real-time voice codec for terrestrial and satellite mobile radio environments. The codec is based on a complexity-reduced version of code-excited linear prediction (CELP). The codebook search complexity was reduced to only 0.5 million floating point operations per second (MFLOPS) while maintaining excellent speech quality. Novel methods to quantize the residual and the long- and short-term model filters are presented.
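As a reference point for the codebook search mentioned above, the following minimal sketch shows a plain (non-complexity-reduced) CELP-style analysis-by-synthesis search: each codevector is passed through the LP synthesis filter, the optimal gain is solved in closed form, and the index with the lowest squared error wins. The codebook size, filter order, and toy signals are illustrative assumptions, not the paper's configuration.

```python
import numpy as np
from scipy.signal import lfilter

def celp_codebook_search(target, codebook, lpc):
    """Exhaustive analysis-by-synthesis search: filter each codevector
    through the LP synthesis filter 1/A(z), solve the optimal gain in
    closed form, and keep the index minimizing squared error."""
    best = (None, 0.0, np.inf)
    for i, c in enumerate(codebook):
        y = lfilter([1.0], lpc, c)             # synthesized candidate
        g = np.dot(target, y) / np.dot(y, y)   # optimal gain
        err = np.sum((target - g * y) ** 2)
        if err < best[2]:
            best = (i, g, err)
    return best  # (index, gain, error)

rng = np.random.default_rng(1)
lpc = np.array([1.0, -0.9])                 # toy 1st-order LP model (assumed)
codebook = rng.standard_normal((128, 40))   # 128 stochastic codevectors
target = lfilter([1.0], lpc, 0.8 * codebook[37])  # known answer for the demo

idx, gain, err = celp_codebook_search(target, codebook, lpc)
print(idx, round(gain, 2), round(err, 6))   # -> 37 0.8 ~0.0
```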
Improved Objective Measurements for Speech Quality Testing
1985-01-01
criterion with the possible exception of the channel vocoder, for which the spread in subjective responses was slightly less than desired. Parameter III specifies the threshold between objectively interrupted and non-interrupted speech. In the formula specifying RATIO, mf is the index of the
Basic Technical Data on Transmission Systems and Equipment Using Communications Lines. Part 1.
1978-08-01
without noticeable degradation of the speech quality. The maximum number of repeater sections: for the KNK-6s; for the KNK-6t for multiquad... power circuit]; 15. Low-frequency amplifier for direction B-A; 16. Low-frequency amplifier; 17. KNN [initial slope network]; 18. LVN-2... Voice-frequency ringing at 3,800 Hz with a level 0.4-0.8 Np lower than the speech channel level. The system for service
Assessing Speech Discrimination in Individual Infants
ERIC Educational Resources Information Center
Houston, Derek M.; Horn, David L.; Qi, Rong; Ting, Jonathan Y.; Gao, Sujuan
2007-01-01
Assessing speech discrimination skills in individual infants from clinical populations (e.g., infants with hearing impairment) has important diagnostic value. However, most infant speech discrimination paradigms have been designed to test group effects rather than individual differences. Other procedures suffer from high attrition rates. In this…
Monson, Brian B; Lotto, Andrew J; Story, Brad H
2012-09-01
The human singing and speech spectrum includes energy above 5 kHz. To begin an in-depth exploration of this high-frequency energy (HFE), a database of anechoic high-fidelity recordings of singers and talkers was created and analyzed. Third-octave band analysis from the long-term average spectra showed that production level (soft vs normal vs loud), production mode (singing vs speech), and phoneme (for voiceless fricatives) all significantly affected HFE characteristics. Specifically, increased production level caused an increase in absolute HFE level, but a decrease in relative HFE level. Singing exhibited higher levels of HFE than speech in the soft and normal conditions, but not in the loud condition. Third-octave band levels distinguished phoneme class of voiceless fricatives. Female HFE levels were significantly greater than male levels only above 11 kHz. This information is pertinent to various areas of acoustics, including vocal tract modeling, voice synthesis, augmentative hearing technology (hearing aids and cochlear implants), and training/therapy for singing and speech.
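Third-octave band analysis of the kind used above can be sketched directly from a long-term spectrum. In the sketch below, band edges are placed at fc·2^(±1/6) around nominal center frequencies above 5 kHz; the center frequencies and the synthetic input are assumptions chosen for illustration, not the study's recordings.

```python
import numpy as np

def third_octave_levels(x, fs, centers):
    """Third-octave band levels (dB) from a long-term spectrum,
    with band edges at fc * 2**(+/-1/6)."""
    spec = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    levels = []
    for fc in centers:
        lo, hi = fc * 2 ** (-1 / 6), fc * 2 ** (1 / 6)
        band = spec[(freqs >= lo) & (freqs < hi)]
        levels.append(10 * np.log10(np.sum(band)))
    return levels

fs = 44100
rng = np.random.default_rng(2)
x = rng.standard_normal(fs)  # stand-in for a speech/song recording
hfe_centers = [6300, 8000, 10000, 12500, 16000]  # nominal 1/3-octave centers
for fc, lv in zip(hfe_centers, third_octave_levels(x, fs, hfe_centers)):
    print(f"{fc:>6} Hz band: {lv:6.1f} dB")
```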
ERIC Educational Resources Information Center
Alrusayni, Norah
2017-01-01
This study was conducted to determine the effectiveness of using the high-tech speech-generating device with Proloquo2Go app to reduce echolalic utterances in a student with autism during conversational speech. After observing that the iPad device with several apps was used by the students and that it served as a communication device, language…
Sleep Disrupts High-Level Speech Parsing Despite Significant Basic Auditory Processing.
Makov, Shiri; Sharon, Omer; Ding, Nai; Ben-Shachar, Michal; Nir, Yuval; Zion Golumbic, Elana
2017-08-09
The extent to which the sleeping brain processes sensory information remains unclear. This is particularly true for continuous and complex stimuli such as speech, in which information is organized into hierarchically embedded structures. Recently, novel metrics for assessing the neural representation of continuous speech have been developed using noninvasive brain recordings that have thus far only been tested during wakefulness. Here we investigated, for the first time, the sleeping brain's capacity to process continuous speech at different hierarchical levels using a newly developed Concurrent Hierarchical Tracking (CHT) approach that allows monitoring the neural representation and processing-depth of continuous speech online. Speech sequences were compiled with syllables, words, phrases, and sentences occurring at fixed time intervals such that different linguistic levels correspond to distinct frequencies. This enabled us to distinguish their neural signatures in brain activity. We compared the neural tracking of intelligible versus unintelligible (scrambled and foreign) speech across states of wakefulness and sleep using high-density EEG in humans. We found that neural tracking of stimulus acoustics was comparable across wakefulness and sleep and similar across all conditions regardless of speech intelligibility. In contrast, neural tracking of higher-order linguistic constructs (words, phrases, and sentences) was only observed for intelligible speech during wakefulness and could not be detected at all during nonrapid eye movement or rapid eye movement sleep. These results suggest that, whereas low-level auditory processing is relatively preserved during sleep, higher-level hierarchical linguistic parsing is severely disrupted, thereby revealing the capacity and limits of language processing during sleep. SIGNIFICANCE STATEMENT Despite the persistence of some sensory processing during sleep, it is unclear whether high-level cognitive processes such as speech parsing are also preserved. We used a novel approach for studying the depth of speech processing across wakefulness and sleep while tracking neuronal activity with EEG. We found that responses to the auditory sound stream remained intact; however, the sleeping brain did not show signs of hierarchical parsing of the continuous stream of syllables into words, phrases, and sentences. The results suggest that sleep imposes a functional barrier between basic sensory processing and high-level cognitive processing. This paradigm also holds promise for studying residual cognitive abilities in a wide array of unresponsive states. Copyright © 2017 the authors 0270-6474/17/377772-10$15.00/0.
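Because CHT assigns each linguistic level a fixed presentation rate, neural tracking can be assessed as spectral peaks at the corresponding tagged frequencies. The sketch below computes a peak-to-neighbor SNR at assumed rates (4 Hz syllables, 2 Hz words, 1 Hz phrases, 0.5 Hz sentences) on a simulated EEG trace; the rates, bandwidths, and signal are illustrative assumptions, not the study's parameters.

```python
import numpy as np

def tagged_peak_snr(eeg, fs, f_tag, bw=0.1, flank=0.5):
    """Peak-to-neighbor ratio (dB) of spectral power at a tagged
    frequency versus the surrounding flanking bins."""
    spec = np.abs(np.fft.rfft(eeg)) ** 2
    freqs = np.fft.rfftfreq(len(eeg), 1.0 / fs)
    peak = spec[np.abs(freqs - f_tag) <= bw].mean()
    neighbors = spec[(np.abs(freqs - f_tag) > bw)
                     & (np.abs(freqs - f_tag) <= flank)].mean()
    return 10 * np.log10(peak / neighbors)

fs, dur = 100, 120  # 100 Hz EEG sampling, 2 minutes (assumed)
t = np.arange(fs * dur) / fs
rng = np.random.default_rng(3)
# Simulated response: strong acoustic (syllable) tracking, weak word tracking.
eeg = (np.sin(2 * np.pi * 4 * t) + 0.3 * np.sin(2 * np.pi * 2 * t)
       + rng.standard_normal(t.size))
for level, f in [("syllable", 4.0), ("word", 2.0),
                 ("phrase", 1.0), ("sentence", 0.5)]:
    print(f"{level:>8} rate {f:3.1f} Hz: {tagged_peak_snr(eeg, fs, f):5.1f} dB")
```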
Gauvin, Hanna S; De Baene, Wouter; Brass, Marcel; Hartsuiker, Robert J
2016-02-01
To minimize the number of errors in speech, and thereby facilitate communication, speech is monitored before articulation. It is, however, unclear at which level during speech production monitoring takes place, and what mechanisms are used to detect and correct errors. The present study investigated whether internal verbal monitoring takes place through the speech perception system, as proposed by perception-based theories of speech monitoring, or whether mechanisms independent of perception are applied, as proposed by production-based theories of speech monitoring. With the use of fMRI during a tongue twister task we observed that error detection in internal speech during noise-masked overt speech production and error detection in speech perception both recruit the same neural network, which includes pre-supplementary motor area (pre-SMA), dorsal anterior cingulate cortex (dACC), anterior insula (AI), and inferior frontal gyrus (IFG). Although production and perception recruit similar areas, as proposed by perception-based accounts, we did not find activation in superior temporal areas (which are typically associated with speech perception) during internal speech monitoring in speech production as hypothesized by these accounts. On the contrary, results are highly compatible with a domain general approach to speech monitoring, by which internal speech monitoring takes place through detection of conflict between response options, which is subsequently resolved by a domain general executive center (e.g., the ACC). Copyright © 2015 Elsevier Inc. All rights reserved.
Preservation of propositional speech in a pure anomic: the importance of an abstract vocabulary.
Crutch, Sebastian J; Warrington, Elizabeth K
2003-12-01
We describe a detailed quantitative analysis of the propositional speech of a patient, FAV, who became severely anomic following a left occipito-temporal infarction. FAV showed a selective noun retrieval deficit in naming to confrontation and from verbal description. Nonetheless, his propositional speech was fluent and content-rich. To quantify this observation, three picture description-based tasks were designed to elicit spontaneous speech. These were pictures of professional occupations, real world scenes and stylised object scenes. FAV's performance was compared and contrasted with that of 5 age- and sex-matched control subjects on a number of variables including speech production rate, volume of output, pause frequency and duration, word frequency, word concreteness and diversity of vocabulary used. FAV's propositional speech fell within the range of normal control performance on the majority of measurements of quality, quantity and fluency. Only in the narrative tasks which relied more heavily upon a concrete vocabulary did FAV become less voluble and resort to summarising the scenes in an abstract manner. This dissociation between virtually intact propositional speech and a severe naming deficit represents the purest case of anomia currently on record. We attribute this dissociation in part to the preservation of his ability to retrieve his abstract word vocabulary. Our account demonstrates that poor performance on standard naming tasks may be indicative of only a narrowly defined word retrieval deficit. However, we also propose the existence of a feedback circuit which guides sentence construction by providing information regarding lexical availability.
Delgado Hernández, Jonathan; León Gómez, Nieves M; Jiménez, Alejandra; Izquierdo, Laura M; Barsties V Latoszek, Ben
2018-05-01
The aim of this study was to validate the Acoustic Voice Quality Index 03.01 (AVQIv3) and the Acoustic Breathiness Index (ABI) in the Spanish language. Concatenated voice samples of continuous speech (cs) and sustained vowel (sv) from 136 subjects with dysphonia and 47 vocally healthy subjects were perceptually judged for overall voice quality and breathiness severity. First, to reach a higher level of ecological validity, the proportions of cs and sv were equalized to a time length of 3 seconds for the sv part and the voiced cs part, respectively. Second, concurrent validity and diagnostic accuracy were verified. Ratings of overall voice quality and breathiness severity from 5 experts showed moderate reliability. It was found that standardizing the cs part at 33 syllables, which represents 3 seconds of voiced cs, allows the equalization of both speech tasks. Strong correlations were revealed between AVQIv3 and overall voice quality, and between ABI and perceived breathiness severity. Additionally, the best diagnostic outcome was identified at thresholds of 2.28 and 3.40 for AVQIv3 and ABI, respectively. In the Spanish language, the AVQIv3 and ABI yielded valid and robust quantification of abnormal voice quality with regard to overall voice quality and breathiness severity.
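Diagnostic cutoffs such as the 2.28 (AVQIv3) and 3.40 (ABI) reported above are commonly derived by scanning candidate thresholds for the best trade-off between sensitivity and specificity. The sketch below uses the Youden index on simulated index scores; the score distributions are assumptions for illustration, and the study's actual threshold-selection procedure may differ.

```python
import numpy as np

def best_threshold(scores, labels):
    """Pick the cutoff maximizing the Youden index
    (sensitivity + specificity - 1) over all candidate thresholds."""
    best_t, best_j = None, -1.0
    for t in np.unique(scores):
        pred = scores >= t                    # score >= t -> dysphonic
        sens = np.mean(pred[labels == 1])
        spec = np.mean(~pred[labels == 0])
        j = sens + spec - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j

rng = np.random.default_rng(4)
# Simulated index values: controls low, dysphonic voices high (illustrative).
scores = np.concatenate([rng.normal(1.5, 0.8, 47), rng.normal(4.0, 1.5, 136)])
labels = np.concatenate([np.zeros(47, int), np.ones(136, int)])
print(best_threshold(scores, labels))
```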
Iuzzini-Seigel, Jenya; Hogan, Tiffany P; Green, Jordan R
2017-05-24
The current research sought to determine (a) if speech inconsistency is a core feature of childhood apraxia of speech (CAS) or if it is driven by comorbid language impairment that affects a large subset of children with CAS and (b) if speech inconsistency is a sensitive and specific diagnostic marker that can differentiate between CAS and speech delay. Participants included 48 children ranging from 4;7 to 17;8 (years;months) with CAS (n = 10), CAS + language impairment (n = 10), speech delay (n = 10), language impairment (n = 9), or typical development (n = 9). Speech inconsistency was assessed at phonemic and token-to-token levels using a variety of stimuli. Children with CAS and CAS + language impairment performed equivalently on all inconsistency assessments. Children with language impairment evidenced high levels of speech inconsistency on the phrase "buy Bobby a puppy." Token-to-token inconsistency of monosyllabic words and the phrase "buy Bobby a puppy" was sensitive and specific in differentiating children with CAS and speech delay, whereas inconsistency calculated on other stimuli (e.g., multisyllabic words) was less efficacious in differentiating between these disorders. Speech inconsistency is a core feature of CAS and is efficacious in differentiating between children with CAS and speech delay; however, sensitivity and specificity are stimuli dependent.
Kalinowski, Joseph; Saltuklaroglu, Tim; Guntupalli, Vijaya; Stuart, Andrew
2004-06-10
Instead of being the core stuttering 'problem', syllabic repetitions may be a biological mechanism, or 'solution', to the central involuntary stuttering block. Simply put, stuttering is an endogenous transitory state of 'shadowed speech', a choral speech derivative that allows for a neural release of the central block. To investigate this possibility, 14 adults who stutter read while listening to forward fluent speech, reversed fluent speech, forward stuttered speech, and reversed stuttered speech. All conditions induced significant degrees of stuttering inhibition when compared to a control condition. However, the reversed fluent condition was less powerful than the other three conditions (approximately 42% vs. approximately 65%) for inhibiting stuttering. Stuttering inhibition appears to proceed by 'gestural recovery', made possible by the presence of an exogenous or 'second' set of speech gestures and engagement of mirror neurons. When reversed fluent speech was used, violations in normal gesture-time relationships (i.e., normal speech entropy) resulted in gestural configurations that apparently were inadequately recovered and, therefore, were not as conducive to high levels of stuttering inhibition. In contrast, high levels of encoding found in the simple syllabic structures of stuttered speech allowed its forward and reversed forms to be equally effective for gestural recovery and stuttering inhibition. The reversal of repeated syllables did not appear to significantly degrade the natural gesture-time relationships (i.e., they were perceptually recognizable). Thus, exogenous speech gestures that displayed near-normal gestural relationships allowed for easy recovery and fluent productions via mirror systems, suggesting a more choral-like nature. The importance of syllabic repetitions is highlighted: both their perceived (exogenous) and produced (endogenous) forms appear to be fundamental, surface acoustic manifestations for central stuttering inhibition via the engagement of mirror neurons.
Hartel, Bas P; van Nierop, Josephine W I; Huinck, Wendy J; Rotteveel, Liselotte J C; Mylanus, Emmanuel A M; Snik, Ad F; Kunst, Henricus P M; Pennings, Ronald J E
2017-07-01
Usher syndrome type IIa (USH2a) is characterized by congenital moderate to severe hearing impairment and retinitis pigmentosa. Hearing rehabilitation starts in early childhood with the application of hearing aids. In some patients with USH2a, severe progression of hearing impairment leads to insufficient speech intelligibility with hearing aids and issues with adequate communication and safety. Cochlear implantation (CI) is the next step in rehabilitation of such patients. This study evaluates the performance and benefit of CI in patients with USH2a. Retrospective case-control study to evaluate the performance and benefit of CI in 16 postlingually deaf adults (eight patients with USH2a and eight matched controls). Performance and benefit were evaluated by a speech intelligibility test and three quality-of-life questionnaires. Patients with USH2a with a mean age of 59 years at implantation exhibited good performance after CI. The phoneme scores improved significantly from 41 to 87% in patients with USH2a (p = 0.02) and from 30 to 86% in the control group (p = 0.001). The results of the questionnaire survey demonstrated a clear benefit from CI. There were no differences in performance or benefit between patients with USH2a and control patients before and after CI. CI increases speech intelligibility and improves quality of life in patients with USH2a.
Factors affecting the perception of Korean-accented American English
NASA Astrophysics Data System (ADS)
Cho, Kwansun; Harris, John G.; Shrivastav, Rahul
2005-09-01
This experiment examines the relative contribution of two factors, intonation and articulation errors, to the perception of foreign accent in Korean-accented American English. Ten native speakers of Korean and ten native speakers of American English were asked to read ten English sentences. These sentences were then modified using high-quality speech resynthesis techniques [STRAIGHT; Kawahara et al., Speech Commun. 27, 187-207 (1999)] to generate four sets of stimuli. In the first two sets of stimuli, the intonation patterns of the Korean speakers and American speakers were switched with one another. The articulatory errors for each speaker were not modified. In the final two sets, the sentences from the Korean and American speakers were resynthesized without any modifications. Fifteen listeners were asked to rate all the stimuli for the degree of foreign accent. Preliminary results show that, for native speakers of American English, articulation errors may play a greater role in the perception of foreign accent than errors in intonation patterns. [Work supported by KAIM.]
Calandruccio, Lauren; Bradlow, Ann R; Dhar, Sumitrajit
2014-04-01
Masking release for an English sentence-recognition task in the presence of foreign-accented English speech compared with native-accented English speech was reported in Calandruccio et al. (2010a). The masking release appeared to increase as the masker intelligibility decreased. However, it could not be ruled out that spectral differences between the speech maskers were influencing the significant differences observed. The purpose of the current experiment was to minimize spectral differences between speech maskers to determine how various amounts of linguistic information within competing speech affect masking release. A mixed-model design with within-subject (four two-talker speech maskers) and between-subject (listener group) factors was used. Speech maskers included native-accented English speech and high-intelligibility, moderate-intelligibility, and low-intelligibility Mandarin-accented English. Normalizing the long-term average speech spectra of the maskers to each other minimized spectral differences between the masker conditions. Three listener groups were tested, including monolingual English speakers with normal hearing, nonnative English speakers with normal hearing, and monolingual English speakers with hearing loss. The nonnative English speakers were from various native language backgrounds, not including Mandarin (or any other Chinese dialect). Listeners with hearing loss had symmetric mild sloping to moderate sensorineural hearing loss. Listeners were asked to repeat back sentences that were presented in the presence of four different two-talker speech maskers. Responses were scored based on the key words within the sentences (100 key words per masker condition). A mixed-model regression analysis was used to analyze the difference in performance scores between the masker conditions and listener groups. Monolingual English speakers with normal hearing benefited when the competing speech signal was foreign accented compared with native accented, allowing for improved speech recognition. Various levels of intelligibility across the foreign-accented speech maskers did not influence results. Neither the nonnative English-speaking listeners with normal hearing nor the monolingual English speakers with hearing loss benefited from masking release when the masker was changed from native-accented to foreign-accented English. Slight modifications between the target and the masker speech allowed monolingual English speakers with normal hearing to improve their recognition of native-accented English, even when the competing speech was highly intelligible. Further research is needed to determine which modifications within the competing speech signal caused the Mandarin-accented English to be less effective with respect to masking. Determining the influences within the competing speech that make it less effective as a masker, or determining why monolingual normal-hearing listeners can take advantage of these differences, could help improve speech recognition for those with hearing loss in the future. American Academy of Audiology.
Gordon-Salant, Sandra; Cole, Stacey Samuels
2016-01-01
This study aimed to determine if younger and older listeners with normal hearing who differ on working memory span perform differently on speech recognition tests in noise. Older adults typically exhibit poorer speech recognition scores in noise than younger adults, which is attributed primarily to poorer hearing sensitivity and more limited working memory capacity in older than younger adults. Previous studies typically tested older listeners with poorer hearing sensitivity and shorter working memory spans than younger listeners, making it difficult to discern the importance of working memory capacity on speech recognition. This investigation controlled for hearing sensitivity and compared speech recognition performance in noise by younger and older listeners who were subdivided into high and low working memory groups. Performance patterns were compared for different speech materials to assess whether or not the effect of working memory capacity varies with the demands of the specific speech test. The authors hypothesized that (1) normal-hearing listeners with low working memory span would exhibit poorer speech recognition performance in noise than those with high working memory span; (2) older listeners with normal hearing would show poorer speech recognition scores than younger listeners with normal hearing, when the two age groups were matched for working memory span; and (3) an interaction between age and working memory would be observed for speech materials that provide contextual cues. Twenty-eight older (61 to 75 years) and 25 younger (18 to 25 years) normal-hearing listeners were assigned to groups based on age and working memory status. Northwestern University Auditory Test No. 6 words and Institute of Electrical and Electronics Engineers sentences were presented in noise using an adaptive procedure to measure the signal-to-noise ratio corresponding to 50% correct performance. Cognitive ability was evaluated with two tests of working memory (Listening Span Test and Reading Span Test) and two tests of processing speed (Paced Auditory Serial Addition Test and The Letter Digit Substitution Test). Significant effects of age and working memory capacity were observed on the speech recognition measures in noise, but these effects were mediated somewhat by the speech signal. Specifically, main effects of age and working memory were revealed for both words and sentences, but the interaction between the two was significant for sentences only. For these materials, effects of age were observed for listeners in the low working memory groups only. Although all cognitive measures were significantly correlated with speech recognition in noise, working memory span was the most important variable accounting for speech recognition performance. The results indicate that older adults with high working memory capacity are able to capitalize on contextual cues and perform as well as young listeners with high working memory capacity for sentence recognition. The data also suggest that listeners with normal hearing and low working memory capacity are less able to adapt to distortion of speech signals caused by background noise, which requires the allocation of more processing resources to earlier processing stages. These results indicate that both younger and older adults with low working memory capacity and normal hearing are at a disadvantage for recognizing speech in noise.
Effects of low harmonics on tone identification in natural and vocoded speech.
Liu, Chang; Azimi, Behnam; Tahmina, Qudsia; Hu, Yi
2012-11-01
This study investigated the contribution of low-frequency harmonics to identifying Mandarin tones in natural and vocoded speech in quiet and noisy conditions. Results showed that low-frequency harmonics of natural speech led to highly accurate tone identification; however, for vocoded speech, low-frequency harmonics yielded lower tone identification than stimuli with full harmonics, except for tone 4. Analysis of the correlation between tone accuracy and the amplitude-F0 correlation index suggested that "more" speech contents (i.e., more harmonics) did not necessarily yield better tone recognition for vocoded speech, especially when the amplitude contour of the signals did not co-vary with the F0 contour.
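The amplitude-F0 correlation index mentioned above can be sketched as a Pearson correlation between a frame-wise amplitude contour and the F0 contour. In the minimal example below, the frame length, the voicing convention (F0 = 0 for unvoiced frames), and the synthetic falling-tone stimulus are all assumptions, not the study's implementation.

```python
import numpy as np

def amplitude_f0_correlation(x, f0, fs, frame_len=0.025):
    """Pearson correlation between the frame-wise RMS amplitude
    contour and the F0 contour (one F0 value per frame assumed)."""
    hop = int(frame_len * fs)
    n_frames = min(len(f0), len(x) // hop)
    amp = np.array([np.sqrt(np.mean(x[i*hop:(i+1)*hop] ** 2))
                    for i in range(n_frames)])
    voiced = f0[:n_frames] > 0            # ignore unvoiced frames
    return np.corrcoef(amp[voiced], f0[:n_frames][voiced])[0, 1]

fs = 16000
t = np.arange(fs) / fs
f0_contour = np.linspace(220, 110, 40)    # falling contour, tone-4-like
x = np.sin(2 * np.pi * 150 * t) * np.linspace(1.0, 0.2, fs)  # falling amplitude
print(round(amplitude_f0_correlation(x, f0_contour, fs), 2))  # near +1
```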
Speech emotion recognition methods: A literature review
NASA Astrophysics Data System (ADS)
Basharirad, Babak; Moradhaseli, Mohammadreza
2017-10-01
Recently, research attention to emotional speech signals has grown in human-machine interfaces due to the availability of high computation capability. Many systems have been proposed in the literature to identify the emotional state through speech. Selecting suitable feature sets, designing proper classification methods, and preparing an appropriate dataset are the key issues for speech emotion recognition systems. This paper critically analyzes the currently available speech emotion recognition methods against three evaluation parameters (feature set, classification of features, and accuracy of use). In addition, it evaluates the performance and limitations of the available methods. Furthermore, it highlights the current promising directions for improving speech emotion recognition systems.
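The three key issues named above (features, classifier, dataset) map onto a conventional recognition pipeline. The sketch below is one assumed minimal instantiation, MFCC statistics fed to an SVM with cross-validation, using synthetic placeholder "utterances"; it stands in for the family of surveyed methods rather than reproducing any specific one.

```python
import numpy as np
import librosa
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def emotion_features(y, sr):
    """Utterance-level feature vector: mean and std of 13 MFCCs."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# Placeholder corpus: synthetic tones standing in for labeled utterances.
sr = 16000
rng = np.random.default_rng(5)
X, labels = [], []
for emotion, f in [("neutral", 120), ("angry", 240)]:
    for _ in range(20):
        t = np.arange(sr) / sr
        y = np.sin(2 * np.pi * (f + rng.normal(0, 5)) * t)
        X.append(emotion_features(y, sr))
        labels.append(emotion)

clf = SVC(kernel="rbf")
print(cross_val_score(clf, np.array(X), np.array(labels), cv=5).mean())
```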
Hypermedia = hypercommunication
NASA Technical Reports Server (NTRS)
Laff, Mark R.
1990-01-01
New hardware and software technology gave application designers the freedom to use new realism in human-computer interaction. High-quality images, motion video, stereo sound and music, speech, touch, and gesture provide richer data channels between the person and the machine. Ultimately, this will lead to richer communication between people, with the computer as an intermediary. The whole point of hyper-books, hyper-newspapers, and virtual worlds is to transfer the concepts and relationships, the 'data structure', from the mind of the creator to that of the user. Some of the characteristics of this rich information channel are discussed, and some examples are presented.
Automated Speech Rate Measurement in Dysarthria.
Martens, Heidi; Dekens, Tomas; Van Nuffelen, Gwen; Latacz, Lukas; Verhelst, Werner; De Bodt, Marc
2015-06-01
In this study, a new algorithm for automated determination of speech rate (SR) in dysarthric speech is evaluated. We investigated how reliably the algorithm calculates the SR of dysarthric speech samples when compared with calculation performed by speech-language pathologists. The new algorithm was trained and tested using Dutch speech samples of 36 speakers with no history of speech impairment and 40 speakers with mild to moderate dysarthria. We tested the algorithm under various conditions: according to speech task type (sentence reading, passage reading, and storytelling) and algorithm optimization method (speaker group optimization and individual speaker optimization). Correlations between automated and human SR determination were calculated for each condition. High correlations between automated and human SR determination were found in the various testing conditions. The new algorithm measures SR in a sufficiently reliable manner. It is currently being integrated in a clinical software tool for assessing and managing prosody in dysarthric speech. Further research is needed to fine-tune the algorithm to severely dysarthric speech, to make the algorithm less sensitive to background noise, and to evaluate how the algorithm deals with syllabic consonants.
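The abstract does not specify the algorithm's internals, but a common baseline for automated speech rate estimation detects syllable nuclei as peaks in a smoothed intensity envelope. The sketch below implements that baseline under assumed thresholds (peak height, minimum inter-peak distance); it is an illustration, not the authors' algorithm.

```python
import numpy as np
from scipy.signal import find_peaks

def estimate_speech_rate(x, fs, frame_len=0.01):
    """Crude syllable-rate estimate: count peaks in a smoothed RMS
    intensity envelope and divide by the sample duration in seconds."""
    hop = int(frame_len * fs)
    env = np.array([np.sqrt(np.mean(x[i:i+hop] ** 2))
                    for i in range(0, len(x) - hop, hop)])
    env = np.convolve(env, np.ones(5) / 5, mode="same")   # light smoothing
    peaks, _ = find_peaks(env, height=0.5 * env.max(),    # assumed threshold
                          distance=int(0.1 / frame_len))  # >=100 ms apart
    return len(peaks) / (len(x) / fs)

# Synthetic "speech": 5 syllable-like amplitude bursts over 2 seconds.
fs = 16000
t = np.arange(2 * fs) / fs
carrier = np.sin(2 * np.pi * 150 * t)
bursts = sum(np.exp(-((t - c) ** 2) / (2 * 0.03 ** 2))
             for c in [0.2, 0.6, 1.0, 1.4, 1.8])
print(round(estimate_speech_rate(carrier * bursts, fs), 1), "syll/sec")
```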
Smiljanić, Rajka; Bradlow, Ann R.
2011-01-01
This study investigated how native language background interacts with speaking style adaptations in determining levels of speech intelligibility. The aim was to explore whether native and high proficiency non-native listeners benefit similarly from native and non-native clear speech adjustments. The sentence-in-noise perception results revealed that fluent non-native listeners gained a large clear speech benefit from native clear speech modifications. Furthermore, proficient non-native talkers in this study implemented conversational-to-clear speaking style modifications in their second language (L2) that resulted in significant intelligibility gain for both native and non-native listeners. The results of the accentedness ratings obtained for native and non-native conversational and clear speech sentences showed that while intelligibility was improved, the presence of foreign accent remained constant in both speaking styles. This suggests that objective intelligibility and subjective accentedness are two independent dimensions of non-native speech. Overall, these results provide strong evidence that greater experience in L2 processing leads to improved intelligibility in both production and perception domains. These results also demonstrated that speaking style adaptations along with less signal distortion can contribute significantly towards successful native and non-native interactions. PMID:22225056
Hogenelst, Koen; Sarampalis, Anastasios; Leander, N Pontus; Müller, Barbara C N; Schoevers, Robert A; aan het Rot, Marije
2016-03-01
Major depressive disorder (MDD) has been associated with abnormalities in speech and behavioural mimicry. These abnormalities may contribute to the impairments in interpersonal functioning that are often seen in MDD patients. MDD has also been associated with disturbances in the brain serotonin system, but the extent to which serotonin regulates speech and behavioural mimicry remains unclear. In a randomized, double-blind, crossover study, we induced acute tryptophan depletion (ATD) in individuals with or without a family history of MDD. Five hours afterwards, participants engaged in two behavioural-mimicry experiments in which speech and behaviour were recorded. ATD reduced the time participants waited before speaking, which might indicate increased impulsivity. However, ATD did not significantly alter speech otherwise, nor did it affect mimicry. This suggests that a brief lowering of brain serotonin has limited effects on verbal and non-verbal social behaviour. The null findings may be due to low test sensitivity, but they otherwise suggest that low serotonin has little effect on social interaction quality in never-depressed individuals. It remains possible that recovered MDD patients are more strongly affected. © The Author(s) 2016.
Interactive Activation Model of Speech Perception.
1984-11-01
contract. Elman, J. L., & McClelland, J. L. Speech perception as a cognitive process: The interactive activation model of speech perception. In ... attempts to provide a machine solution to the problem of speech perception. A second kind of model, growing out of Cognitive Psychology, attempts to ... architectures to cognitive and perceptual problems. We also owe a debt to what we might call the computational connectionists -- those who have applied highly
Effect of gap detection threshold on consistency of speech in children with speech sound disorder.
Sayyahi, Fateme; Soleymani, Zahra; Akbari, Mohammad; Bijankhan, Mahmood; Dolatshahi, Behrooz
2017-02-01
The present study examined the relationship between gap detection threshold and speech error consistency in children with speech sound disorder. The participants were children five to six years of age who were categorized into three groups of typical speech, consistent speech disorder (CSD), and inconsistent speech disorder (ISD). The phonetic gap detection threshold test was used for this study, which is a valid test comprising six syllables with inter-stimulus intervals between 20 and 300 ms. The participants were asked to listen to the recorded stimuli three times and indicate whether they heard one or two sounds. There was no significant difference between the typical and CSD groups (p=0.55), but there were significant differences in performance between the ISD and CSD groups and the ISD and typical groups (p=0.00). The ISD group discriminated between speech sounds at a higher threshold. Children with inconsistent speech errors could not distinguish speech sounds during time-limited phonetic discrimination. It is suggested that inconsistency in speech is a representation of inconsistency in auditory perception, which is caused by a high gap detection threshold. Copyright © 2016 Elsevier Ltd. All rights reserved.
Duvekot, Jorieke; Schalk, Rozemarijn D. F.; Tuinenburg, Eveline M.; Westenberg, P. Michiel
2009-01-01
Social anxiety in adolescents has frequently been linked to negative outcomes from social interactions. The present study investigated whether socially anxious adolescents are treated negatively by their classmates and which characteristics of socially anxious adolescents could explain negative social responses. Classroom observations of class behavior were made during oral presentations of 94 students (60% females) aged 13 to 18 years. Speakers' social performance, speech quality, and nervousness during the presentation were also rated. Findings showed that the social performance of socially anxious students was a predictor of class behavior, whereas their overt nervousness was not. Surprisingly, the quality of their speech was negatively related to class behavior. Implications of these findings for the treatment of socially anxious adolescents are discussed. PMID:19842023
Speech Perception as a Cognitive Process: The Interactive Activation Model.
ERIC Educational Resources Information Center
Elman, Jeffrey L.; McClelland, James L.
Research efforts to model speech perception in terms of a processing system in which knowledge and processing are distributed over large numbers of highly interactive--but computationally primative--elements are described in this report. After discussing the properties of speech that demand a parallel interactive processing system, the report…
Resource Room for the Speech Handicapped. (School Year 1974-1975).
ERIC Educational Resources Information Center
Silverman-Dresner, Toby
Thirty-two junior high school students with severe communication defects were provided with speech therapy--which included videotape feedback techniques, phonic mirror, tape recorder, "s" meter, pitch meter, language master, bicom, and other sensory aids--in the Speech and Language Resource Room (Queens, New York). Evaluation procedures included…
Davidow, Jason H; Bothe, Anne K; Ye, Jun
2011-06-01
The most common way to induce fluency using rhythm requires persons who stutter to speak one syllable or one word to each beat of a metronome, but stuttering can also be eliminated when the stimulus is of a particular duration (e.g., 1 second [s]). The present study examined stuttering frequency, speech production changes, and speech naturalness during rhythmic speech that alternated 1s of reading with 1s of silence. A repeated-measures design was used to compare data obtained during a control reading condition and during rhythmic reading in 10 persons who stutter (PWS) and 10 normally fluent controls. Ratings for speech naturalness were also gathered from naïve listeners. Results showed that mean vowel duration increased significantly, and the percentage of short phonated intervals decreased significantly, for both groups from the control to the experimental condition. Mean phonated interval length increased significantly for the fluent controls. Mean speech naturalness ratings during the experimental condition were approximately "7" on a 1-9 scale (1=highly natural; 9=highly unnatural), and these ratings were significantly correlated with vowel duration and phonated intervals for PWS. The findings indicate that PWS may be altering vocal fold vibration duration to obtain fluency during this rhythmic speech style, and that vocal fold vibration duration may have an impact on speech naturalness during rhythmic speech. Future investigations should examine speech production changes and speech naturalness during variations of this rhythmic condition. The reader will be able to: (1) describe changes (from a control reading condition) in speech production variables when alternating between 1s of reading and 1s of silence, (2) describe which rhythmic conditions have been found to sound and feel the most natural, (3) describe methodological issues for studies about alterations in speech production variables during fluency-inducing conditions, and (4) describe which fluency-inducing conditions have been shown to involve a reduction in short phonated intervals. Copyright © 2011 Elsevier Inc. All rights reserved.
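Phonated intervals of the kind analyzed above are simply the durations of consecutive runs of voiced frames. The sketch below derives interval durations from a frame-wise voicing decision and computes the percentage falling in a "short" band; the 10-ms frame size and the 30-150 ms short-interval band are assumptions for illustration, not the study's measurement setup.

```python
import numpy as np

def phonated_intervals(voiced, frame_ms=10):
    """Durations (ms) of consecutive runs of voiced frames."""
    runs, count = [], 0
    for v in np.append(voiced, 0):   # sentinel flushes the final run
        if v:
            count += 1
        elif count:
            runs.append(count * frame_ms)
            count = 0
    return np.array(runs)

def percent_short(runs, lo=30, hi=150):
    """Percentage of phonated intervals within the 'short' band (assumed)."""
    return 100 * np.mean((runs >= lo) & (runs <= hi)) if runs.size else 0.0

# Frame-wise voicing decisions (1 = voiced), 10-ms frames.
voiced = np.array([0,1,1,1,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,1,1,0])
runs = phonated_intervals(voiced)
print(runs, f"{percent_short(runs):.0f}% short")  # [30 140 20] -> 67% short
```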
Härkönen, Kati; Kivekäs, Ilkka; Rautiainen, Markus; Kotti, Voitto; Sivonen, Ville; Vasama, Juha-Pekka
2015-05-01
This prospective study shows that working performance, quality of life (QoL), and quality of hearing (QoH) are better with two cochlear implants (CIs) than with a single implant. The impact of the second CI on the patient's QoL is as significant as the impact of the first CI. To evaluate the benefits of sequential bilateral cochlear implantation for working performance, QoL, and QoH, we studied working performance, work-related stress, QoL, and QoH with specific questionnaires in 15 patients with a unilateral CI scheduled for sequential implantation of the other ear. Sound localization performance and speech perception in noise were measured with specific tests. All questionnaires and tests were performed before the second CI surgery and 6 and 12 months after its activation. Bilateral CIs increased patients' working performance, and their work-related stress and fatigue decreased. Communication with co-workers was easier, and patients were more active in their working environment. Sequential bilateral cochlear implantation improved QoL, QoH, sound localization, and speech perception in noise statistically significantly.
Jónsdottir, Valdis; Laukkanen, Anne-Maria; Siikki, Ilona
2003-01-01
The present study investigated changes in the voice quality of teachers during a working day (a) in ordinary conditions and (b) when using electrical sound amplification while teaching. Classroom speech of 5 teachers was recorded with a portable DAT recorder and a head-mounted microphone during the first and last lessons of a hard working day, first in ordinary conditions and the following week using amplification. Long-term average spectrum and sound pressure level (SPL) analyses were made. The subjects' comments were gathered by questionnaire. Voice quality was evaluated by 2 speech trainers. With amplification, SPL was lower and the spectrum more tilted. Voice quality was evaluated to be better. The subjects reported less fatigue in the vocal mechanism. Spectral tilt decreased and SPL increased during the day. There was a tendency for perceived asthenia to decrease. No significant changes were observed in ordinary conditions. The acoustic changes seem to reflect a positive adaptation to vocal loading. Their absence may be a sign of vocal fatigue. Copyright 2003 S. Karger AG, Basel
Choi, Ja Young; Hu, Elly R; Perrachione, Tyler K
2018-04-01
The nondeterministic relationship between speech acoustics and abstract phonemic representations imposes a challenge for listeners to maintain perceptual constancy despite the highly variable acoustic realization of speech. Talker normalization facilitates speech processing by reducing the degrees of freedom for mapping between encountered speech and phonemic representations. While this process has been proposed to facilitate the perception of ambiguous speech sounds, it is currently unknown whether talker normalization is affected by the degree of potential ambiguity in acoustic-phonemic mapping. We explored the effects of talker normalization on speech processing in a series of speeded classification paradigms, parametrically manipulating the potential for inconsistent acoustic-phonemic relationships across talkers for both consonants and vowels. Listeners identified words with varying potential acoustic-phonemic ambiguity across talkers (e.g., beet/boat vs. boot/boat) spoken by single or mixed talkers. Auditory categorization of words was always slower when listening to mixed talkers compared to a single talker, even when there was no potential acoustic ambiguity between target sounds. Moreover, the processing cost imposed by mixed talkers was greatest when words had the most potential acoustic-phonemic overlap across talkers. Models of acoustic dissimilarity between target speech sounds did not account for the pattern of results. These results suggest (a) that talker normalization incurs the greatest processing cost when disambiguating highly confusable sounds and (b) that talker normalization appears to be an obligatory component of speech perception, taking place even when the acoustic-phonemic relationships across sounds are unambiguous.
Zhou, Hong; Li, Yu; Liang, Meng; Guan, Connie Qun; Zhang, Linjun; Shu, Hua; Zhang, Yang
2017-01-01
The goal of this developmental speech perception study was to assess whether and how age group modulated the influences of high-level semantic context and low-level fundamental frequency (F0) contours on the recognition of Mandarin speech by elementary and middle-school-aged children in quiet and interference backgrounds. The results revealed different patterns for semantic and F0 information. On the one hand, age group significantly modulated the use of F0 contours, indicating that elementary school children relied more on natural F0 contours than middle school children during Mandarin speech recognition. On the other hand, there was no significant modulation effect of age group on semantic context, indicating that children of both age groups used semantic context to assist speech recognition to a similar extent. Furthermore, the significant modulation effect of age group on the interaction between F0 contours and semantic context revealed that younger children could not make better use of semantic context in recognizing speech with flat F0 contours compared with natural F0 contours, while older children could benefit from semantic context even when natural F0 contours were altered, thus confirming the important role of F0 contours in Mandarin speech recognition by elementary school children. The developmental changes in the effects of high-level semantic and low-level F0 information on speech recognition might reflect the differences in auditory and cognitive resources associated with processing of the two types of information in speech perception.
One approach to design of speech emotion database
NASA Astrophysics Data System (ADS)
Uhrin, Dominik; Chmelikova, Zdenka; Tovarek, Jaromir; Partila, Pavol; Voznak, Miroslav
2016-05-01
This article describes a system for evaluating the credibility of recordings with emotional character. The sound recordings form a Czech-language database for training and testing speech emotion recognition systems. These systems are designed to detect human emotions in the voice. Information about the emotional state of a speaker is useful to security forces and emergency call services. Personnel in action (soldiers, police officers, and firefighters) are often exposed to stress, and information about their emotional state, carried in the voice, can help a dispatcher adapt commands during an intervention. Call agents of an emergency service must likewise recognize the mental state of the caller to adjust the tone of the conversation; in this case, evaluation of the psychological state is a key factor for successful intervention. A quality database of sound recordings is essential for building such systems. Existing quality databases, such as the Berlin Database of Emotional Speech or Humaine, were created by actors in an audio studio, which means the recordings contain simulated rather than real emotions. Our research aims at creating a database of Czech emotional recordings of real human speech. Collecting sound samples for the database is only one of the tasks; another, no less important, is to evaluate the significance of the recordings with respect to emotional states. This article describes the design of a methodology for evaluating the credibility of emotional recordings, and the results describe the advantages and applicability of the developed method.
Characterizing Articulation in Apraxic Speech Using Real-Time Magnetic Resonance Imaging.
Hagedorn, Christina; Proctor, Michael; Goldstein, Louis; Wilson, Stephen M; Miller, Bruce; Gorno-Tempini, Maria Luisa; Narayanan, Shrikanth S
2017-04-14
Real-time magnetic resonance imaging (MRI) and accompanying analytical methods are shown to capture and quantify salient aspects of apraxic speech, substantiating and expanding upon evidence provided by clinical observation and acoustic and kinematic data. Analysis of apraxic speech errors within a dynamic systems framework is provided and the nature of pathomechanisms of apraxic speech discussed. One adult male speaker with apraxia of speech was imaged using real-time MRI while producing spontaneous speech, repeated naming tasks, and self-paced repetition of word pairs designed to elicit speech errors. Articulatory data were analyzed, and speech errors were detected using time series reflecting articulatory activity in regions of interest. Real-time MRI captured two types of apraxic gestural intrusion errors in a word pair repetition task. Gestural intrusion errors in nonrepetitive speech, multiple silent initiation gestures at the onset of speech, and covert (unphonated) articulation of entire monosyllabic words were also captured. Real-time MRI and accompanying analytical methods capture and quantify many features of apraxic speech that have been previously observed using other modalities while offering high spatial resolution. This patient's apraxia of speech affected the ability to select only the appropriate vocal tract gestures for a target utterance, suppressing others, and to coordinate them in time.
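The region-of-interest time series described above reduce each MRI frame to a scalar measure of articulatory activity. The sketch below computes mean pixel intensity within an assumed boolean ROI mask across synthetic frames; the mask location and the oscillating "gesture" signal are illustrative assumptions, not the study's data.

```python
import numpy as np

def roi_timeseries(frames, mask):
    """Mean pixel intensity inside a region of interest, per frame.
    frames: (n_frames, height, width); mask: boolean (height, width)."""
    return frames[:, mask].mean(axis=1)

# Synthetic cine-MRI stand-in: 50 frames of 64x64 images in which a
# "constriction" region brightens periodically (an assumed toy signal).
rng = np.random.default_rng(6)
frames = rng.random((50, 64, 64)) * 0.1
mask = np.zeros((64, 64), dtype=bool)
mask[30:38, 20:30] = True                     # assumed tongue-tip ROI
activity = 0.5 + 0.5 * np.sin(2 * np.pi * np.arange(50) / 10)
frames[:, 30:38, 20:30] += activity[:, None, None]

ts = roi_timeseries(frames, mask)
print(ts.round(2))   # oscillating series reflecting gestural activity
```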
Curtin, Suzanne; Vouloumanos, Athena
2013-09-01
We examined whether infants' preference for speech at 12 months is associated with autistic-like behaviors at 18 months in infants who are at increased risk for autism spectrum disorder (ASD) because they have an older sibling diagnosed with ASD and in low-risk infants. Only low-risk infants listened significantly longer to speech than to nonspeech at 12 months. In both groups, relative preference for speech correlated positively with general cognitive ability at 12 months. However, in high-risk infants only, preference for speech was associated with autistic-like behavior at 18 months, while in low-risk infants, preference for speech correlated with language abilities. This suggests that in children at risk for ASD an atypical species-specific bias for speech may underlie atypical social development.