NASA Astrophysics Data System (ADS)
Athaudage, Chandranath R. N.; Bradley, Alan B.; Lech, Margaret
2003-12-01
A dynamic programming-based optimization strategy for a temporal decomposition (TD) model of speech and its application to low-rate speech coding in storage and broadcasting is presented. In previous work with the spectral stability-based event localizing (SBEL) TD algorithm, event localization was performed based on a spectral stability criterion. Although this approach gave reasonably good results, there was no assurance of the optimality of the event locations. In the present work, we have optimized the event localizing task using a dynamic programming-based optimization strategy. Simulation results show that improved TD model accuracy can be achieved. A methodology for incorporating the optimized TD algorithm within the standard MELP speech coder for efficient compression of speech spectral information is also presented. The performance evaluation results revealed that the proposed speech coding scheme achieves 50%-60% compression of speech spectral information with negligible degradation in the decoded speech quality.
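The abstract above describes the optimization only at a high level. Purely as an illustrative sketch of dynamic-programming event localization, the Python below splits a sequence of spectral-parameter frames into a fixed number of event intervals so that a simple within-interval modeling cost is minimized; the mean-vector cost, frame count, and event count are assumptions for the example, not the SBEL-TD model itself.

```python
import numpy as np

def segment_cost(frames, i, j):
    """Cost of modeling frames[i:j] by their mean vector (squared error),
    a stand-in for the TD model error of one event interval (assumption)."""
    seg = frames[i:j]
    return float(((seg - seg.mean(axis=0)) ** 2).sum())

def optimal_events(frames, k):
    """Dynamic programming over event boundaries: split T frames into k
    contiguous intervals so that the summed modeling cost is minimal."""
    T = len(frames)
    INF = float("inf")
    cost = [[INF] * (k + 1) for _ in range(T + 1)]
    back = [[0] * (k + 1) for _ in range(T + 1)]
    cost[0][0] = 0.0
    for t in range(1, T + 1):
        for m in range(1, min(k, t) + 1):
            for s in range(m - 1, t):
                c = cost[s][m - 1] + segment_cost(frames, s, t)
                if c < cost[t][m]:
                    cost[t][m], back[t][m] = c, s
    # Backtrack the optimal boundary positions.
    bounds, t = [], T
    for m in range(k, 0, -1):
        bounds.append(back[t][m])
        t = back[t][m]
    return sorted(bounds[:-1]), cost[T][k]

frames = np.random.randn(40, 10)   # 40 frames of 10 spectral parameters
print(optimal_events(frames, 5))   # 4 interior boundaries and the total cost
```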
Pulse Vector-Excitation Speech Encoder
NASA Technical Reports Server (NTRS)
Davidson, Grant; Gersho, Allen
1989-01-01
Proposed pulse vector-excitation speech encoder (PVXC) encodes analog speech signals into digital representation for transmission or storage at rates below 5 kilobits per second. Produces high quality of reconstructed speech, but with less computation than required by comparable speech-encoding systems. Has some characteristics of multipulse linear predictive coding (MPLPC) and of code-excited linear prediction (CELP). System uses mathematical model of vocal tract in conjunction with set of excitation vectors and perceptually-based error criterion to synthesize natural-sounding speech.
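The record above names the ingredients (codebook of excitation vectors, vocal-tract model, perceptually weighted error) without detail. Below is a minimal, hedged sketch of the analysis-by-synthesis search that PVXC-style coders share with CELP, not the PVXC algorithm itself; the filter orders, weighting filter, and codebook size are toy assumptions.

```python
import numpy as np
from scipy.signal import lfilter

def best_excitation(target, codebook, lpc, weight):
    """Run each candidate excitation through the LPC synthesis filter
    1/A(z), apply a simplified FIR perceptual weighting, and keep the
    candidate and gain with the least weighted squared error."""
    tgt_w = lfilter(weight, [1.0], target)
    best = (None, 0.0, np.inf)
    for idx, code in enumerate(codebook):
        syn_w = lfilter(weight, [1.0], lfilter([1.0], lpc, code))
        gain = float(tgt_w @ syn_w) / (float(syn_w @ syn_w) + 1e-12)
        err = float(((tgt_w - gain * syn_w) ** 2).sum())
        if err < best[2]:
            best = (idx, gain, err)
    return best  # (codebook index, gain, weighted error): what gets transmitted

rng = np.random.default_rng(0)
codebook = rng.standard_normal((64, 40))  # 64 stochastic excitation vectors
lpc = np.array([1.0, -0.9])               # toy first-order A(z); real coders use ~10th order
target = rng.standard_normal(40)
print(best_excitation(target, codebook, lpc, weight=np.array([1.0, -0.8])))
```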
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ravishankar, C., Hughes Network Systems, Germantown, MD
Speech is the predominant means of communication between human beings, and since the invention of the telephone by Alexander Graham Bell in 1876, speech services have remained the core service in almost all telecommunication systems. The original analog methods of telephony had the disadvantage of the speech signal being corrupted by noise, cross-talk and distortion. Long-haul transmissions, which use repeaters to compensate for the loss in signal strength on transmission links, also increase the associated noise and distortion. Digital transmission, on the other hand, is relatively immune to noise, cross-talk and distortion, primarily because of the capability to faithfully regenerate the digital signal at each repeater purely on the basis of a binary decision. Hence the end-to-end performance of a digital link is essentially independent of the length and operating frequency bands of the link, and from a transmission point of view digital transmission has been the preferred approach due to its higher immunity to noise. The need to carry digital speech became extremely important from a service provision point of view as well. Modern requirements have introduced the need for robust, flexible and secure services that can carry a multitude of signal types (such as voice, data and video) without a fundamental change in infrastructure. Such a requirement could not have been easily met without the advent of digital transmission systems, thereby requiring speech to be coded digitally. The term speech coding often refers to techniques that represent or code speech signals either directly as a waveform or as a set of parameters obtained by analyzing the speech signal. In either case, the codes are transmitted to the distant end, where speech is reconstructed or synthesized using the received set of codes. A more generic term, often used interchangeably with speech coding, is voice coding; it is more generic in the sense that the coding techniques are equally applicable to any voice signal, whether or not it carries intelligible information, as the term speech implies. Other commonly used terms are speech compression and voice compression, since the fundamental idea behind speech coding is to reduce (compress) the transmission rate (or, equivalently, the bandwidth) and/or reduce storage requirements. In this document the terms speech and voice are used interchangeably.
Impact of dynamic rate coding aspects of mobile phone networks on forensic voice comparison.
Alzqhoul, Esam A S; Nair, Balamurali B T; Guillemin, Bernard J
2015-09-01
Previous studies have shown that landline and mobile phone networks are different in their ways of handling the speech signal, and therefore in their impact on it. But the same is also true of the different networks within the mobile phone arena. There are two major mobile phone technologies currently in use today, namely the global system for mobile communications (GSM) and code division multiple access (CDMA), and these are fundamentally different in their design. For example, the quality of the coded speech in the GSM network is a function of channel quality, whereas in the CDMA network it is determined by channel capacity (i.e., the number of users sharing a cell site). This paper examines the impact on the speech signal of a key feature of these networks, namely dynamic rate coding, and its subsequent impact on the task of likelihood-ratio-based forensic voice comparison (FVC). Surprisingly, both FVC accuracy and precision are found to be better for both GSM- and CDMA-coded speech than for uncoded speech. Intuitively one expects FVC accuracy to increase with increasing coded speech quality. This trend is shown to occur for the CDMA network, but, surprisingly, not for the GSM network. Further, with respect to comparisons between these two networks, FVC accuracy for CDMA-coded speech is shown to be slightly better than for GSM-coded speech, particularly when the coded-speech quality is high, but in terms of FVC precision the two networks are shown to be very similar.
Yoo, Sejin; Chung, Jun-Young; Jeon, Hyeon-Ae; Lee, Kyoung-Min; Kim, Young-Bo; Cho, Zang-Hee
2012-07-01
Speech production is inextricably linked to speech perception, yet they are usually investigated in isolation. In this study, we employed a verbal-repetition task to identify the neural substrates of speech processing with both ends, perception and production, active simultaneously, using functional MRI. Subjects verbally repeated auditory stimuli containing an ambiguous vowel sound that could be perceived as either a word or a pseudoword depending on the interpretation of the vowel. We found that verbal repetition commonly activated the audition-articulation interface bilaterally at the Sylvian fissures and superior temporal sulci. Contrasting word-versus-pseudoword trials revealed neural activity unique to word repetition in the left posterior middle temporal areas and activity unique to pseudoword repetition in the left inferior frontal gyrus. These findings imply that the tasks are carried out using different speech codes: an articulation-based code for pseudowords and an acoustic-phonetic code for words. They also support the dual-stream model and imitative learning of vocabulary.
Fingerspelled and Printed Words Are Recoded into a Speech-based Code in Short-term Memory.
Sehyr, Zed Sevcikova; Petrich, Jennifer; Emmorey, Karen
2017-01-01
We conducted three immediate serial recall experiments that manipulated type of stimulus presentation (printed or fingerspelled words) and word similarity (speech-based or manual). Matched deaf American Sign Language signers and hearing non-signers participated (mean reading age = 14-15 years). Speech-based similarity effects were found for both stimulus types indicating that deaf signers recoded both printed and fingerspelled words into a speech-based phonological code. A manual similarity effect was not observed for printed words indicating that print was not recoded into fingerspelling (FS). A manual similarity effect was observed for fingerspelled words when similarity was based on joint angles rather than on handshape compactness. However, a follow-up experiment suggested that the manual similarity effect was due to perceptual confusion at encoding. Overall, these findings suggest that FS is strongly linked to English phonology for deaf adult signers who are relatively skilled readers. This link between fingerspelled words and English phonology allows for the use of a more efficient speech-based code for retaining fingerspelled words in short-term memory and may strengthen the representation of English vocabulary.
Improved Speech Coding Based on Open-Loop Parameter Estimation
NASA Technical Reports Server (NTRS)
Juang, Jer-Nan; Chen, Ya-Chin; Longman, Richard W.
2000-01-01
A nonlinear optimization algorithm for linear predictive speech coding was developed previously that not only optimizes the linear model coefficients for the open-loop predictor, but performs the optimization including the effects of quantization of the transmitted residual. It also simultaneously optimizes the quantization levels used for each speech segment. In this paper, we present an improved method for initialization of this nonlinear algorithm and demonstrate substantial improvements in performance. In addition, the new procedure produces monotonically improving speech quality with increasing numbers of bits used in the transmitted error residual. Examples of speech encoding and decoding are given for eight speech segments, and signal-to-noise ratios as high as 47 dB are achieved. As in typical linear predictive coding, the optimization is done on the open-loop speech analysis model. Here we demonstrate that minimizing the error of the closed-loop speech reconstruction, instead of the simpler open-loop optimization, is likely to produce negligible improvement in speech quality. The examples suggest that the algorithm here is close to giving the best performance obtainable from a linear model, for the chosen order with the chosen number of bits for the codebook.
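The optimization described above builds on ordinary open-loop LPC analysis. As a minimal baseline sketch (the paper's joint optimization of quantization levels is not shown), the following computes predictor coefficients by the autocorrelation method with the Levinson-Durbin recursion:

```python
import numpy as np

def lpc_levinson(x, order):
    """Open-loop LPC analysis: autocorrelation method solved with the
    Levinson-Durbin recursion. Returns coefficients a[1..p] such that
    x[n] is predicted as sum_k a[k] * x[n-k], plus the residual energy."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    a = np.zeros(order)
    err = r[0]
    for i in range(order):
        k = (r[i + 1] - a[:i] @ r[i:0:-1]) / err
        a[:i + 1] = np.concatenate([a[:i] - k * a[:i][::-1], [k]])
        err *= 1.0 - k * k
    return a, err

fs = 8000
t = np.arange(0, 0.02, 1 / fs)                      # one 20 ms segment
x = np.sin(2 * np.pi * 440 * t) + 0.01 * np.random.randn(len(t))
coeffs, resid_energy = lpc_levinson(x, order=10)    # order 10 is an assumption
```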
NASA Technical Reports Server (NTRS)
Mcaulay, Robert J.; Quatieri, Thomas F.
1988-01-01
It has been shown that an analysis/synthesis system based on a sinusoidal representation of speech leads to synthetic speech that is essentially perceptually indistinguishable from the original. Strategies for coding the amplitudes, frequencies and phases of the sine waves have been developed that have led to a multirate coder operating at rates from 2400 to 9600 bps. The encoded speech is highly intelligible at all rates with a uniformly improving quality as the data rate is increased. A real-time fixed-point implementation has been developed using two ADSP2100 DSP chips. The methods used for coding and quantizing the sine-wave parameters for operation at the various frame rates are described.
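As a minimal sketch of the sinusoidal analysis/synthesis idea described above (simple per-frame peak picking; the coder's parameter quantization, phase tracking, and frame-boundary interpolation are not shown, and the peak count is an assumption):

```python
import numpy as np

def sine_analysis(frame, fs, n_peaks=20):
    """Pick the strongest spectral peaks of one windowed frame and return
    the (amplitude, frequency, phase) triples a sinusoidal coder would
    quantize and transmit."""
    win = np.hanning(len(frame))
    spec = np.fft.rfft(frame * win)
    mag = np.abs(spec)
    peaks = [i for i in range(1, len(mag) - 1)
             if mag[i] > mag[i - 1] and mag[i] >= mag[i + 1]]
    peaks.sort(key=lambda i: mag[i], reverse=True)
    freqs = np.fft.rfftfreq(len(frame), 1 / fs)
    return [(2 * mag[i] / win.sum(), freqs[i], np.angle(spec[i]))
            for i in peaks[:n_peaks]]

def sine_synthesis(params, n, fs):
    """Resynthesize the frame as a sum of the coded sine waves."""
    t = np.arange(n) / fs
    return sum(a * np.cos(2 * np.pi * f * t + p) for a, f, p in params)

fs, n = 8000, 256
frame = np.sin(2 * np.pi * 300 * np.arange(n) / fs)
rebuilt = sine_synthesis(sine_analysis(frame, fs), n, fs)
```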
Speech coding, reconstruction and recognition using acoustics and electromagnetic waves
Holzrichter, J.F.; Ng, L.C.
1998-03-17
The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced, as well as for combined voiced and unvoiced speech. The methods include how to deconvolve the speech excitation function from the acoustic speech output to describe the transfer function each time frame. The formation of feature vectors defining all acoustic speech units over well defined time frames can be used for purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching. 35 figs.
Speech coding, reconstruction and recognition using acoustics and electromagnetic waves
Holzrichter, John F.; Ng, Lawrence C.
1998-01-01
The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced, as well as for combined voiced and unvoiced speech. The methods include how to deconvolve the speech excitation function from the acoustic speech output to describe the transfer function each time frame. The formation of feature vectors defining all acoustic speech units over well defined time frames can be used for purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching.
Speech coding, reconstruction and recognition using acoustics and electromagnetic waves
DOE Office of Scientific and Technical Information (OSTI.GOV)
Holzrichter, J.F.; Ng, L.C.
The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced, as well as for combined voiced and unvoiced speech. The methods include how to deconvolve the speech excitation function from the acoustic speech output to describe the transfer function each time frame. The formation of feature vectors defining all acoustic speech units over well defined time frames can be used for purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching. 35 figs.
Speech Rhythms and Multiplexed Oscillatory Sensory Coding in the Human Brain
Gross, Joachim; Hoogenboom, Nienke; Thut, Gregor; Schyns, Philippe; Panzeri, Stefano; Belin, Pascal; Garrod, Simon
2013-01-01
Cortical oscillations are likely candidates for segmentation and coding of continuous speech. Here, we monitored continuous speech processing with magnetoencephalography (MEG) to unravel the principles of speech segmentation and coding. We demonstrate that speech entrains the phase of low-frequency (delta, theta) and the amplitude of high-frequency (gamma) oscillations in the auditory cortex. Phase entrainment is stronger in the right and amplitude entrainment is stronger in the left auditory cortex. Furthermore, edges in the speech envelope phase reset auditory cortex oscillations thereby enhancing their entrainment to speech. This mechanism adapts to the changing physical features of the speech envelope and enables efficient, stimulus-specific speech sampling. Finally, we show that within the auditory cortex, coupling between delta, theta, and gamma oscillations increases following speech edges. Importantly, all couplings (i.e., brain-speech and also within the cortex) attenuate for backward-presented speech, suggesting top-down control. We conclude that segmentation and coding of speech relies on a nested hierarchy of entrained cortical oscillations. PMID:24391472
Real-time speech encoding based on Code-Excited Linear Prediction (CELP)
NASA Technical Reports Server (NTRS)
Leblanc, Wilfrid P.; Mahmoud, S. A.
1988-01-01
This paper reports on ongoing work on the development of a real-time voice codec for the terrestrial and satellite mobile radio environments. The codec is based on a complexity-reduced version of code-excited linear prediction (CELP). The codebook search complexity was reduced to only 0.5 million floating point operations per second (MFLOPS) while maintaining excellent speech quality. Novel methods to quantize the residual and the long- and short-term model filters are presented.
ERIC Educational Resources Information Center
Hickok, Gregory
2012-01-01
Speech recognition is an active process that involves some form of predictive coding. This statement is relatively uncontroversial. What is less clear is the source of the prediction. The dual-stream model of speech processing suggests that there are two possible sources of predictive coding in speech perception: the motor speech system and the…
Speech processing using conditional observable maximum likelihood continuity mapping
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hogden, John; Nix, David
A computer implemented method enables the recognition of speech and speech characteristics. Parameters are initialized of first probability density functions that map between the symbols in the vocabulary of one or more sequences of speech codes that represent speech sounds and a continuity map. Parameters are also initialized of second probability density functions that map between the elements in the vocabulary of one or more desired sequences of speech transcription symbols and the continuity map. The parameters of the probability density functions are then trained to maximize the probabilities of the desired sequences of speech-transcription symbols. A new sequence of speech codes is then input to the continuity map having the trained first and second probability function parameters. A smooth path is identified on the continuity map that has the maximum probability for the new sequence of speech codes. The probability of each speech transcription symbol for each input speech code can then be output.
NASA Astrophysics Data System (ADS)
Jiang, Hongyan; Qiu, Hongbing; He, Ning; Liao, Xin
2018-06-01
For optoacoustic communication from in-air platforms to submerged apparatus, a method based on speech recognition and variable laser-pulse repetition rates is proposed, which realizes character encoding and transmission for speech. First, the theory and spectral characteristics of laser-generated underwater sound are analyzed; next, character conversion and encoding for speech, together with the pattern of codes for laser modulation, are studied; finally, experiments to verify the system design are carried out. Results show that the optoacoustic system, in which laser modulation is controlled by speech-to-character baseband codes, improves flexibility in receiving location for underwater targets as well as real-time performance in information transmission. In the overwater transmitter, a pulsed laser is driven by speech signals at several repetition rates randomly selected in the range of one to fifty Hz; in the underwater receiver, the laser pulse repetition rate and data can be recovered from the preamble and information codes of the corresponding laser-generated sound. When the energy of the laser pulse is appropriate, real-time transmission of speaker-independent speech can be realized in this way, which eases the problem of limited underwater bandwidth and provides a technical approach for air-sea communication.
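As a toy illustration only (the abstract does not specify the preamble or information-code format), the sketch below maps a character onto one of a few assumed repetition rates inside the 1-50 Hz band and emits the pulse times a laser driver would fire at:

```python
RATES_HZ = [5, 10, 20, 40]   # assumed alphabet of repetition rates (not from the paper)

def char_to_pulses(ch, pulses_per_symbol=4):
    """Encode one character as a burst of pulses whose repetition rate is
    selected by the character code; a stand-in for the paper's
    preamble-plus-information-code pattern."""
    rate = RATES_HZ[ord(ch) % len(RATES_HZ)]
    times = [k / rate for k in range(pulses_per_symbol)]
    return rate, times

print(char_to_pulses("A"))   # (10, [0.0, 0.1, 0.2, 0.3]): a 10 Hz burst
```

A real link would need many more rate levels or longer bursts to code a full character set; this only shows the rate-keying idea.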
Vector Adaptive/Predictive Encoding Of Speech
NASA Technical Reports Server (NTRS)
Chen, Juin-Hwey; Gersho, Allen
1989-01-01
Vector adaptive/predictive technique for digital encoding of speech signals yields decoded speech of very good quality after transmission at coding rate of 9.6 kb/s and of reasonably good quality at 4.8 kb/s. Requires 3 to 4 million multiplications and additions per second. Combines advantages of adaptive/predictive coding and of code-excited linear prediction, which yields speech of high quality but requires 600 million multiplications and additions per second at encoding rate of 4.8 kb/s. Vector adaptive/predictive coding technique bridges gaps in performance and complexity between adaptive/predictive coding and code-excited linear prediction.
Hateful Help--A Practical Look at the Issue of Hate Speech.
ERIC Educational Resources Information Center
Shelton, Michael W.
Many college and university administrators have responded to the recent increase in hateful incidents on campus by putting hate speech codes into place. The establishment of speech codes has sparked a heated debate over the impact that such codes have upon free speech and First Amendment values. Some commentators have suggested that viewing hate…
Transitioning from analog to digital audio recording in childhood speech sound disorders.
Shriberg, Lawrence D; McSweeny, Jane L; Anderson, Bruce E; Campbell, Thomas F; Chial, Michael R; Green, Jordan R; Hauner, Katherina K; Moore, Christopher A; Rusiewicz, Heather L; Wilson, David L
2005-06-01
Few empirical findings or technical guidelines are available on the current transition from analog to digital audio recording in childhood speech sound disorders. Of particular concern in the present context was whether a transition from analog- to digital-based transcription and coding of prosody and voice features might require re-standardizing a reference database for research in childhood speech sound disorders. Two research transcribers with different levels of experience glossed, transcribed, and prosody-voice coded conversational speech samples from eight children with mild to severe speech disorders of unknown origin. The samples were recorded, stored, and played back using representative analog and digital audio systems. Effect sizes calculated for an array of analog versus digital comparisons ranged from negligible to medium, with a trend for participants' speech competency scores to be slightly lower for samples obtained and transcribed using the digital system. We discuss the implications of these and other findings for research and clinical practice.
Transitioning from analog to digital audio recording in childhood speech sound disorders
Shriberg, Lawrence D.; McSweeny, Jane L.; Anderson, Bruce E.; Campbell, Thomas F.; Chial, Michael R.; Green, Jordan R.; Hauner, Katherina K.; Moore, Christopher A.; Rusiewicz, Heather L.; Wilson, David L.
2014-01-01
Few empirical findings or technical guidelines are available on the current transition from analog to digital audio recording in childhood speech sound disorders. Of particular concern in the present context was whether a transition from analog- to digital-based transcription and coding of prosody and voice features might require re-standardizing a reference database for research in childhood speech sound disorders. Two research transcribers with different levels of experience glossed, transcribed, and prosody-voice coded conversational speech samples from eight children with mild to severe speech disorders of unknown origin. The samples were recorded, stored, and played back using representative analog and digital audio systems. Effect sizes calculated for an array of analog versus digital comparisons ranged from negligible to medium, with a trend for participants’ speech competency scores to be slightly lower for samples obtained and transcribed using the digital system. We discuss the implications of these and other findings for research and clinical practice. PMID:16019779
The design of an adaptive predictive coder using a single-chip digital signal processor
NASA Astrophysics Data System (ADS)
Randolph, M. A.
1985-01-01
A speech coding processor architecture design study has been performed in which the Texas Instruments TMS32010 was selected from among three commercially available digital signal processing integrated circuits and evaluated in an implementation study of real-time Adaptive Predictive Coding (APC). The TMS32010 was compared with the AT&T Bell Laboratories DSP I and the Nippon Electric Co. µPD7720 and was found to be the most suitable for a single-chip implementation of APC. A preliminary system design based on the TMS32010 has been performed, and several of the hardware and software design issues are discussed. Particular attention was paid to the design of an external memory controller which permits rapid sequential access of external RAM. As a result, it has been determined that a compact hardware implementation of the APC algorithm is feasible based on the TMS32010. Originator-supplied keywords include: vocoders, speech compression, adaptive predictive coding, digital signal processing microcomputers, speech processor architectures, and special purpose processor.
Application of a VLSI vector quantization processor to real-time speech coding
NASA Technical Reports Server (NTRS)
Davidson, G.; Gersho, A.
1986-01-01
Attention is given to a working vector quantization processor for speech coding that is based on a first-generation VLSI chip which efficiently performs the pattern-matching operation needed for the codebook search process (CPS). Using this chip, the CPS architecture has been successfully incorporated into a compact, single-board Vector PCM implementation operating at 7-18 kbits/sec. A real time Adaptive Vector Predictive Coder system using the CPS has also been implemented.
A Comparison of LBG and ADPCM Speech Compression Techniques
NASA Astrophysics Data System (ADS)
Bachu, Rajesh G.; Patel, Jignasa; Barkana, Buket D.
Speech compression is the technology of converting human speech into an efficiently encoded representation that can later be decoded to produce a close approximation of the original signal. In all speech there is a degree of predictability, and speech coding techniques exploit this to reduce bit rates yet still maintain a suitable level of quality. This paper is a study and implementation of the Linde-Buzo-Gray (LBG) and Adaptive Differential Pulse Code Modulation (ADPCM) algorithms to compress speech signals. Here we implemented the methods using MATLAB 7.0. The methods used in this study gave good results and performance in compressing the speech, and listening tests showed that efficient and high-quality coding was achieved.
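A compact sketch of the LBG (generalized Lloyd) codebook design named above, written in Python rather than the authors' MATLAB; the vector dimension, codebook size, split perturbation, and iteration count are arbitrary assumptions:

```python
import numpy as np

def lbg(training, n_codewords, eps=1e-2, lloyd_iters=20):
    """Linde-Buzo-Gray: start from the global mean, repeatedly split each
    codeword into a perturbed pair, then refine with k-means (Lloyd) passes."""
    codebook = training.mean(axis=0, keepdims=True)
    while len(codebook) < n_codewords:
        codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])
        for _ in range(lloyd_iters):
            d = ((training[:, None, :] - codebook[None]) ** 2).sum(-1)
            nearest = d.argmin(axis=1)
            for j in range(len(codebook)):
                members = training[nearest == j]
                if len(members):
                    codebook[j] = members.mean(axis=0)
    return codebook

rng = np.random.default_rng(1)
vectors = rng.standard_normal((500, 8))  # e.g. 8-sample blocks of speech
print(lbg(vectors, 16).shape)            # (16, 8)
```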
Spotlight on Speech Codes 2012: The State of Free Speech on Our Nation's Campuses
ERIC Educational Resources Information Center
Foundation for Individual Rights in Education (NJ1), 2012
2012-01-01
The U.S. Supreme Court has called America's colleges and universities "vital centers for the Nation's intellectual life," but the reality today is that many of these institutions severely restrict free speech and open debate. Speech codes--policies prohibiting student and faculty speech that would, outside the bounds of campus, be…
NASA Astrophysics Data System (ADS)
Viswanathan, V. R.; Makhoul, J.; Schwartz, R. M.; Huggins, A. W. F.
1982-04-01
The variable frame rate (VFR) transmission methodology developed, implemented, and tested in the years 1973-1978 for efficiently transmitting linear predictive coding (LPC) vocoder parameters extracted from the input speech at a fixed frame rate is reviewed. With the VFR method, parameters are transmitted only when their values have changed sufficiently over the interval since their preceding transmission. Two distinct approaches to automatic implementation of the VFR method are discussed. The first bases the transmission decisions on comparisons between the parameter values of the present frame and the last transmitted frame. The second, which is based on a functional perceptual model of speech, compares the parameter values of all the frames that lie in the interval between the present frame and the last transmitted frame against a linear model of parameter variation over that interval. Also considered is the application of VFR transmission to the design of narrow-band LPC speech coders with average bit rates of 2000-2400 bits/s.
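A minimal sketch of the first VFR approach described above, comparing the current frame against the last transmitted one; the threshold, max-norm distance, and parameter dimensionality are assumptions rather than the published decision rules:

```python
import numpy as np

def vfr_select(frames, threshold):
    """Transmit a frame only when its parameters have drifted sufficiently
    from the last transmitted frame; returns the transmitted indices."""
    sent = [0]                       # the first frame is always sent
    for i in range(1, len(frames)):
        if np.abs(frames[i] - frames[sent[-1]]).max() > threshold:
            sent.append(i)
    return sent

rng = np.random.default_rng(2)
lpc_params = np.cumsum(rng.normal(0, 0.02, (100, 10)), axis=0)  # slowly drifting
kept = vfr_select(lpc_params, threshold=0.15)
print(f"{len(kept)}/100 frames transmitted")
```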
Tuning time-frequency methods for the detection of metered HF speech
NASA Astrophysics Data System (ADS)
Nelson, Douglas J.; Smith, Lawrence H.
2002-12-01
Speech is metered if the stresses occur at a nearly regular rate. Metered speech is common in poetry, and it can occur naturally in speech, if the speaker is spelling a word or reciting words or numbers from a list. In radio communications, the CQ request, call sign and other codes are frequently metered. In tactical communications and air traffic control, location, heading and identification codes may be metered. Moreover, metering may be expected to survive even in HF communications, which are corrupted by noise, interference and mistuning. For this environment, speech recognition and conventional machine-based methods are not effective. We describe Time-Frequency methods which have been adapted successfully to the problems of mitigating HF signal conditions and detecting metered speech. These methods are based on modeled time and frequency correlation properties of nearly harmonic functions. We derive these properties and demonstrate a performance gain over conventional correlation and spectral methods. Finally, for HF single sideband (SSB) communications, the problems of carrier mistuning, interfering signals such as manual Morse, and fast automatic gain control (AGC) must be addressed. We demonstrate simple methods which may be used to blindly mitigate mistuning and narrowband interference, and to effectively invert the fast automatic gain function.
Coutinho, Eduardo; Schuller, Björn
2017-01-01
Music and speech exhibit striking similarities in the communication of emotions in the acoustic domain, in such a way that the communication of specific emotions is achieved, at least to a certain extent, by means of shared acoustic patterns. From an Affective Sciences point of view, determining the degree of overlap between both domains is fundamental to understanding the shared mechanisms underlying this phenomenon. From a Machine Learning perspective, the overlap between acoustic codes for emotional expression in music and speech opens new possibilities to enlarge the amount of data available to develop music and speech emotion recognition systems. In this article, we investigate time-continuous predictions of emotion (Arousal and Valence) in music and speech, and the Transfer Learning between these domains. We establish a comparative framework including intra-domain (i.e., models trained and tested on the same modality, either music or speech) and cross-domain experiments (i.e., models trained in one modality and tested on the other). In the cross-domain context, we evaluated two strategies: the direct transfer between domains, and the contribution of Transfer Learning techniques (feature-representation transfer based on Denoising Auto Encoders) for reducing the gap in the feature space distributions. Our results demonstrate an excellent cross-domain generalisation performance with and without feature representation transfer in both directions. In the case of music, cross-domain approaches outperformed intra-domain models for Valence estimation, whereas for speech intra-domain models achieve the best performance. This is the first demonstration of shared acoustic codes for emotional expression in music and speech in the time-continuous domain.
Techniques for the Enhancement of Linear Predictive Speech Coding in Adverse Conditions
NASA Astrophysics Data System (ADS)
Wrench, Alan A.
Available from UMI in association with The British Library. Requires signed TDF. The Linear Prediction model was first applied to speech two and a half decades ago. Since then it has been the subject of intense research and continues to be one of the principal tools in the analysis of speech. Its mathematical tractability makes it a suitable subject for study and its proven success in practical applications makes the study worthwhile. The model is known to be unsuited to speech corrupted by background noise. This has led many researchers to investigate ways of enhancing the speech signal prior to Linear Predictive analysis. In this thesis this body of work is extended. The chosen application is low bit-rate (2.4 kbits/sec) speech coding. For this task the performance of the Linear Prediction algorithm is crucial because there is insufficient bandwidth to encode the error between the modelled speech and the original input. A review of the fundamentals of Linear Prediction and an independent assessment of the relative performance of methods of Linear Prediction modelling are presented. A new method is proposed which is fast and facilitates stability checking, however, its stability is shown to be unacceptably poorer than existing methods. A novel supposition governing the positioning of the analysis frame relative to a voiced speech signal is proposed and supported by observation. The problem of coding noisy speech is examined. Four frequency domain speech processing techniques are developed and tested. These are: (i) Combined Order Linear Prediction Spectral Estimation; (ii) Frequency Scaling According to an Aural Model; (iii) Amplitude Weighting Based on Perceived Loudness; (iv) Power Spectrum Squaring. These methods are compared with the Recursive Linearised Maximum a Posteriori method. Following on from work done in the frequency domain, a time domain implementation of spectrum squaring is developed. In addition, a new method of power spectrum estimation is developed based on the Minimum Variance approach. This new algorithm is shown to be closely related to Linear Prediction but produces slightly broader spectral peaks. Spectrum squaring is applied to both the new algorithm and standard Linear Prediction and their relative performance is assessed. (Abstract shortened by UMI.).
Coding strategies for cochlear implants under adverse environments
NASA Astrophysics Data System (ADS)
Tahmina, Qudsia
Cochlear implants are electronic prosthetic devices that restore partial hearing in patients with severe to profound hearing loss. Although most coding strategies have significantly improved the perception of speech in quiet listening conditions, limitations remain on speech perception under adverse environments such as background noise, reverberation and band-limited channels. We propose strategies that improve the intelligibility of speech transmitted over telephone networks, reverberated speech and speech in the presence of background noise. For telephone-processed speech, we examine the effects of adding low-frequency and high-frequency information to the band-limited telephone speech. Four listening conditions were designed to simulate the receiving frequency characteristics of telephone handsets. Results indicated improvement in cochlear implant and bimodal listening when telephone speech was augmented with high-frequency information, and this study therefore provides support for the design of algorithms to extend the bandwidth towards higher frequencies. The results also indicated added benefit from hearing aids for bimodal listeners in all four types of listening conditions. Speech understanding in acoustically reverberant environments is always a difficult task for hearing-impaired listeners. Reverberated sound consists of the direct sound, early reflections and late reflections; late reflections are known to be detrimental to speech intelligibility. We propose a reverberation suppression strategy based on spectral subtraction to suppress the reverberant energy from late reflections. Results from listening tests for two reverberant conditions (RT60 = 0.3 s and 1.0 s) indicated significant improvement when stimuli were processed with the spectral subtraction strategy. The proposed strategy operates with little to no prior information on the signal and the room characteristics and can therefore potentially be implemented in real-time CI speech processors. For speech in background noise, we propose a mechanism underlying the contribution of harmonics to the benefit of electroacoustic stimulation in cochlear implants. The proposed strategy is based on harmonic modeling and uses a synthesis-driven approach to synthesize the harmonics in voiced segments of speech. Based on objective measures, results indicated improvement in speech quality. This study warrants further work on the development of algorithms to regenerate harmonics of voiced segments in the presence of noise.
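As an illustration of the spectral-subtraction strategy sketched above (not the author's estimator), the following subtracts an assumed scaled, delayed copy of earlier frame magnitudes as the late-reverberation estimate and clamps the result to a spectral floor; the delay, alpha, and floor values are assumed tuning parameters:

```python
import numpy as np

def suppress_late_reverb(stft_mag, delay=6, alpha=0.4, floor=0.05):
    """Per-frame spectral subtraction: estimate the late-reflection
    magnitude as a scaled, delayed copy of a preceding frame, subtract it,
    and clamp to a fraction of the original magnitude to limit artifacts."""
    out = stft_mag.copy()                        # shape: (freq_bins, frames)
    for t in range(delay, stft_mag.shape[1]):
        late = alpha * stft_mag[:, t - delay]    # crude late-reverb estimate
        out[:, t] = np.maximum(stft_mag[:, t] - late, floor * stft_mag[:, t])
    return out
```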
Neural Coding of Formant-Exaggerated Speech in the Infant Brain
ERIC Educational Resources Information Center
Zhang, Yang; Koerner, Tess; Miller, Sharon; Grice-Patil, Zach; Svec, Adam; Akbari, David; Tusler, Liz; Carney, Edward
2011-01-01
Speech scientists have long proposed that formant exaggeration in infant-directed speech plays an important role in language acquisition. This event-related potential (ERP) study investigated neural coding of formant-exaggerated speech in 6-12-month-old infants. Two synthetic /i/ vowels were presented in alternating blocks to test the effects of…
A software tool for analyzing multichannel cochlear implant signals.
Lai, Wai Kong; Bögli, Hans; Dillier, Norbert
2003-10-01
A useful and convenient means to analyze the radio frequency (RF) signals being sent by a speech processor to a cochlear implant would be to actually capture and display them with appropriate software. This is particularly useful for development or diagnostic purposes. sCILab (Swiss Cochlear Implant Laboratory) is such a PC-based software tool intended for the Nucleus family of Multichannel Cochlear Implants. Its graphical user interface provides a convenient and intuitive means for visualizing and analyzing the signals encoding speech information. Both numerical and graphic displays are available for detailed examination of the captured CI signals, as well as an acoustic simulation of these CI signals. sCILab has been used in the design and verification of new speech coding strategies, and has also been applied as an analytical tool in studies of how different parameter settings of existing speech coding strategies affect speech perception. As a diagnostic tool, it is also useful for troubleshooting problems with the external equipment of the cochlear implant systems.
Magnified Neural Envelope Coding Predicts Deficits in Speech Perception in Noise.
Millman, Rebecca E; Mattys, Sven L; Gouws, André D; Prendergast, Garreth
2017-08-09
Verbal communication in noisy backgrounds is challenging. Understanding speech in background noise that fluctuates in intensity over time is particularly difficult for hearing-impaired listeners with a sensorineural hearing loss (SNHL). The reduction in fast-acting cochlear compression associated with SNHL exaggerates the perceived fluctuations in intensity in amplitude-modulated sounds. SNHL-induced changes in the coding of amplitude-modulated sounds may have a detrimental effect on the ability of SNHL listeners to understand speech in the presence of modulated background noise. To date, direct evidence for a link between magnified envelope coding and deficits in speech identification in modulated noise has been absent. Here, magnetoencephalography was used to quantify the effects of SNHL on phase locking to the temporal envelope of modulated noise (envelope coding) in human auditory cortex. Our results show that SNHL enhances the amplitude of envelope coding in posteromedial auditory cortex, whereas it enhances the fidelity of envelope coding in posteromedial and posterolateral auditory cortex. This dissociation was more evident in the right hemisphere, demonstrating functional lateralization in enhanced envelope coding in SNHL listeners. However, enhanced envelope coding was not perceptually beneficial. Our results also show that both hearing thresholds and, to a lesser extent, magnified cortical envelope coding in left posteromedial auditory cortex predict speech identification in modulated background noise. We propose a framework in which magnified envelope coding in posteromedial auditory cortex disrupts the segregation of speech from background noise, leading to deficits in speech perception in modulated background noise. SIGNIFICANCE STATEMENT People with hearing loss struggle to follow conversations in noisy environments. Background noise that fluctuates in intensity over time poses a particular challenge. Using magnetoencephalography, we demonstrate anatomically distinct cortical representations of modulated noise in normal-hearing and hearing-impaired listeners. This work provides the first link among hearing thresholds, the amplitude of cortical representations of modulated sounds, and the ability to understand speech in modulated background noise. In light of previous work, we propose that magnified cortical representations of modulated sounds disrupt the separation of speech from modulated background noise in auditory cortex.
Neural Spike-Train Analyses of the Speech-Based Envelope Power Spectrum Model
Rallapalli, Varsha H.
2016-01-01
Diagnosing and treating hearing impairment is challenging because people with similar degrees of sensorineural hearing loss (SNHL) often have different speech-recognition abilities. The speech-based envelope power spectrum model (sEPSM) has demonstrated that the envelope signal-to-noise ratio (SNRenv) from a modulation filter bank provides a robust speech-intelligibility measure across a wider range of degraded conditions than many long-standing models. In the sEPSM, noise (N) is assumed to: (a) reduce S + N envelope power by filling in dips within clean speech (S) and (b) introduce an envelope noise floor from intrinsic fluctuations in the noise itself. While the promise of SNRenv has been demonstrated for normal-hearing listeners, it has not been thoroughly extended to hearing-impaired listeners because of limited physiological knowledge of how SNHL affects speech-in-noise envelope coding relative to noise alone. Here, envelope coding to speech-in-noise stimuli was quantified from auditory-nerve model spike trains using shuffled correlograms, which were analyzed in the modulation-frequency domain to compute modulation-band estimates of neural SNRenv. Preliminary spike-train analyses show strong similarities to the sEPSM, demonstrating feasibility of neural SNRenv computations. Results suggest that individual differences can occur based on differential degrees of outer- and inner-hair-cell dysfunction in listeners currently diagnosed into the single audiological SNHL category. The predicted acoustic-SNR dependence in individual differences suggests that the SNR-dependent rate of susceptibility could be an important metric in diagnosing individual differences. Future measurements of the neural SNRenv in animal studies with various forms of SNHL will provide valuable insight for understanding individual differences in speech-in-noise intelligibility.
Auditory-neurophysiological responses to speech during early childhood: Effects of background noise
White-Schwoch, Travis; Davies, Evan C.; Thompson, Elaine C.; Carr, Kali Woodruff; Nicol, Trent; Bradlow, Ann R.; Kraus, Nina
2015-01-01
Early childhood is a critical period of auditory learning, during which children are constantly mapping sounds to meaning. But learning rarely occurs under ideal listening conditions—children are forced to listen against a relentless din. This background noise degrades the neural coding of these critical sounds, in turn interfering with auditory learning. Despite the importance of robust and reliable auditory processing during early childhood, little is known about the neurophysiology underlying speech processing in children so young. To better understand the physiological constraints these adverse listening scenarios impose on speech sound coding during early childhood, auditory-neurophysiological responses were elicited to a consonant-vowel syllable in quiet and background noise in a cohort of typically-developing preschoolers (ages 3–5 yr). Overall, responses were degraded in noise: they were smaller, less stable across trials, slower, and there was poorer coding of spectral content and the temporal envelope. These effects were exacerbated in response to the consonant transition relative to the vowel, suggesting that the neural coding of spectrotemporally-dynamic speech features is more tenuous in noise than the coding of static features—even in children this young. Neural coding of speech temporal fine structure, however, was more resilient to the addition of background noise than coding of temporal envelope information. Taken together, these results demonstrate that noise places a neurophysiological constraint on speech processing during early childhood by causing a breakdown in neural processing of speech acoustics. These results may explain why some listeners have inordinate difficulties understanding speech in noise. Speech-elicited auditory-neurophysiological responses offer objective insight into listening skills during early childhood by reflecting the integrity of neural coding in quiet and noise; this paper documents typical response properties in this age group. These normative metrics may be useful clinically to evaluate auditory processing difficulties during early childhood. PMID:26113025
Multipath search coding of stationary signals with applications to speech
NASA Astrophysics Data System (ADS)
Fehn, H. G.; Noll, P.
1982-04-01
This paper deals with the application of multipath search coding (MSC) concepts to the coding of stationary memoryless and correlated sources, and of speech signals, at a rate of one bit per sample. Use is made of three MSC classes: (1) codebook coding, or vector quantization, (2) tree coding, and (3) trellis coding. The paper explains the performance of these coders and compares it both with that of conventional coders and with rate-distortion bounds. The potential of MSC coding strategies is demonstrated by illustrations. The paper also reports results of MSC coding of speech, where both adaptive quantization and adaptive prediction were included in the coder design.
Transitioning from Analog to Digital Audio Recording in Childhood Speech Sound Disorders
ERIC Educational Resources Information Center
Shriberg, Lawrence D.; Mcsweeny, Jane L.; Anderson, Bruce E.; Campbell, Thomas F.; Chial, Michael R.; Green, Jordan R.; Hauner, Katherina K.; Moore, Christopher A.; Rusiewicz, Heather L.; Wilson, David L.
2005-01-01
Few empirical findings or technical guidelines are available on the current transition from analog to digital audio recording in childhood speech sound disorders. Of particular concern in the present context was whether a transition from analog- to digital-based transcription and coding of prosody and voice features might require re-standardizing…
Auditory Speech Perception Tests in Relation to the Coding Strategy in Cochlear Implant.
Bazon, Aline Cristine; Mantello, Erika Barioni; Gonçales, Alina Sanches; Isaac, Myriam de Lima; Hyppolito, Miguel Angelo; Reis, Ana Cláudia Mirândola Barbosa
2016-07-01
The objective of evaluating the auditory perception of cochlear implant users is to determine how the acoustic signal is processed, leading to the recognition and understanding of sound. The aims were to investigate the differences in the process of auditory speech perception in individuals with postlingual hearing loss wearing a cochlear implant, using two different speech coding strategies, and to analyze speech perception and handicap perception in relation to the strategy used. This is a prospective, cross-sectional, descriptive cohort study. We selected ten cochlear implant users, who were characterized by hearing threshold and by the application of speech perception tests and the Hearing Handicap Inventory for Adults. There was no significant difference when comparing the variables subject age, age at acquisition of hearing loss, etiology, time of hearing deprivation, time of cochlear implant use and mean hearing threshold with the cochlear implant with the shift in speech coding strategy. There was no relationship between lack of handicap perception and improvement in speech perception with either of the speech coding strategies used. There was no significant difference between the strategies evaluated, and no relation was observed between them and the variables studied.
Signal Prediction With Input Identification
NASA Technical Reports Server (NTRS)
Juang, Jer-Nan; Chen, Ya-Chin
1999-01-01
A novel coding technique is presented for signal prediction with applications including speech coding, system identification, and estimation of input excitation. The approach is based on the blind equalization method for speech signal processing in conjunction with the geometric subspace projection theory to formulate the basic prediction equation. The speech-coding problem is often divided into two parts, a linear prediction model and excitation input. The parameter coefficients of the linear predictor and the input excitation are solved simultaneously and recursively by a conventional recursive least-squares algorithm. The excitation input is computed by coding all possible outcomes into a binary codebook. The coefficients of the linear predictor and excitation, and the index of the codebook can then be used to represent the signal. In addition, a variable-frame concept is proposed to block the same excitation signal in sequence in order to reduce the storage size and increase the transmission rate. The results of this work can be easily extended to the problem of disturbance identification. The basic principles are outlined in this report and differences from other existing methods are discussed. Simulations are included to demonstrate the proposed method.
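A minimal sketch of the recursive least-squares coefficient update used for the linear predictor above; the codebook coding of the excitation and the variable-frame blocking are not shown, and the order, forgetting factor, and initialization are assumed values:

```python
import numpy as np

def rls_predict(x, order=8, lam=0.99, delta=100.0):
    """Recursive least-squares linear prediction: update the coefficients
    sample by sample and return them with the one-step prediction residual
    (the excitation a coder would then quantize)."""
    w = np.zeros(order)
    P = np.eye(order) * delta
    residual = np.zeros(len(x))
    for n in range(order, len(x)):
        u = x[n - order:n][::-1]           # most recent samples first
        e = x[n] - w @ u                   # a priori prediction error
        k = P @ u / (lam + u @ P @ u)      # RLS gain vector
        w = w + k * e
        P = (P - np.outer(k, u @ P)) / lam
        residual[n] = e
    return w, residual

rng = np.random.default_rng(4)
x = np.sin(0.3 * np.arange(400)) + 0.05 * rng.standard_normal(400)
w, res = rls_predict(x)
print(w.round(3), float((res ** 2).mean()))
```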
Schuller, Björn
2017-01-01
Music and speech exhibit striking similarities in the communication of emotions in the acoustic domain, in such a way that the communication of specific emotions is achieved, at least to a certain extent, by means of shared acoustic patterns. From an Affective Sciences point of view, determining the degree of overlap between both domains is fundamental to understanding the shared mechanisms underlying this phenomenon. From a Machine Learning perspective, the overlap between acoustic codes for emotional expression in music and speech opens new possibilities to enlarge the amount of data available to develop music and speech emotion recognition systems. In this article, we investigate time-continuous predictions of emotion (Arousal and Valence) in music and speech, and the Transfer Learning between these domains. We establish a comparative framework including intra-domain (i.e., models trained and tested on the same modality, either music or speech) and cross-domain experiments (i.e., models trained in one modality and tested on the other). In the cross-domain context, we evaluated two strategies: the direct transfer between domains, and the contribution of Transfer Learning techniques (feature-representation transfer based on Denoising Auto Encoders) for reducing the gap in the feature space distributions. Our results demonstrate an excellent cross-domain generalisation performance with and without feature representation transfer in both directions. In the case of music, cross-domain approaches outperformed intra-domain models for Valence estimation, whereas for speech intra-domain models achieve the best performance. This is the first demonstration of shared acoustic codes for emotional expression in music and speech in the time-continuous domain. PMID:28658285
Ultra-narrow bandwidth voice coding
Holzrichter, John F [Berkeley, CA]; Ng, Lawrence C [Danville, CA]
2007-01-09
A system of removing excess information from a human speech signal and coding the remaining signal information, transmitting the coded signal, and reconstructing the coded signal. The system uses one or more EM wave sensors and one or more acoustic microphones to determine at least one characteristic of the human speech signal.
A variable rate speech compressor for mobile applications
NASA Technical Reports Server (NTRS)
Yeldener, S.; Kondoz, A. M.; Evans, B. G.
1990-01-01
One of the most promising speech coders at bit rates of 9.6 to 4.8 kbits/s is CELP. Code Excited Linear Prediction (CELP) has dominated the 9.6 to 4.8 kbits/s region during the past 3 to 4 years. Its setback, however, is its expensive implementation. As an alternative to CELP, the Base-Band CELP (CELP-BB) was developed, which produces good quality speech comparable to CELP with a complexity implementable on a single chip, as reported previously. Its robustness was also improved to tolerate error rates up to 1.0 percent and maintain intelligibility at 5.0 percent and more. Although CELP-BB produces good quality speech at around 4.8 kbits/s, it has a fundamental problem when updating the pitch filter memory. A sub-optimal solution is proposed for this problem. Below 4.8 kbits/s, however, CELP-BB suffers from noticeable quantization noise as a result of the large vector dimensions used. Efficient representation of speech below 4.8 kbits/s is achieved by introducing Sinusoidal Transform Coding (STC) to represent the LPC excitation, a scheme called Sine Wave Excited LPC (SWELP). In this case, natural-sounding good quality synthetic speech is obtained at around 2.4 kbits/s.
Language Recognition via Sparse Coding
2016-09-08
a posteriori (MAP) adaptation scheme that further optimizes the discriminative quality of sparse-coded speech features. We empirically validate the...significantly improve the discriminative quality of sparse-coded speech features. In Section 4, we evaluate the proposed approaches against an i-vector
Enhancing speech recognition using improved particle swarm optimization based hidden Markov model.
Selvaraj, Lokesh; Ganesan, Balakrishnan
2014-01-01
Enhancing speech recognition is the primary intention of this work. In this paper a novel speech recognition method based on vector quantization and improved particle swarm optimization (IPSO) is suggested. The suggested methodology contains four stages, namely, (i) denoising, (ii) feature mining, (iii) vector quantization, and (iv) an IPSO-based hidden Markov model (HMM) technique (IP-HMM). At first, the speech signals are denoised using a median filter. Next, characteristics such as peak, pitch spectrum, Mel frequency cepstral coefficients (MFCC), mean, standard deviation, and minimum and maximum of the signal are extracted from the denoised signal. Following that, to accomplish the training process, the extracted characteristics are given to genetic algorithm-based codebook generation in vector quantization. The initial populations are created by selecting random code vectors from the training set for the codebooks for the genetic algorithm process, and the IP-HMM performs the recognition. Here, new candidate code vectors are produced by the crossover genetic operation. The proposed speech recognition technique offers 97.14% accuracy.
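For concreteness, the sketch below shows the standard scoring step of a VQ/discrete-HMM recognizer of the kind described: the forward algorithm evaluates a sequence of codebook indices under one word model, and the recognizer keeps the best-scoring model. The PSO-based training of the HMM parameters is not shown, and all matrices here are random placeholders:

```python
import numpy as np

def forward_loglik(obs, log_pi, log_A, log_B):
    """Forward algorithm in log space: log-likelihood of a VQ symbol
    sequence `obs` given initial log-probs log_pi, transition matrix
    log_A, and discrete emission matrix log_B."""
    alpha = log_pi + log_B[:, obs[0]]
    for o in obs[1:]:
        alpha = log_B[:, o] + np.array(
            [np.logaddexp.reduce(alpha + log_A[:, j]) for j in range(len(alpha))])
    return float(np.logaddexp.reduce(alpha))

rng = np.random.default_rng(3)
A = rng.dirichlet(np.ones(4), size=4)    # 4 hidden states (placeholder values)
B = rng.dirichlet(np.ones(32), size=4)   # 32 VQ codebook symbols
pi = np.full(4, 0.25)
obs = rng.integers(0, 32, size=50)       # quantized feature-frame labels
print(forward_loglik(obs, np.log(pi), np.log(A), np.log(B)))
```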
NASA Technical Reports Server (NTRS)
Sandor, Aniko; Moses, Haifa
2016-01-01
Speech alarms have been used extensively in aviation and are included in the International Building Code (IBC) and the National Fire Protection Association's (NFPA) Life Safety Code. However, they have not been implemented on space vehicles. Previous studies conducted at NASA JSC showed that speech alarms lead to faster identification and higher accuracy. This research evaluated updated speech and tone alerts in a laboratory environment and in the Human Exploration Research Analog (HERA) in a realistic setup.
The development of the Nucleus Freedom Cochlear implant system.
Patrick, James F; Busby, Peter A; Gibson, Peter J
2006-12-01
Cochlear Limited (Cochlear) released the fourth-generation cochlear implant system, Nucleus Freedom, in 2005. Freedom is based on 25 years of experience in cochlear implant research and development and incorporates advances in medicine, implantable materials, electronic technology, and sound coding. This article presents the development of Cochlear's implant systems, with an overview of the first 3 generations, and details of the Freedom system: the CI24RE receiver-stimulator, the Contour Advance electrode, the modular Freedom processor, the available speech coding strategies, the input processing options of Smart Sound to improve the signal before coding as electrical signals, and the programming software. Preliminary results from multicenter studies with the Freedom system are reported, demonstrating better levels of performance compared with the previous systems. The final section presents the most recent implant reliability data, with the early findings at 18 months showing improved reliability of the Freedom implant compared with the earlier Nucleus 3 System. Also reported are some of the findings of Cochlear's collaborative research programs to improve recipient outcomes. Included are studies showing the benefits from bilateral implants, electroacoustic stimulation using an ipsilateral and/or contralateral hearing aid, advanced speech coding, and streamlined speech processor programming.
Civility on Campus: Harassment Codes vs. Free Speech. ASHE Annual Meeting Paper.
ERIC Educational Resources Information Center
Nordin, Virginia Davis
In response to the resurgence of racial incidents and increased "gay-bashing" on higher education campuses in recent years, campus authorities have instituted harassment codes, thereby giving rise to conflicts with free speech. Similar conflicts and challenges to free speech have arisen recently in a municipal context such as a St. Paul…
Hansen, J H; Nandkumar, S
1995-01-01
The formulation of reliable signal processing algorithms for speech coding and synthesis requires the selection of an a priori criterion of performance. Though coding efficiency (bits/second) or computational requirements can be used, a final performance measure must always include speech quality. In this paper, three objective speech quality measures are considered with respect to quality assessment for American English, noisy American English, and noise-free versions of seven languages. The purpose is to determine whether objective quality measures can be used to quantify changes in quality for a given voice coding method, with a known subjective performance level, as background noise or language conditions are changed. The speech coding algorithm chosen is regular-pulse excitation with long-term prediction (RPE-LTP), which has been chosen as the standard voice compression algorithm for the European Digital Mobile Radio system. Three areas are considered for objective quality assessment: (i) vocoder performance for American English in a noise-free environment, (ii) speech quality variation for three additive background noise sources, and (iii) noise-free performance for seven languages, namely English, Japanese, Finnish, German, Hindi, Spanish, and French. It is suggested that although existing objective quality measures will never replace subjective testing, they can be a useful means of assessing changes in performance, identifying areas for improvement in algorithm design, and augmenting subjective quality tests for voice coding/compression algorithms in noise-free, noisy, and/or non-English applications.
Neural Coding of Relational Invariance in Speech: Human Language Analogs to the Barn Owl.
ERIC Educational Resources Information Center
Sussman, Harvey M.
1989-01-01
The neuronal model shown to code sound-source azimuth in the barn owl by H. Wagner et al. in 1987 is used as the basis for a speculative brain-based human model, which can establish contrastive phonetic categories to solve the problem of perception "non-invariance." (SLD)
Human phoneme recognition depending on speech-intrinsic variability.
Meyer, Bernd T; Jürgens, Tim; Wesker, Thorsten; Brand, Thomas; Kollmeier, Birger
2010-11-01
The influence of different sources of speech-intrinsic variation (speaking rate, effort, style and dialect or accent) on human speech perception was investigated. In listening experiments with 16 listeners, confusions of consonant-vowel-consonant (CVC) and vowel-consonant-vowel (VCV) sounds in speech-weighted noise were analyzed. Experiments were based on the OLLO logatome speech database, which was designed for a man-machine comparison. It contains utterances spoken by 50 speakers from five dialect/accent regions and covers several intrinsic variations. By comparing results depending on intrinsic and extrinsic variations (i.e., different levels of masking noise), the degradation induced by variabilities can be expressed in terms of the SNR. The spectral level distance between the respective speech segment and the long-term spectrum of the masking noise was found to be a good predictor for recognition rates, while phoneme confusions were influenced by the distance to spectrally close phonemes. An analysis based on transmitted information of articulatory features showed that voicing and manner of articulation are comparatively robust cues in the presence of intrinsic variations, whereas the coding of place is more degraded. The database and detailed results have been made available for comparisons between human speech recognition (HSR) and automatic speech recognizers (ASR).
ERIC Educational Resources Information Center
Haugh, Erin Kathleen
2017-01-01
The purpose of this study was to examine the role orthographic coding might play in distinguishing between membership in groups of language-based disability types. The sample consisted of 36 second- and third-grade subjects who were administered the PAL-II Receptive Coding and Word Choice Accuracy subtests as measures of orthographic coding…
Everyday listening questionnaire: correlation between subjective hearing and objective performance.
Brendel, Martina; Frohne-Buechner, Carolin; Lesinski-Schiedat, Anke; Lenarz, Thomas; Buechner, Andreas
2014-01-01
Clinical experience has demonstrated that speech understanding by cochlear implant (CI) recipients has improved over recent years with the development of new technology. The Everyday Listening Questionnaire 2 (ELQ 2) was designed to collect information regarding the challenges faced by CI recipients in everyday listening. The aim of this study was to compare self-assessment of CI users using ELQ 2 with objective speech recognition measures and to compare results between users of older and newer coding strategies. During their regular clinical review appointments a group of representative adult CI recipients implanted with the Advanced Bionics implant system were asked to complete the questionnaire. The first 100 patients who agreed to participate in this survey were recruited independent of processor generation and speech coding strategy. Correlations between subjectively scored hearing performance in everyday listening situations and objectively measured speech perception abilities were examined relative to the speech coding strategies used. When subjects were grouped by strategy there were significant differences between users of older 'standard' strategies and users of the newer, currently available strategies (HiRes and HiRes 120), especially in the categories of telephone use and music perception. Significant correlations were found between certain subjective ratings and the objective speech perception data in noise. There is a good correlation between subjective and objective data. Users of more recent speech coding strategies tend to have fewer problems in difficult hearing situations.
Landwehr, Markus; Fürstenberg, Dirk; Walger, Martin; von Wedel, Hasso; Meister, Hartmut
2014-01-01
Advances in speech coding strategies and electrode array designs for cochlear implants (CIs) predominantly aim at improving speech perception. Current efforts are also directed at transmitting appropriate cues of the fundamental frequency (F0) to the auditory nerve with respect to speech quality, prosody, and music perception. The aim of this study was to examine the effects of various electrode configurations and coding strategies on speech intonation identification, speaker gender identification, and music quality rating. In six MED-EL CI users electrodes were selectively deactivated in order to simulate different insertion depths and inter-electrode distances when using the high definition continuous interleaved sampling (HDCIS) and fine structure processing (FSP) speech coding strategies. Identification of intonation and speaker gender was determined and music quality rating was assessed. For intonation identification HDCIS was robust against the different electrode configurations, whereas fine structure processing showed significantly worse results when a short electrode depth was simulated. In contrast, speaker gender recognition was not affected by electrode configuration or speech coding strategy. Music quality rating was sensitive to electrode configuration. In conclusion, the three experiments revealed different outcomes, even though they all addressed the reception of F0 cues. Rapid changes in F0, as seen with intonation, were the most sensitive to electrode configurations and coding strategies. In contrast, electrode configurations and coding strategies did not show large effects when F0 information was available over a longer time period, as seen with speaker gender. Music quality relies on additional spectral cues other than F0, and was poorest when a shallow insertion was simulated.
Vaerenberg, Bart; Péan, Vincent; Lesbros, Guillaume; De Ceulaer, Geert; Schauwers, Karen; Daemers, Kristin; Gnansia, Dan; Govaerts, Paul J
2013-06-01
To assess the auditory performance of Digisonic® cochlear implant users with electric stimulation (ES) and electro-acoustic stimulation (EAS), with special attention to the processing of low-frequency temporal fine structure. Six patients implanted with a Digisonic® SP implant and showing low-frequency residual hearing were fitted with the Zebra® speech processor providing both electric and acoustic stimulation. Assessment consisted of monosyllabic speech identification tests in quiet and in noise at different presentation levels, and a pitch discrimination task using harmonic and disharmonic intonating complex sounds (Vaerenberg et al., 2011). These tests investigate place and time coding through pitch discrimination. All tasks were performed with ES only and with EAS. Speech results in noise showed significant improvement with EAS when compared to ES. Whereas EAS did not yield better results in the harmonic intonation test, the improvements in the disharmonic intonation test were remarkable, suggesting better coding of pitch cues requiring phase locking. These results suggest that patients with residual hearing in the low-frequency range still have good phase-locking capacities, allowing them to process fine temporal information. ES relies mainly on place coding but provides poor low-frequency temporal coding, whereas EAS also provides temporal coding in the low-frequency range. Patients with residual phase-locking capacities can make use of these cues.
Neural evidence for predictive coding in auditory cortex during speech production.
Okada, Kayoko; Matchin, William; Hickok, Gregory
2018-02-01
Recent models of speech production suggest that motor commands generate forward predictions of the auditory consequences of those commands, that these forward predictions can be used to monitor and correct speech output, and that this system is hierarchically organized (Hickok, Houde, & Rong, Neuron, 69(3), 407-422, 2011; Pickering & Garrod, Behavior and Brain Sciences, 36(4), 329-347, 2013). Recent psycholinguistic research has shown that internally generated speech (i.e., imagined speech) produces different types of errors than does overt speech (Oppenheim & Dell, Cognition, 106(1), 528-537, 2008; Oppenheim & Dell, Memory & Cognition, 38(8), 1147-1160, 2010). These studies suggest that articulated speech might involve predictive coding at additional levels of the hierarchy relative to imagined speech. The current fMRI experiment investigates neural evidence of predictive coding in speech production. Twenty-four participants from UC Irvine were recruited for the study. Participants were scanned while they were visually presented with a sequence of words that they reproduced in sync with a visual metronome. On each trial, they were cued either to silently articulate the sequence or to imagine the sequence without overt articulation. As expected, silent articulation and imagined speech both engaged a left-hemisphere network previously implicated in speech production. A contrast of silent articulation with imagined speech revealed greater activation for articulated speech in inferior frontal cortex, premotor cortex, and the insula in the left hemisphere, consistent with greater articulatory load. Although both conditions were silent, this contrast also produced significantly greater activation in auditory cortex in dorsal superior temporal gyrus in both hemispheres. We suggest that these activations reflect forward predictions arising from additional levels of the perceptual/motor hierarchy that are involved in monitoring the intended speech output.
Xiao, Bo; Imel, Zac E; Georgiou, Panayiotis G; Atkins, David C; Narayanan, Shrikanth S
2015-01-01
The technology for evaluating patient-provider interactions in psychotherapy, observational coding, has not changed in 70 years. It is labor-intensive, error-prone, and expensive, limiting its use in evaluating psychotherapy in the real world. Engineering solutions from speech and language processing provide new methods for the automatic evaluation of provider ratings from session recordings. The primary data are 200 Motivational Interviewing (MI) sessions from a study on MI training methods with observer ratings of counselor empathy. Automatic Speech Recognition (ASR) was used to transcribe sessions, and the resulting words were used in a text-based predictive model of empathy. Two supporting datasets trained the speech processing tasks including ASR (1200 transcripts from heterogeneous psychotherapy sessions and 153 transcripts and session recordings from 5 MI clinical trials). The accuracy of computationally derived empathy ratings was evaluated against human ratings for each provider. Computationally derived empathy scores and classifications (high vs. low) were highly accurate against human-based codes and classifications, with a correlation of 0.65 and F-score (a weighted average of sensitivity and specificity) of 0.86, respectively. Empathy prediction using human transcription as input (as opposed to ASR) resulted in a slight increase in prediction accuracies, suggesting that the fully automatic system with ASR is relatively robust. Using speech and language processing methods, it is possible to generate accurate predictions of provider performance in psychotherapy from audio recordings alone. This technology can support large-scale evaluation of psychotherapy for dissemination and process studies.
Bilingual Voicing: A Study of Code-Switching in the Reported Speech of Finnish Immigrants in Estonia
ERIC Educational Resources Information Center
Frick, Maria; Riionheimo, Helka
2013-01-01
Through a conversation analytic investigation of Finnish-Estonian bilingual (direct) reported speech (i.e., voicing) by Finns who live in Estonia, this study shows how code-switching is used as a double contextualization device. The code-switched voicings are shaped by the on-going interactional situation, serving its needs by opening up a context…
Look at the Gato! Code-Switching in Speech to Toddlers
ERIC Educational Resources Information Center
Bail, Amelie; Morini, Giovanna; Newman, Rochelle S.
2015-01-01
We examined code-switching (CS) in the speech of twenty-four bilingual caregivers when speaking with their 18- to 24-month-old children. All parents code-switched at least once in a short play session, and some code-switched quite often (over 1/3 of utterances). This CS included both inter-sentential and intra-sentential switches, suggesting that at least…
4800 B/S speech compression techniques for mobile satellite systems
NASA Technical Reports Server (NTRS)
Townes, S. A.; Barnwell, T. P., III; Rose, R. C.; Gersho, A.; Davidson, G.
1986-01-01
This paper discusses three 4800 bps digital speech compression techniques currently being investigated for application in the mobile satellite service. These three techniques, vector adaptive predictive coding, vector excitation coding, and the self-excited vocoder, are the most promising among a number of techniques being developed to provide near-toll-quality speech compression while keeping the bit rate low enough for a power- and bandwidth-limited satellite service.
Abrams, Daniel A; Nicol, Trent; White-Schwoch, Travis; Zecker, Steven; Kraus, Nina
2017-05-01
Speech perception relies on a listener's ability to simultaneously resolve multiple temporal features in the speech signal. Little is known regarding neural mechanisms that enable the simultaneous coding of concurrent temporal features in speech. Here we show that two categories of temporal features in speech, the low-frequency speech envelope and periodicity cues, are processed by distinct neural mechanisms within the same population of cortical neurons. We measured population activity in primary auditory cortex of anesthetized guinea pig in response to three variants of a naturally produced sentence. Results show that the envelope of population responses closely tracks the speech envelope, and this cortical activity more closely reflects wider bandwidths of the speech envelope compared to narrow bands. Additionally, neuronal populations represent the fundamental frequency of speech robustly with phase-locked responses. Importantly, these two temporal features of speech are simultaneously observed within neuronal ensembles in auditory cortex in response to clear, conversation, and compressed speech exemplars. Results show that auditory cortical neurons are adept at simultaneously resolving multiple temporal features in extended speech sentences using discrete coding mechanisms. Copyright © 2017 Elsevier B.V. All rights reserved.
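The abstract contrasts the low-frequency speech envelope with periodicity (fundamental-frequency) cues. As a hedged illustration of how the first of these is commonly extracted in such studies, the sketch below computes a smoothed Hilbert envelope; the 16 Hz cutoff is an illustrative choice, not a value taken from the paper.

```python
import numpy as np
from scipy.signal import hilbert, butter, filtfilt

def speech_envelope(x, fs, cutoff_hz=16.0):
    """Low-frequency speech envelope: magnitude of the analytic signal,
    smoothed with a 4th-order Butterworth low-pass filter."""
    env = np.abs(hilbert(x))                  # instantaneous amplitude
    b, a = butter(4, cutoff_hz / (fs / 2.0))  # normalized low-pass cutoff
    return filtfilt(b, a, env)                # zero-phase smoothing
```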
Neben, Nicole; Lenarz, Thomas; Schuessler, Mark; Harpel, Theo; Buechner, Andreas
2013-05-01
Results of speech recognition in noise tests using a new research coding strategy, designed to introduce the virtual channel effect, provided no advantage over MP3000™. Although statistically significant smaller just noticeable differences (JNDs) were obtained, the findings for pitch ranking proved to have little clinical impact. The aim of this study was to explore whether modifications to MP3000 by including sequential virtual channel stimulation would lead to further improvements in hearing, particularly for speech recognition in background noise and in competing-talker conditions, and to compare results for pitch perception and melody recognition, as well as informally collect subjective impressions on strategy preference. Nine experienced cochlear implant subjects were recruited for the prospective study. Two variants of the experimental strategy were compared to MP3000. The study design was a single-blinded ABCCBA cross-over trial paradigm with 3 weeks of take-home experience for each user condition. Comparing results of pitch ranking, a significantly reduced JND was identified. No significant effect of coding strategy on speech understanding in noise or competing-talker materials was found. Melody recognition skills were the same under all user conditions.
Harris, Margaret; Moreno, Constanza
2006-01-01
Nine children with severe-profound prelingual hearing loss and single-word reading scores not more than 10 months behind chronological age (Good Readers) were matched with 9 children whose reading lag was at least 15 months (Poor Readers). Good Readers had significantly higher spelling and reading comprehension scores. They produced significantly more phonetic errors (indicating the use of phonological coding) and more often correctly represented the number of syllables in spelling than Poor Readers. They also scored more highly on orthographic awareness and were better at speech reading. Speech intelligibility was the same in the two groups. Cluster analysis revealed that only three Good Readers showed strong evidence of phonetic coding in spelling although seven had good representation of syllables; only four had high orthographic awareness scores. However, all 9 children were good speech readers, suggesting that a phonological code derived through speech reading may underpin reading success for deaf children.
NASA Astrophysics Data System (ADS)
Riera-Palou, Felip; den Brinker, Albertus C.
2007-12-01
This paper introduces a new audio and speech broadband coding technique based on the combination of a pulse excitation coder and a standardized parametric coder, namely the MPEG-4 high-quality parametric coder. After presenting a series of enhancements to regular pulse excitation (RPE) to make it suitable for the modeling of broadband signals, it is shown how pulse and parametric coding complement each other and how they can be merged to yield a layered, bit-stream scalable coder able to operate at different points in the quality/bit-rate plane. The performance of the proposed coder is evaluated in a listening test. The major result is that the extra functionality of bit-stream scalability does not come at the price of reduced performance, since the coder is competitive with standardized coders (MP3, AAC, SSC).
More About Vector Adaptive/Predictive Coding Of Speech
NASA Technical Reports Server (NTRS)
Jedrey, Thomas C.; Gersho, Allen
1992-01-01
Report presents additional information about digital speech-encoding and -decoding system described in "Vector Adaptive/Predictive Encoding of Speech" (NPO-17230). Summarizes development of vector adaptive/predictive coding (VAPC) system and describes basic functions of algorithm. Describes refinements introduced enabling receiver to cope with errors. VAPC algorithm implemented in integrated-circuit coding/decoding processors (codecs). VAPC and other codecs tested under variety of operating conditions. Tests designed to reveal effects of various quiet and noisy background environments and of poor telephone equipment. VAPC found competitive with, and in some respects superior to, other 4.8-kb/s codecs and other codecs of similar complexity.
Speaking of Race, Speaking of Sex: Hate Speech, Civil Rights, and Civil Liberties.
ERIC Educational Resources Information Center
Gates, Henry Louis, Jr.; And Others
The essays of this collection explore the restriction of speech and the hate speech codes that attempt to restrict bigoted or offensive speech and punish those who engage in it. These essays generally argue that speech restrictions are dangerous and counterproductive, but they acknowledge that it is very difficult to distinguish between…
ERIC Educational Resources Information Center
Raine, Adrian; And Others
1991-01-01
Children with speech disorders had lower short-term memory capacity and smaller word length effect than control children. Children with speech disorders also had reduced speech-motor activity during rehearsal. Results suggest that speech rate may be a causal determinant of verbal short-term memory capacity. (BC)
Speech processing using maximum likelihood continuity mapping
Hogden, John E.
2000-01-01
Speech processing is obtained that, given a probabilistic mapping between static speech sounds and pseudo-articulator positions, allows sequences of speech sounds to be mapped to smooth sequences of pseudo-articulator positions. In addition, a method for learning a probabilistic mapping between static speech sounds and pseudo-articulator position is described. The method for learning the mapping between static speech sounds and pseudo-articulator position uses a set of training data composed only of speech sounds. The said speech processing can be applied to various speech analysis tasks, including speech recognition, speaker recognition, speech coding, speech synthesis, and voice mimicry.
Lee, Yune-Sang; Turkeltaub, Peter; Granger, Richard; Raizada, Rajeev D S
2012-03-14
Although much effort has been directed toward understanding the neural basis of speech processing, the neural processes involved in the categorical perception of speech have been relatively less studied, and many questions remain open. In this functional magnetic resonance imaging (fMRI) study, we probed the cortical regions mediating categorical speech perception using an advanced brain-mapping technique, whole-brain multivariate pattern-based analysis (MVPA). Normal healthy human subjects (native English speakers) were scanned while they listened to 10 consonant-vowel syllables along the /ba/-/da/ continuum. Outside of the scanner, individuals' own category boundaries were measured to divide the fMRI data into /ba/ and /da/ conditions per subject. The whole-brain MVPA revealed that Broca's area and the left pre-supplementary motor area evoked distinct neural activity patterns between the two perceptual categories (/ba/ vs /da/). Broca's area was also found when the same analysis was applied to another dataset (Raizada and Poldrack, 2007), which previously yielded the supramarginal gyrus using a univariate adaptation-fMRI paradigm. The consistent MVPA findings from two independent datasets strongly indicate that Broca's area participates in categorical speech perception, with a possible role of translating speech signals into articulatory codes. The difference in results between univariate and multivariate pattern-based analyses of the same data suggests that processes in different cortical areas along the dorsal speech perception stream are distributed on different spatial scales.
Altieri, Nicholas; Pisoni, David B.; Townsend, James T.
2012-01-01
Summerfield (1987) proposed several accounts of audiovisual speech perception, a field of research that has burgeoned in recent years. The proposed accounts included the integration of discrete phonetic features, vectors describing the values of independent acoustical and optical parameters, the filter function of the vocal tract, and articulatory dynamics of the vocal tract. The latter two accounts assume that the representations of audiovisual speech perception are based on abstract gestures, while the former two assume that the representations consist of symbolic or featural information obtained from visual and auditory modalities. Recent converging evidence from several different disciplines reveals that the general framework of Summerfield’s feature-based theories should be expanded. An updated framework building upon the feature-based theories is presented. We propose a processing model arguing that auditory and visual brain circuits provide facilitatory information when the inputs are correctly timed, and that auditory and visual speech representations do not necessarily undergo translation into a common code during information processing. Future research on multisensory processing in speech perception should investigate the connections between auditory and visual brain regions, and utilize dynamic modeling tools to further understand the timing and information processing mechanisms involved in audiovisual speech integration. PMID:21968081
Paper-Based Textbooks with Audio Support for Print-Disabled Students.
Fujiyoshi, Akio; Ohsawa, Akiko; Takaira, Takuya; Tani, Yoshiaki; Fujiyoshi, Mamoru; Ota, Yuko
2015-01-01
Utilizing invisible 2-dimensional codes and digital audio players with a 2-dimensional code scanner, we developed paper-based textbooks with audio support for students with print disabilities, called "multimodal textbooks." Multimodal textbooks can be read with the combination of the two modes: "reading printed text" and "listening to the speech of the text from a digital audio player with a 2-dimensional code scanner." Since multimodal textbooks look the same as regular textbooks and the price of a digital audio player is reasonable (about 30 euro), we think multimodal textbooks are suitable for students with print disabilities in ordinary classrooms.
NASA Technical Reports Server (NTRS)
Birch, J. N.; Getzin, N.
1971-01-01
Analog and digital voice coding techniques for application to an L-band satellite-based air traffic control (ATC) system for over-ocean deployment are examined. In addition to performance, the techniques are compared on the basis of cost, size, weight, power consumption, availability, reliability, and multiplexing features. Candidate systems are chosen on the bases of minimum required RF bandwidth and received carrier-to-noise density ratios. A detailed survey of automated and nonautomated intelligibility testing methods and devices is presented and comparisons given. Subjective evaluation of speech systems by preference tests is considered. Conclusions and recommendations are developed regarding the selection of the voice system. Likewise, conclusions and recommendations are developed for the appropriate use of intelligibility tests, speech quality measurements, and preference tests within the framework of the proposed ATC system.
Hate Speech and the First Amendment.
ERIC Educational Resources Information Center
Rainey, Susan J.; Kinsler, Waren S.; Kannarr, Tina L.; Reaves, Asa E.
This document is comprised of California state statutes, federal legislation, and court litigation pertaining to hate speech and the First Amendment. The document provides an overview of California education code sections relating to the regulation of speech; basic principles of the First Amendment; government efforts to regulate hate speech,…
The Cheerleaders' Mock Execution
ERIC Educational Resources Information Center
Trujillo-Jenks, Laura
2011-01-01
The fervor of student speech is demonstrated through different mediums and venues in public schools. In this case, a new principal encounters the mores of a community that believes in free speech, specifically student free speech. When a pep rally becomes a venue for hate speech, terroristic threats, and profanity, the student code of conduct…
Noise-robust speech recognition through auditory feature detection and spike sequence decoding.
Schafer, Phillip B; Jin, Dezhe Z
2014-03-01
Speech recognition in noisy conditions is a major challenge for computer systems, but the human brain performs it routinely and accurately. Automatic speech recognition (ASR) systems that are inspired by neuroscience can potentially bridge the performance gap between humans and machines. We present a system for noise-robust isolated word recognition that works by decoding sequences of spikes from a population of simulated auditory feature-detecting neurons. Each neuron is trained to respond selectively to a brief spectrotemporal pattern, or feature, drawn from the simulated auditory nerve response to speech. The neural population conveys the time-dependent structure of a sound by its sequence of spikes. We compare two methods for decoding the spike sequences: one using a hidden Markov model-based recognizer, the other using a novel template-based recognition scheme. In the latter case, words are recognized by comparing their spike sequences to template sequences obtained from clean training data, using a similarity measure based on the length of the longest common subsequence. Using isolated spoken digits from the AURORA-2 database, we show that our combined system outperforms a state-of-the-art robust speech recognizer at low signal-to-noise ratios. Both the spike-based encoding scheme and the template-based decoding offer gains in noise robustness over traditional speech recognition methods. Our system highlights potential advantages of spike-based acoustic coding and provides a biologically motivated framework for robust ASR development.
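The template-based decoder scores a word by the longest common subsequence (LCS) between the input spike sequence and stored templates. A minimal dynamic-programming sketch follows; the sequence representation (symbols identifying which neuron fired) and the length normalization are our assumptions, not details taken from the paper.

```python
def lcs_length(a, b):
    """Dynamic-programming length of the longest common subsequence."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

def recognize(spikes, templates):
    """templates: dict mapping each word to a list of template spike
    sequences from clean training data; returns the best-matching word."""
    def score(t):
        return lcs_length(spikes, t) / max(len(spikes), len(t))
    return max(templates, key=lambda w: max(score(t) for t in templates[w]))
```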
Automating annotation of information-giving for analysis of clinical conversation.
Mayfield, Elijah; Laws, M Barton; Wilson, Ira B; Penstein Rosé, Carolyn
2014-02-01
Coding of clinical communication for fine-grained features such as speech acts has produced a substantial literature. However, annotation by humans is laborious and expensive, limiting application of these methods. We aimed to show that through machine learning, computers could code certain categories of speech acts with sufficient reliability to make useful distinctions among clinical encounters. The data were transcripts of 415 routine outpatient visits of HIV patients which had previously been coded for speech acts using the Generalized Medical Interaction Analysis System (GMIAS); 50 had also been coded for larger scale features using the Comprehensive Analysis of the Structure of Encounters System (CASES). We aggregated selected speech acts into information-giving and requesting, then trained the machine to automatically annotate using logistic regression classification. We evaluated reliability by per-speech act accuracy. We used multiple regression to predict patient reports of communication quality from post-visit surveys using the patient and provider information-giving to information-requesting ratio (briefly, information-giving ratio) and patient gender. Automated coding produces moderate reliability with human coding (accuracy 71.2%, κ=0.57), with high correlation between machine and human prediction of the information-giving ratio (r=0.96). The regression significantly predicted four of five patient-reported measures of communication quality (r=0.263-0.344). The information-giving ratio is a useful and intuitive measure for predicting patient perception of provider-patient communication quality. These predictions can be made with automated annotation, which is a practical option for studying large collections of clinical encounters with objectivity, consistency, and low cost, providing greater opportunity for training and reflection for care providers.
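The study trains a logistic regression classifier over session words to tag information-giving versus information-requesting acts and then forms their ratio. The sketch below is a hedged reconstruction under common bag-of-words assumptions; the feature settings and function names are illustrative, not the study's exact configuration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

def train_speech_act_coder(utterances, labels):
    """Bag-of-words logistic regression tagging each utterance as
    information-giving (1) vs. information-requesting (0)."""
    vec = CountVectorizer(ngram_range=(1, 2), min_df=2)
    clf = LogisticRegression(max_iter=1000)
    clf.fit(vec.fit_transform(utterances), labels)
    return vec, clf

def information_giving_ratio(vec, clf, visit_utterances):
    """Ratio of predicted information-giving to information-requesting
    acts for one clinical encounter (guarding against division by zero)."""
    pred = clf.predict(vec.transform(visit_utterances))
    return pred.sum() / max(1, len(pred) - pred.sum())
```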
Spectral analysis method and sample generation for real time visualization of speech
NASA Astrophysics Data System (ADS)
Hobohm, Klaus
A method for translating speech signals into optical patterns, characterized by high sound discriminability and learnability and designed to give deaf persons feedback for controlling their own speech, is presented. Important properties of the speech production and perception processes, and of the organs involved in these mechanisms, are recalled in order to define requirements for speech visualization. It is established that the spectral representation must do justice to the time, frequency, and amplitude resolution of hearing, and that continuous variations of the acoustic parameters of the speech signal must be depicted by continuous variations of the images. A color table was developed for dynamic illustration, and sonograms were generated with five spectral analysis methods, including Fourier transforms and linear predictive coding. To evaluate sonogram quality, test persons had to recognize consonant/vowel/consonant (CVC) words; an optimized analysis method was achieved with a fast Fourier transform and a postprocessor. A hardware concept for a real-time speech visualization system, based on multiprocessor technology in a personal computer, is presented.
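The optimized method above combines a fast Fourier transform with a postprocessor. As a minimal sketch of the FFT stage of such a sonogram, the code below computes a short-time magnitude spectrum in dB; the window and hop sizes are illustrative, and the color-mapping stage is omitted.

```python
import numpy as np
from scipy.signal import stft

def sonogram_db(x, fs, win_ms=25, hop_ms=10, floor_db=-80.0):
    """Short-time Fourier magnitude in dB: the raw material for a
    color-mapped sonogram display."""
    nper = int(fs * win_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    f, t, Z = stft(x, fs, nperseg=nper, noverlap=nper - hop)
    mag_db = 20 * np.log10(np.abs(Z) + 1e-10)   # avoid log of zero
    return f, t, np.maximum(mag_db, floor_db)   # clamp the dynamic range
```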
Space station interior noise analysis program
NASA Technical Reports Server (NTRS)
Stusnick, E.; Burn, M.
1987-01-01
Documentation is provided for a microcomputer program which was developed to evaluate the effect of the vibroacoustic environment on speech communication inside a space station. The program, entitled Space Station Interior Noise Analysis Program (SSINAP), combines a Statistical Energy Analysis (SEA) prediction of sound and vibration levels within the space station with a speech intelligibility model based on the Modulation Transfer Function and the Speech Transmission Index (MTF/STI). The SEA model provides an effective analysis tool for predicting the acoustic environment based on proposed space station design. The MTF/STI model provides a method for evaluating speech communication in the relatively reverberant and potentially noisy environments that are likely to occur in space stations. The combination of these two models provides a powerful analysis tool for optimizing the acoustic design of space stations from the point of view of speech communications. The mathematical algorithms used in SSINAP are presented to implement the SEA and MTF/STI models. An appendix provides an explanation of the operation of the program along with details of the program structure and code.
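The abstract names the MTF/STI model without reproducing it. As a hedged sketch, the standard single-band mapping from modulation transfer values m(F) to a transmission index (apparent SNR, clipped to ±15 dB, then rescaled to 0-1) is shown below; the full STI additionally weights indices across octave bands, which is omitted here.

```python
import numpy as np

def sti_single_band(m_values):
    """Map modulation transfer values m(F) at the standard modulation
    frequencies to transmission indices and average them (one band only;
    the full STI applies octave-band weightings on top of this)."""
    m = np.clip(np.asarray(m_values, dtype=float), 1e-6, 1 - 1e-6)
    snr_app = 10 * np.log10(m / (1 - m))      # apparent SNR in dB
    snr_app = np.clip(snr_app, -15.0, 15.0)   # limit to +/- 15 dB
    return float(np.mean((snr_app + 15.0) / 30.0))
```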
Noise suppression methods for robust speech processing
NASA Astrophysics Data System (ADS)
Boll, S. F.; Ravindra, H.; Randall, G.; Armantrout, R.; Power, R.
1980-05-01
Robust speech processing in practical operating environments requires effective environmental and processor noise suppression. This report describes the technical findings and accomplishments during this reporting period for the research program funded to develop real-time, compressed speech analysis-synthesis algorithms whose performance is invariant under signal contamination. Fulfillment of this requirement is necessary to ensure reliable, secure compressed speech transmission within realistic military command and control environments. Overall contributions resulting from this research program include an understanding of how environmental noise degrades narrowband coded speech, development of appropriate real-time noise suppression algorithms, and development of speech parameter identification methods that treat signal contamination as a fundamental element of the estimation process. This report describes the current research and results in the areas of noise suppression using dual-input adaptive noise cancellation and short-time Fourier transform algorithms, articulation rate change techniques, and an experiment which demonstrated that the spectral subtraction noise suppression algorithm can improve the intelligibility of 2400 bps, LPC-10 coded helicopter speech by 10.6 points.
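The report credits spectral subtraction with the 10.6-point intelligibility gain. For orientation, the sketch below implements classic Boll-style magnitude spectral subtraction; the oversubtraction factor, spectral floor, and frame length are typical illustrative values, and `noise_mag` is assumed to be a per-frequency noise magnitude estimate taken from a speech-free segment.

```python
import numpy as np
from scipy.signal import stft, istft

def spectral_subtraction(x, fs, noise_mag, alpha=2.0, beta=0.02, nper=256):
    """Subtract an estimated noise magnitude spectrum from each frame,
    flooring the result to avoid negative magnitudes, then resynthesize
    with the noisy phase."""
    f, t, X = stft(x, fs, nperseg=nper)
    mag, phase = np.abs(X), np.angle(X)
    clean = np.maximum(mag - alpha * noise_mag[:, None], beta * mag)
    _, y = istft(clean * np.exp(1j * phase), fs, nperseg=nper)
    return y
```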
Effects of synthetic speech output in the learning of graphic symbols of varied iconicity.
Koul, Rajinder; Schlosser, Ralf
To examine the effects of additional auditory feedback from synthetic speech on the learning of high-translucency symbols versus low-translucency symbols. Two adults with little or no functional speech and severe intellectual disabilities served as participants. A single-subject ABACA/ACABA design was used to study the relative effects of two treatments: symbol training in the presence and absence of synthetic speech output. The results clearly indicated that the two treatments, rather than extraneous variables, were responsible for gains in symbol learning. Both participants learned more low-translucency symbols, or reached their maximum learning of low-translucency symbols, in the speech output condition. The results of this preliminary study replicate and extend the iconicity hypothesis to a new set of learning conditions involving speech output, and suggest that feedback from speech output may assist adults with profound intellectual disabilities in coding particularly those symbols whose association with their referent cannot be derived from visual resemblance to the referent.
School Dress Codes v. The First Amendment: Ganging up on Student Attire.
ERIC Educational Resources Information Center
Jahn, Karon L.
Do school dress codes written with the specific purpose of limiting individual dress preferences, including dress associated with gangs, infringe on speech freedoms granted by the First Amendment of the U.S. Constitution? Although the Supreme Court has extended its protection of political speech to nonverbal acts of communication, it has…
Lorens, Artur; Zgoda, Małgorzata; Obrycka, Anita; Skarżynski, Henryk
2010-12-01
Presently, there are only a few studies examining the benefits of fine structure information in coding strategies. Against this background, this study aims to assess the objective and subjective performance of children experienced with the C40+ cochlear implant using the CIS+ coding strategy who were upgraded to the OPUS 2 processor using FSP and HDCIS. In this prospective study, 60 children with more than 3.5 years of experience with the C40+ cochlear implant were upgraded to the OPUS 2 processor and fit and tested with HDCIS (Interval I). After 3 months of experience with HDCIS, they were fit with the FSP coding strategy (Interval II) and tested with all strategies (FSP, HDCIS, CIS+). After an additional 3-4 months, they were assessed on all three strategies and asked to choose their take-home strategy (Interval III). The children were tested using the Adaptive Auditory Speech Test, which measures speech reception threshold (SRT) in quiet and in noise, at each test interval. The children were also asked to rate on a Visual Analogue Scale their satisfaction and coding strategy preference when listening to speech and a pop song. However, since not all tests could be performed at one single visit, some children were not able to complete all tests at all intervals. At the study endpoint, speech in quiet showed a significant difference in SRT of 1.0 dB between FSP and HDCIS, with FSP performing better. FSP proved a better strategy compared with CIS+, showing lower SRT results by 5.2 dB. Speech in noise tests showed FSP to be significantly better than CIS+ by 0.7 dB, and HDCIS to be significantly better than CIS+ by 0.8 dB. Both satisfaction and coding strategy preference ratings also revealed that the FSP and HDCIS strategies were better than the CIS+ strategy when listening to speech and music. FSP was better than HDCIS when listening to speech. This study demonstrates that long-term pediatric users of the COMBI 40+ are able to upgrade to a newer processor and coding strategy without compromising their listening performance, and even improve their performance with FSP after a short period of experience. Copyright © 2010 Elsevier Ireland Ltd. All rights reserved.
ERIC Educational Resources Information Center
Podgor, Ellen S.
1976-01-01
The concept of symbolic speech emanates from the 1968 case of United States v. O'Brien. These discussions of flag desecration, grooming and dress codes, nude entertainment, buttons and badges, and musical expression show that the courts place symbolic speech in different strata from verbal communication. (LBH)
Matsushima, J; Kumagai, M; Harada, C; Takahashi, K; Inuyama, Y; Ifukube, T
1992-09-01
Our previous reports showed that second formant information, using a speech coding method, could be transmitted through an electrode on the promontory. However, second formant information can also be transmitted by tactile stimulation. Therefore, to find out whether electrical stimulation of the auditory nerve would be superior to tactile stimulation for our speech coding method, the time resolutions of the two modes of stimulation were compared. The results showed that the time resolution of electrical promontory stimulation was three times better than the time resolution of tactile stimulation of the finger. This indicates that electrical stimulation of the auditory nerve is much better for our speech coding method than tactile stimulation of the finger.
Kong, Anthony Pak-Hin; Law, Sam-Po; Kwan, Connie Ching-Yin; Lai, Christy; Lam, Vivian
2014-01-01
Gestures are commonly used together with spoken language in human communication. One major limitation of gesture investigations in the existing literature lies in the fact that the coding of forms and functions of gestures has not been clearly differentiated. This paper first described a recently developed Database of Speech and GEsture (DoSaGE) based on independent annotation of gesture forms and functions among 119 neurologically unimpaired right-handed native speakers of Cantonese (divided into three age and two education levels), and presented findings of an investigation examining how gesture use was related to age and linguistic performance. Consideration of these two factors, for which normative data are currently very limited or lacking in the literature, is relevant and necessary when one evaluates gesture employment among individuals with and without language impairment. Three speech tasks, including monologue of a personally important event, sequential description, and story-telling, were used for elicitation. The EUDICO Linguistic ANnotator (ELAN) software was used to independently annotate each participant’s linguistic information of the transcript, forms of gestures used, and the function for each gesture. About one-third of the subjects did not use any co-verbal gestures. While the majority of gestures were non-content-carrying, which functioned mainly for reinforcing speech intonation or controlling speech flow, the content-carrying ones were used to enhance speech content. Furthermore, individuals who are younger or linguistically more proficient tended to use fewer gestures, suggesting that normal speakers gesture differently as a function of age and linguistic performance. PMID:25667563
Spotlight on Speech Codes 2011: The State of Free Speech on Our Nation's Campuses
ERIC Educational Resources Information Center
Foundation for Individual Rights in Education (NJ1), 2011
2011-01-01
Each year, the Foundation for Individual Rights in Education (FIRE) conducts a rigorous survey of restrictions on speech at America's colleges and universities. The survey and accompanying report explore the extent to which schools are meeting their legal and moral obligations to uphold students' and faculty members' rights to freedom of speech,…
Spotlight on Speech Codes 2009: The State of Free Speech on Our Nation's Campuses
ERIC Educational Resources Information Center
Foundation for Individual Rights in Education (NJ1), 2009
2009-01-01
Each year, the Foundation for Individual Rights in Education (FIRE) conducts a wide, detailed survey of restrictions on speech at America's colleges and universities. The survey and resulting report explore the extent to which schools are meeting their obligations to uphold students' and faculty members' rights to freedom of speech, freedom of…
Spotlight on Speech Codes 2010: The State of Free Speech on Our Nation's Campuses
ERIC Educational Resources Information Center
Foundation for Individual Rights in Education (NJ1), 2010
2010-01-01
Each year, the Foundation for Individual Rights in Education (FIRE) conducts a rigorous survey of restrictions on speech at America's colleges and universities. The survey and resulting report explore the extent to which schools are meeting their legal and moral obligations to uphold students' and faculty members' rights to freedom of speech,…
Design of a robust baseband LPC coder for speech transmission over 9.6 kbit/s noisy channels
NASA Astrophysics Data System (ADS)
Viswanathan, V. R.; Russell, W. H.; Higgins, A. L.
1982-04-01
This paper describes the design of a baseband Linear Predictive Coder (LPC) which transmits speech over 9.6 kbit/sec synchronous channels with random bit errors of up to 1%. Presented are the results of our investigation of a number of aspects of the baseband LPC coder with the goal of maximizing the quality of the transmitted speech. Important among these aspects are: bandwidth of the baseband, coding of the baseband residual, high-frequency regeneration, and error protection of important transmission parameters. The paper discusses these and other issues, presents the results of speech-quality tests conducted during the various stages of optimization, and describes the details of the optimized speech coder. This optimized speech coding algorithm has been implemented as a real-time full-duplex system on an array processor. Informal listening tests of the real-time coder have shown that the coder produces good speech quality in the absence of channel bit errors and introduces only a slight degradation in quality for channel bit error rates of up to 1%.
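The paper does not spell out the LPC analysis itself; the sketch below shows the standard autocorrelation-method Levinson-Durbin recursion on which such coders are built. The order of 10 is a typical value for narrowband speech, not necessarily the one used in this coder.

```python
import numpy as np

def lpc_coefficients(frame, order=10):
    """Autocorrelation-method LPC via the Levinson-Durbin recursion.
    Expects a windowed, non-silent frame; returns the coefficient vector
    a (with a[0] = 1) and the final prediction error energy."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = np.zeros(order + 1)
    a[0], err = 1.0, r[0]
    for i in range(1, order + 1):
        acc = np.dot(a[1:i], r[i - 1:0:-1])   # sum_j a[j] * r[i - j]
        k = -(r[i] + acc) / err               # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]   # update earlier coefficients
        a[i] = k
        err *= 1.0 - k * k                    # shrink the residual energy
    return a, err
```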
ERIC Educational Resources Information Center
Riley, Gresham
1993-01-01
It is argued that the arguments currently advanced for limiting speech on college campuses are also arguments that will compromise academic freedom and that a distinction needs to be made between the right of free speech and the wisdom of exercising the right on any given occasion. (MSE)
[Prosody, speech input and language acquisition].
Jungheim, M; Miller, S; Kühn, D; Ptok, M
2014-04-01
In order to acquire language, children require speech input, and the prosody of that input plays an important role. In most cultures adults modify their code when communicating with children; compared to normal speech, this code differs especially with regard to prosody. For this review, a selective literature search in PubMed and Scopus was performed. Prosodic characteristics are a key feature of spoken language. By analysing prosodic features, children gain knowledge about underlying grammatical structures. Child-directed speech (CDS) is modified in a way that acoustically highlights meaningful sequences, so that important information can be extracted from the continuous speech flow more easily. CDS is said to enhance the representation of linguistic signs. Taking into consideration what has previously been described in the literature regarding the perception of suprasegmentals, CDS seems able to support language acquisition owing to the correspondence of prosodic and syntactic units. However, no findings have been reported indicating that the linguistically reduced CDS could hinder first language acquisition.
Ruble, Lisa; Birdwhistell, Jessie; Toland, Michael D; McGrew, John H
2011-01-01
The significant increase in the numbers of students with autism, combined with the need for better trained teachers (National Research Council, 2001), calls for research on the effectiveness of alternative methods, such as consultation, that have the potential to improve service delivery. Data from 2 randomized controlled single-blind trials indicate that an autism-specific consultation planning framework known as the collaborative model for promoting competence and success (COMPASS) is effective in increasing child Individual Education Program (IEP) outcomes (Ruble, Dalrymple, & McGrew, 2010; Ruble, McGrew, & Toland, 2011). In this study, we describe the verbal interactions, defined as speech acts and speech act exchanges, that take place during COMPASS consultation, and examine the associations between speech exchanges and child outcomes. We applied the Psychosocial Processes Coding Scheme (Leaper, 1991) to code speech acts. Speech act exchanges were overwhelmingly affiliative and failed to show statistically significant relationships with child IEP outcomes and teacher adherence, but did correlate positively with IEP quality.
Self-Organization: Complex Dynamical Systems in the Evolution of Speech
NASA Astrophysics Data System (ADS)
Oudeyer, Pierre-Yves
Human vocalization systems are characterized by complex structural properties. They are combinatorial, based on the systematic reuse of phonemes, and the set of repertoires in human languages is characterized by both strong statistical regularities (universals) and great diversity. They are also conventional codes culturally shared in each community of speakers. What are the origins of the forms of speech? What are the mechanisms that permitted their evolution in the course of phylogenesis and cultural evolution? How can a shared speech code be formed in a community of individuals? This chapter focuses on the way the concept of self-organization, and its interaction with natural selection, can throw light on these three questions. In particular, a computational model is presented which shows that a basic neural equipment for adaptive holistic vocal imitation, directly coupling motor and perceptual representations in the brain, can spontaneously generate shared combinatorial systems of vocalizations in a society of babbling individuals. Furthermore, we show how innate morphological and physiological constraints can interact with these self-organized mechanisms to account for both the formation of statistical regularities and the diversity of vocalization systems.
Speech coding at 4800 bps for mobile satellite communications
NASA Technical Reports Server (NTRS)
Gersho, Allen; Chan, Wai-Yip; Davidson, Grant; Chen, Juin-Hwey; Yong, Mei
1988-01-01
A speech compression project has recently been completed to develop a speech coding algorithm suitable for operation in a mobile satellite environment, aimed at providing telephone-quality natural speech at 4.8 kbps. The work has resulted in two alternative techniques which achieve reasonably good communications quality at 4.8 kbps while tolerating vehicle noise and rather severe channel impairments. The algorithms are embodied in a compact self-contained prototype consisting of two AT&T 32-bit floating-point DSP32 digital signal processors (DSPs). A Motorola 68HC11 microcomputer chip serves as the board controller and interface handler. On a wire-wrapped card, the prototype's circuit footprint amounts to only 200 sq cm, and it consumes about 9 watts of power.
The analysis of verbal interaction sequences in dyadic clinical communication: a review of methods.
Connor, Martin; Fletcher, Ian; Salmon, Peter
2009-05-01
To identify methods available for sequential analysis of dyadic verbal clinical communication and to review their methodological and conceptual differences. Critical review, based on literature describing sequential analyses of clinical and other relevant social interaction. Dominant approaches are based on analysis of communication according to its precise position in the series of utterances that constitute event-coded dialogue. For practical reasons, methods focus on very short-term processes, typically the influence of one party's speech on what the other says next. Studies of longer-term influences are rare. Some analyses have statistical limitations, particularly in disregarding heterogeneity between consultations, patients or practitioners. Additional techniques, including ones that can use information about timing and duration of speech from interval-coding are becoming available. There is a danger that constraints of commonly used methods shape research questions and divert researchers from potentially important communication processes including ones that operate over a longer-term than one or two speech turns. Given that no one method can model the complexity of clinical communication, multiple methods, both quantitative and qualitative, are necessary. Broadening the range of methods will allow the current emphasis on exploratory studies to be balanced by tests of hypotheses about clinically important communication processes.
Different Timescales for the Neural Coding of Consonant and Vowel Sounds
Perez, Claudia A.; Engineer, Crystal T.; Jakkamsetti, Vikram; Carraway, Ryan S.; Perry, Matthew S.
2013-01-01
Psychophysical, clinical, and imaging evidence suggests that consonant and vowel sounds have distinct neural representations. This study tests the hypothesis that consonant and vowel sounds are represented on different timescales within the same population of neurons by comparing behavioral discrimination with neural discrimination based on activity recorded in rat inferior colliculus and primary auditory cortex. Performance on 9 vowel discrimination tasks was highly correlated with neural discrimination based on spike count and was not correlated when spike timing was preserved. In contrast, performance on 11 consonant discrimination tasks was highly correlated with neural discrimination when spike timing was preserved and not when spike timing was eliminated. These results suggest that in the early stages of auditory processing, spike count encodes vowel sounds and spike timing encodes consonant sounds. These distinct coding strategies likely contribute to the robust nature of speech sound representations and may help explain some aspects of developmental and acquired speech processing disorders. PMID:22426334
Equality marker in the language of Bali
NASA Astrophysics Data System (ADS)
Wajdi, Majid; Subiyanto, Paulus
2018-01-01
The language of Bali can be counted among the more elaborate languages of the world because, like Javanese, it has distinct speech levels: a low code and a high code. These levels are language codes that speakers use to show and express social relationships. This paper focuses on describing, analyzing, and interpreting the use of the low code of Balinese in daily communication in the speech community of Pegayaman, Bali. Observational and documentation methods, using recording and field-note techniques, provided the data for the research. The recorded spoken language, together with material from a Balinese novel, was transcribed into written form to ease analysis. Symmetric use of the low code expresses social equality between the participants involved in communication, and it also implies social intimacy between speakers of Balinese. The regular, patterned use of the low code is not merely a communication strategy; it amounts to a communication agreement, or contract, between the participants. By using the low code in their social and communicative activities, participants share and express their social equality and intimacy.
Goehring, Tobias; Bolner, Federico; Monaghan, Jessica J M; van Dijk, Bas; Zarowski, Andrzej; Bleeck, Stefan
2017-02-01
Speech understanding in noisy environments is still one of the major challenges for cochlear implant (CI) users in everyday life. We evaluated a speech enhancement algorithm based on neural networks (NNSE) for improving speech intelligibility in noise for CI users. The algorithm decomposes the noisy speech signal into time-frequency units, extracts a set of auditory-inspired features and feeds them to the neural network to produce an estimation of which frequency channels contain more perceptually important information (higher signal-to-noise ratio, SNR). This estimate is used to attenuate noise-dominated and retain speech-dominated CI channels for electrical stimulation, as in traditional n-of-m CI coding strategies. The proposed algorithm was evaluated by measuring the speech-in-noise performance of 14 CI users using three types of background noise. Two NNSE algorithms were compared: a speaker-dependent algorithm, trained on the target speaker used for testing, and a speaker-independent algorithm, trained on different speakers. Significant improvements in the intelligibility of speech in stationary and fluctuating noises were found relative to the unprocessed condition for the speaker-dependent algorithm in all noise types and for the speaker-independent algorithm in 2 out of 3 noise types. The NNSE algorithms used noise-specific neural networks that generalized to novel segments of the same noise type and worked over a range of SNRs. The proposed algorithm has the potential to improve the intelligibility of speech in noise for CI users while meeting the requirements of low computational complexity and processing delay for application in CI devices. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
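To make the n-of-m selection step concrete, here is a minimal Python sketch; the function names and array shapes are illustrative assumptions, and the paper's neural-network SNR estimator is replaced by a given array, so this is not the authors' implementation:

```python
import numpy as np

def select_channels(envelopes, snr_estimates, n_select=8):
    """n-of-m selection for one stimulation frame: keep the n channels
    with the highest estimated SNR (speech-dominated) and zero out the
    rest (noise-dominated), as in classic n-of-m CI coding strategies."""
    gains = np.zeros_like(envelopes)
    keep = np.argsort(snr_estimates)[-n_select:]  # indices of the n best channels
    gains[keep] = 1.0
    return envelopes * gains

# Toy frame: 22 channels with random envelopes and SNR estimates
rng = np.random.default_rng(0)
frame = select_channels(rng.random(22), rng.normal(0.0, 5.0, 22))
```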
Davis, Matthew H.
2016-01-01
Successful perception depends on combining sensory input with prior knowledge. However, the underlying mechanism by which these two sources of information are combined is unknown. In speech perception, as in other domains, two functionally distinct coding schemes have been proposed for how expectations influence representation of sensory evidence. Traditional models suggest that expected features of the speech input are enhanced or sharpened via interactive activation (Sharpened Signals). Conversely, Predictive Coding suggests that expected features are suppressed so that unexpected features of the speech input (Prediction Errors) are processed further. The present work is aimed at distinguishing between these two accounts of how prior knowledge influences speech perception. By combining behavioural, univariate, and multivariate fMRI measures of how sensory detail and prior expectations influence speech perception with computational modelling, we provide evidence in favour of Prediction Error computations. Increased sensory detail and informative expectations have additive behavioural and univariate neural effects because they both improve the accuracy of word report and reduce the BOLD signal in lateral temporal lobe regions. However, sensory detail and informative expectations have interacting effects on speech representations shown by multivariate fMRI in the posterior superior temporal sulcus. When prior knowledge was absent, increased sensory detail enhanced the amount of speech information measured in superior temporal multivoxel patterns, but with informative expectations, increased sensory detail reduced the amount of measured information. Computational simulations of Sharpened Signals and Prediction Errors during speech perception could both explain these behavioural and univariate fMRI observations. However, the multivariate fMRI observations were uniquely simulated by a Prediction Error and not a Sharpened Signal model. The interaction between prior expectation and sensory detail provides evidence for a Predictive Coding account of speech perception. Our work establishes methods that can be used to distinguish representations of Prediction Error and Sharpened Signals in other perceptual domains. PMID:27846209
Status Report on Speech Research, July 1994-December 1995.
ERIC Educational Resources Information Center
Fowler, Carol A., Ed.
This publication (one of a series) contains 19 articles which report the status and progress of studies on the nature of speech, instruments for its investigation, and practical applications. Articles are: "Speech Perception Deficits in Poor Readers: Auditory Processing or Phonological Coding?" (Maria Mody and others); "Auditory…
ERIC Educational Resources Information Center
Pratt, Michael W.; And Others
1992-01-01
Investigated relations between certain family context variables and the conversational behavior of 36 parents who were playing with their 3 year olds. Transcripts were coded for types of conversational functions and structure of parent speech. Marital satisfaction was associated with aspects of parent speech. (LB)
Wirtzfeld, Michael R; Ibrahim, Rasha A; Bruce, Ian C
2017-10-01
Perceptual studies of speech intelligibility have shown that slow variations of the acoustic envelope (ENV) in a small set of frequency bands provide adequate information for good perceptual performance in quiet, whereas acoustic temporal fine-structure (TFS) cues play a supporting role in background noise. However, the implications for neural coding are prone to misinterpretation because the mean-rate neural representation can contain recovered ENV cues from cochlear filtering of TFS. We investigated ENV recovery and spike-time TFS coding using objective measures of simulated mean-rate and spike-timing neural representations of chimaeric speech, in which either the ENV or the TFS is replaced by another signal. We (a) evaluated the levels of mean-rate and spike-timing neural information for two categories of chimaeric speech, one retaining ENV cues and the other TFS; (b) examined the level of recovered ENV from cochlear filtering of TFS speech; (c) examined and quantified the contribution to recovered ENV from spike-timing cues using a lateral inhibition network (LIN); and (d) constructed linear regression models with objective measures of mean-rate and spike-timing neural cues and subjective phoneme perception scores from normal-hearing listeners. The mean-rate neural cues from the original ENV and recovered ENV partially accounted for perceptual score variability, with additional variability explained by the recovered ENV from the LIN-processed TFS speech. The best model predictions of chimaeric speech intelligibility were found when both the mean-rate and spike-timing neural cues were included, providing further evidence that spike-time coding of TFS cues is important for intelligibility when the speech envelope is degraded.
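For readers unfamiliar with chimaeric speech, the sketch below shows one common way to construct it, using the Hilbert envelope and phase within a bank of bandpass filters; the band edges, filter order, and function names are illustrative assumptions rather than the exact processing chain used in the study:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def chimaera(env_source, tfs_source, fs, band_edges):
    """Auditory chimaera: in each band, combine the Hilbert envelope (ENV)
    of one signal with the unit-amplitude fine structure (TFS) of another."""
    out = np.zeros(len(env_source))
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        a = hilbert(sosfiltfilt(sos, env_source))  # analytic signal, band-limited
        b = hilbert(sosfiltfilt(sos, tfs_source))
        out += np.abs(a) * np.cos(np.angle(b))     # ENV of one x TFS of the other
    return out

# e.g. a speech-noise chimaera over 6 bands:
# y = chimaera(speech, noise, 16000, np.geomspace(80, 7000, 7))
```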
Johari, Karim; Behroozmand, Roozbeh
2017-05-01
The predictive coding model suggests that neural processing of sensory information is facilitated for temporally predictable stimuli. This study investigated how temporal processing of visually presented sensory cues modulates movement reaction time and neural activities in speech and hand motor systems. Event-related potentials (ERPs) were recorded in 13 subjects while they were visually cued to prepare to produce a steady vocalization of a vowel sound or press a button in a randomized order, and to initiate the cued movement following the onset of a go signal on the screen. The experiment was conducted in two counterbalanced blocks in which the time interval between the visual cue and the go signal was temporally predictable (fixed delay of 1000 ms) or unpredictable (variable between 1000 and 2000 ms). Results of the behavioral response analysis indicated that movement reaction time was significantly decreased for temporally predictable stimuli in both speech and hand modalities. We identified premotor ERP activities, with a left-lateralized parietal distribution for hand and a frontocentral distribution for speech, that were significantly suppressed in response to temporally predictable compared with unpredictable stimuli. The premotor ERPs emerged approximately 100 ms before movement onset and were significantly correlated with speech and hand motor reaction times only in response to temporally predictable stimuli. These findings suggest that the motor system establishes a predictive code to facilitate movement in response to temporally predictable sensory stimuli. Our data suggest that the premotor ERP activities are robust neurophysiological biomarkers of such predictive coding mechanisms. These findings provide novel insights into the temporal processing mechanisms of speech and hand motor systems.
Speech input system for meat inspection and pathological coding used thereby
NASA Astrophysics Data System (ADS)
Abe, Shozo
Meat inspection is one of the exclusive and important jobs of veterinarians, though this is not widely known. Because the inspection must be conducted skillfully during a series of continuous operations in a slaughterhouse, the development of automatic inspection systems has long been required. We employed a hands-free speech input system to record the inspection data, because inspectors must use both hands to handle the internal organs of cattle and check their health condition by the naked eye. The data collected by the inspectors are transferred to a speech recognizer and then stored as controllable data for each animal inspected. Control of the terms to be input, such as pathological conditions, and their coding are also important in this speech input system, and practical examples are shown.
Dilley, Laura C; Wieland, Elizabeth A; Gamache, Jessica L; McAuley, J Devin; Redford, Melissa A
2013-02-01
As children mature, changes in voice spectral characteristics co-vary with changes in speech, language, and behavior. In this study, spectral characteristics were manipulated to alter the perceived ages of talkers' voices while leaving critical acoustic-prosodic correlates intact, to determine whether perceived age differences were associated with differences in judgments of prosodic, segmental, and talker attributes. Speech was modified by lowering formants and fundamental frequency, for 5-year-old children's utterances, or raising them, for adult caregivers' utterances. Next, participants differing in awareness of the manipulation (Experiment 1A) or amount of speech-language training (Experiment 1B) made judgments of prosodic, segmental, and talker attributes. Experiment 2 investigated the effects of spectral modification on intelligibility. Finally, in Experiment 3, trained analysts used formal prosody coding to assess prosodic characteristics of spectrally modified and unmodified speech. Differences in perceived age were associated with differences in ratings of speech rate, fluency, intelligibility, likeability, anxiety, cognitive impairment, and speech-language disorder/delay; effects of training and awareness of the manipulation on ratings were limited. There were no significant effects of the manipulation on intelligibility or formally coded prosody judgments. Age-related voice characteristics can greatly affect judgments of speech and talker characteristics, raising cautionary notes for developmental research and clinical work.
NASA Technical Reports Server (NTRS)
Kondoz, A. M.; Evans, B. G.
1993-01-01
In the last decade, low-bit-rate speech coding research has received much attention, resulting in newly developed good-quality speech coders operating at rates as low as 4.8 kb/s. Although speech quality at around 8 kb/s is acceptable for a wide variety of applications, at 4.8 kb/s further improvements in quality are necessary to make it acceptable to the majority of applications and users. In addition to the required low bit rate with acceptable speech quality, other facilities such as integrated digital echo cancellation and voice activity detection are now becoming necessary to provide a cost-effective and compact solution. In this paper we describe a CELP speech coder with an integrated echo canceller and a voice activity detector, all of which have been implemented on a single DSP32C with 32 KBytes of SRAM. The quality of CELP-coded speech has been improved significantly by a new codebook implementation which also simplifies the encoder/decoder complexity, making room for the integration of a 64-tap echo canceller together with a voice activity detector.
Research in speech communication.
Flanagan, J
1995-10-24
Advances in digital speech processing are now supporting application and deployment of a variety of speech technologies for human/machine communication. In fact, new businesses are rapidly forming about these technologies. But these capabilities are of little use unless society can afford them. Happily, explosive advances in microelectronics over the past two decades have assured affordable access to this sophistication as well as to the underlying computing technology. The research challenges in speech processing remain in the traditionally identified areas of recognition, synthesis, and coding. These three areas have typically been addressed individually, often with significant isolation among the efforts. But they are all facets of the same fundamental issue--how to represent and quantify the information in the speech signal. This implies deeper understanding of the physics of speech production, the constraints that the conventions of language impose, and the mechanism for information processing in the auditory system. In ongoing research, therefore, we seek more accurate models of speech generation, better computational formulations of language, and realistic perceptual guides for speech processing--along with ways to coalesce the fundamental issues of recognition, synthesis, and coding. Successful solution will yield the long-sought dictation machine, high-quality synthesis from text, and the ultimate in low bit-rate transmission of speech. It will also open the door to language-translating telephony, where the synthetic foreign translation can be in the voice of the originating talker.
Tsai, Ching-Shu; Chen, Vincent Chin-Hung; Yang, Yao-Hsu; Hung, Tai-Hsin; Lu, Mong-Liang; Huang, Kuo-You; Gossop, Michael
2017-01-01
Manifestations of Mycoplasma pneumoniae infection can range from self-limiting upper respiratory symptoms to various neurological complications, including speech and language impairment. However, an association between Mycoplasma pneumoniae infection and speech and language impairment has not been sufficiently explored. In this study, we aim to investigate the association between Mycoplasma pneumoniae infection and subsequent speech and language impairment in a nationwide population-based sample using Taiwan's National Health Insurance Research Database. We identified 5,406 children with Mycoplasma pneumoniae infection (International Classification of Diseases, 9th Revision, Clinical Modification code 4830) and compared them with 21,624 age-, sex-, urbanicity- and income-matched controls on subsequent speech and language impairment. The mean follow-up interval for all subjects was 6.44 years (standard deviation = 2.42 years); the mean latency between the initial Mycoplasma pneumoniae infection and the presence of speech and language impairment was 1.96 years (standard deviation = 1.64 years). The results showed that Mycoplasma pneumoniae infection was significantly associated with a greater incidence of speech and language impairment [hazard ratio (HR) = 1.49, 95% CI: 1.23-1.80]. In addition, significantly increased hazard ratios of subsequent speech and language impairment were found in the groups younger than 6 years, with no significant difference in the groups over the age of 6 years (HR = 1.43, 95% CI: 1.09-1.88 for the 0-3 years group; HR = 1.67, 95% CI: 1.25-2.23 for the 4-5 years group; HR = 1.14, 95% CI: 0.54-2.39 for the 6-7 years group; and HR = 0.83, 95% CI: 0.23-2.92 for the 8-18 years group). In conclusion, Mycoplasma pneumoniae infection is temporally associated with incident speech and language impairment.
Wireless communication and their mathematics
NASA Astrophysics Data System (ADS)
Komaki, Shozo
2015-05-01
Mobile phones and smartphones have penetrated everyday social use. Developing these systems relies on various kinds of theoretical work based on mathematics, such as radio propagation theory, traffic theory, security coding, and wireless device design. In this speech, I discuss the mathematics involved and the open problems in it.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hogden, J.
The goal of the proposed research is to test a statistical model of speech recognition that incorporates the knowledge that speech is produced by relatively slow motions of the tongue, lips, and other speech articulators. This model is called Maximum Likelihood Continuity Mapping (Malcom). Many speech researchers believe that by using constraints imposed by articulator motions, we can improve or replace the current hidden Markov model based speech recognition algorithms. Unfortunately, previous efforts to incorporate information about articulation into speech recognition algorithms have suffered because (1) slight inaccuracies in our knowledge, or in the formulation of our knowledge, about articulation may decrease recognition performance, (2) small changes in the assumptions underlying models of speech production can lead to large changes in the speech derived from the models, and (3) collecting measurements of human articulator positions in sufficient quantity for training a speech recognition algorithm is still impractical. The most interesting (and, in fact, unique) quality of Malcom is that, even though Malcom makes use of a mapping between acoustics and articulation, Malcom can be trained to recognize speech using only acoustic data. By learning the mapping between acoustics and articulation using only acoustic data, Malcom avoids the difficulties involved in collecting articulator position measurements and does not require an articulatory synthesizer model to estimate the mapping between vocal tract shapes and speech acoustics. Preliminary experiments that demonstrate that Malcom can learn the mapping between acoustics and articulation are discussed. Potential applications of Malcom aside from speech recognition are also discussed. Finally, specific deliverables resulting from the proposed research are described.
Vector Sum Excited Linear Prediction (VSELP) speech coding at 4.8 kbps
NASA Technical Reports Server (NTRS)
Gerson, Ira A.; Jasiuk, Mark A.
1990-01-01
Code Excited Linear Prediction (CELP) speech coders exhibit good performance at data rates as low as 4800 bps. The major drawback of CELP-type coders is their large computational requirements. The Vector Sum Excited Linear Prediction (VSELP) speech coder utilizes a codebook with a structure which allows for a very efficient search procedure. Other advantages of the VSELP codebook structure are discussed, and a detailed description of a 4.8 kbps VSELP coder is given. This coder is an improved version of the VSELP algorithm, which finished first in the NSA's evaluation of 4.8 kbps speech coders. The coder uses a subsample-resolution single-tap long-term predictor, a single VSELP excitation codebook, a novel gain quantizer which is robust to channel errors, and a new adaptive pre/postfilter arrangement.
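As a rough illustration of why the vector-sum structure makes the search efficient, the sketch below (Python, with random placeholder basis vectors rather than a trained VSELP basis) enumerates a vector-sum codebook: 2^M codewords are spanned by only M stored basis vectors, and codewords adjacent in Gray-code order differ by a single basis vector, which is what allows filtered error terms to be updated incrementally during the search.

```python
import numpy as np

def vector_sum_codebook(basis):
    """Enumerate all 2**M codewords of a vector-sum codebook, where each
    codeword is a +/-1-weighted sum of the M rows of `basis`."""
    m, _ = basis.shape
    signs = np.array([[1 if (i >> k) & 1 else -1 for k in range(m)]
                      for i in range(2 ** m)])
    return signs @ basis  # shape (2**M, subframe_length)

# Example: M = 7 basis vectors of length 40 -> 128 excitation codewords
rng = np.random.default_rng(1)
codebook = vector_sum_codebook(rng.standard_normal((7, 40)))
print(codebook.shape)  # (128, 40)
```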
Development of a good-quality speech coder for transmission over noisy channels at 2.4 kb/s
NASA Astrophysics Data System (ADS)
Viswanathan, V. R.; Berouti, M.; Higgins, A.; Russell, W.
1982-03-01
This report describes the development, study, and experimental results of a 2.4 kb/s speech coder called the harmonic deviations (HDV) vocoder, which transmits good-quality speech over noisy channels with bit-error rates of up to 1%. The HDV coder is based on the linear predictive coding (LPC) vocoder, and it transmits additional information over and above the data transmitted by the LPC vocoder, in the form of deviations between the speech spectrum and the LPC all-pole model spectrum at a selected set of frequencies. At the receiver, the spectral deviations are used to generate the excitation signal for the all-pole synthesis filter. The report describes and compares several methods for extracting the spectral deviations from the speech signal and for encoding them. To limit the bit rate of the HDV coder to 2.4 kb/s, the report discusses several methods, including orthogonal transformation and minimum-mean-square-error scalar quantization of log area ratios, two-stage vector-scalar quantization, and variable frame rate transmission. The report also presents the results of speech-quality optimization of the HDV coder at 2.4 kb/s.
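A small sketch of the spectral-deviation idea under simplifying assumptions (autocorrelation-method LPC, gain term ignored, function names invented for the example): it computes the dB deviations between a frame's short-time spectrum and its LPC all-pole model spectrum at a chosen set of frequencies, which is the kind of side information an HDV-style coder would quantize and transmit.

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import freqz

def lpc(frame, order):
    """LPC by the autocorrelation method: solve the Toeplitz normal
    equations for the predictor, return A(z) = 1 - sum a_k z^-k."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = solve_toeplitz((r[:order], r[:order]), r[1:order + 1])
    return np.concatenate(([1.0], -a))

def spectral_deviations(frame, order, freqs, fs):
    """Deviations (dB) between the windowed short-time spectrum and the
    LPC all-pole model spectrum at selected frequencies (LPC gain is
    omitted, so the deviations include a constant offset)."""
    win = frame * np.hanning(len(frame))
    a = lpc(win, order)
    spec = np.abs(np.fft.rfft(win))
    f_axis = np.fft.rfftfreq(len(frame), 1.0 / fs)
    _, h = freqz([1.0], a, worN=freqs, fs=fs)  # all-pole model response
    return 20 * np.log10(np.interp(freqs, f_axis, spec) / np.abs(h))
```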
Xiao, Bo; Huang, Chewei; Imel, Zac E; Atkins, David C; Georgiou, Panayiotis; Narayanan, Shrikanth S
2016-04-01
Scaling up psychotherapy services such as addiction counseling is a critical societal need. One challenge is ensuring the quality of therapy, due to the heavy cost of manual observational assessment. This work proposes a speech technology-based system to automate the assessment of therapist empathy, a key therapy quality index, from audio recordings of psychotherapy interactions. We designed a speech processing system that includes voice activity detection and diarization modules, and an automatic speech recognizer plus a speaker role matching module to extract the therapist's language cues. We employed Maximum Entropy models, Maximum Likelihood language models, and a Lattice Rescoring method to characterize high- vs. low-empathic language. We estimated therapy-session-level empathy codes using utterance-level evidence obtained from these models. Our experiments showed that the fully automated system achieved a correlation of 0.643 between expert-annotated empathy codes and machine-derived estimations, and an accuracy of 81% in classifying high vs. low empathy, in comparison to a 0.721 correlation and 86% accuracy in the oracle setting using manual transcripts. The results show that the system provides useful information that can contribute to automatic quality assurance and therapist training.
Modelling the Architecture of Phonetic Plans: Evidence from Apraxia of Speech
ERIC Educational Resources Information Center
Ziegler, Wolfram
2009-01-01
In theories of spoken language production, the gestural code prescribing the movements of the speech organs is usually viewed as a linear string of holistic, encapsulated, hard-wired, phonetic plans, e.g., of the size of phonemes or syllables. Interactions between phonetic units on the surface of overt speech are commonly attributed to either the…
Do North Carolina Students Have Freedom of Speech? A Review of Campus Speech Codes
ERIC Educational Resources Information Center
Robinson, Jenna Ashley
2010-01-01
America's colleges and universities are supposed to be strongholds of classically liberal ideals, including the protection of individual rights and openness to debate and inquiry. Too often, this is not the case. Across the country, universities deny students and faculty their fundamental rights to freedom of speech and expression. The report…
ERIC Educational Resources Information Center
Studdert-Kennedy, Michael, Ed.; O'Brien, Nancy, Ed.
Prepared as part of a regular series on the status and progress of studies on the nature of speech, instrumentation for its evaluation, and practical applications for speech research, this compilation contains 14 reports. Topics covered in the reports include the following: (1) phonetic coding and order memory in relation to reading proficiency,…
Dilley, Laura C.; Wieland, Elizabeth A.; Gamache, Jessica L.; McAuley, J. Devin; Redford, Melissa A.
2013-01-01
Purpose As children mature, changes in voice spectral characteristics covary with changes in speech, language, and behavior. Spectral characteristics were manipulated to alter the perceived ages of talkers’ voices while leaving critical acoustic-prosodic correlates intact, to determine whether perceived age differences were associated with differences in judgments of prosodic, segmental, and talker attributes. Method Speech was modified by lowering formants and fundamental frequency, for 5-year-old children’s utterances, or raising them, for adult caregivers’ utterances. Next, participants differing in awareness of the manipulation (Exp. 1a) or amount of speech-language training (Exp. 1b) made judgments of prosodic, segmental, and talker attributes. Exp. 2 investigated the effects of spectral modification on intelligibility. Finally, in Exp. 3 trained analysts used formal prosody coding to assess prosodic characteristics of spectrally-modified and unmodified speech. Results Differences in perceived age were associated with differences in ratings of speech rate, fluency, intelligibility, likeability, anxiety, cognitive impairment, and speech-language disorder/delay; effects of training and awareness of the manipulation on ratings were limited. There were no significant effects of the manipulation on intelligibility or formally coded prosody judgments. Conclusions Age-related voice characteristics can greatly affect judgments of speech and talker characteristics, raising cautionary notes for developmental research and clinical work. PMID:23275414
Digitised evaluation of speech intelligibility using vowels in maxillectomy patients.
Sumita, Y I; Hattori, M; Murase, M; Elbashti, M E; Taniguchi, H
2018-03-01
Among the functional disabilities that patients face following maxillectomy, speech impairment is a major factor influencing quality of life. Proper rehabilitation of speech, which may include prosthodontic and surgical treatments and speech therapy, requires accurate evaluation of speech intelligibility (SI). A simple, less time-consuming yet accurate evaluation is desirable both for maxillectomy patients and for the various clinicians providing maxillofacial treatment. This study sought to determine the utility of digital acoustic analysis of vowels for the prediction of SI in maxillectomy patients, based on a comprehensive understanding of speech production in the vocal tract of maxillectomy patients and its perception. Speech samples were collected from 33 male maxillectomy patients (mean age 57.4 years) in two conditions, without and with a maxillofacial prosthesis, and formant data for the vowels /a/, /e/, /i/, /o/, and /u/ were calculated based on linear predictive coding. The frequency range of formant 2 (F2) was determined as the difference between the minimum and maximum F2 frequencies. An SI test was also conducted to reveal the relationship between SI score and F2 range. Statistical analyses were applied. F2 range and SI score were significantly different between the conditions without and with a prosthesis (both P < .0001). F2 range was significantly correlated with SI score in both conditions (Spearman's r = .843, P < .0001; r = .832, P < .0001, respectively). These findings indicate that calculating the F2 range from five vowels has clinical utility for the prediction of SI after maxillectomy. © 2017 John Wiley & Sons Ltd.
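A minimal sketch of this kind of analysis; the LPC order, windowing, and the naive "second-lowest resonance = F2" rule are simplifying assumptions, not the study's exact procedure:

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def formant_frequencies(frame, fs, order=12):
    """Estimate resonance frequencies for one vowel frame: LPC via the
    autocorrelation method, then the angles of the upper-half-plane
    roots of A(z) converted to Hz."""
    w = frame * np.hamming(len(frame))
    r = np.correlate(w, w, mode="full")[len(w) - 1:]
    a = solve_toeplitz((r[:order], r[:order]), r[1:order + 1])
    roots = np.roots(np.concatenate(([1.0], -a)))
    roots = roots[np.imag(roots) > 0]  # keep one of each conjugate pair
    return np.sort(np.angle(roots) * fs / (2 * np.pi))

def f2_range(vowel_frames, fs):
    """F2 range over the five vowels: max(F2) - min(F2), the predictor of
    speech intelligibility examined above (F2 taken naively as the
    second-lowest resonance)."""
    f2 = [formant_frequencies(v, fs)[1] for v in vowel_frames]
    return max(f2) - min(f2)
```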
Effects of irrelevant sounds on phonological coding in reading comprehension and short-term memory.
Boyle, R; Coltheart, V
1996-05-01
The effects of irrelevant sounds on reading comprehension and short-term memory were studied in two experiments. In Experiment 1, adults judged the acceptability of written sentences during irrelevant speech, accompanied and unaccompanied singing, and instrumental music, and in silence. Sentences varied in syntactic complexity: simple sentences contained a right-branching relative clause (The applause pleased the woman that gave the speech) and syntactically complex sentences included a centre-embedded relative clause (The hay that the farmer stored fed the hungry animals). Unacceptable sentences either sounded acceptable (The dog chased the cat that eight up all his food) or did not (The man praised the child that sight up his spinach). Decision accuracy was impaired by syntactic complexity but not by irrelevant sounds. Phonological coding was indicated by increased errors on unacceptable sentences that sounded correct. These error rates were unaffected by irrelevant sounds. Experiment 2 examined effects of irrelevant sounds on ordered recall of phonologically similar and dissimilar word lists. Phonological similarity impaired recall. Irrelevant speech reduced recall but did not interact with phonological similarity. The results of these experiments question assumptions about the relationship between speech input and phonological coding in reading and the short-term store.
One Speaker, Two Languages. Cross-Disciplinary Perspectives on Code-Switching.
ERIC Educational Resources Information Center
Milroy, Lesley, Ed.; Muysken, Pieter, Ed.
Fifteen articles review code-switching in the four major areas: policy implications in specific institutional and community settings; perspectives of social theory of code-switching as a form of speech behavior in particular social contexts; the grammatical analysis of code-switching, including factors that constrain switching even within a…
Speech perception of young children using nucleus 22-channel or CLARION cochlear implants.
Young, N M; Grohne, K M; Carrasco, V N; Brown, C
1999-04-01
This study compares the auditory perceptual skill development of 23 congenitally deaf children who received the Nucleus 22-channel cochlear implant with the SPEAK speech coding strategy, and 20 children who received the CLARION Multi-Strategy Cochlear Implant with the Continuous Interleaved Sampler (CIS) speech coding strategy. All were under 5 years old at implantation. Preimplantation, there were no significant differences between the groups in age, length of hearing aid use, or communication mode. Auditory skills were assessed at 6 months and 12 months after implantation. Postimplantation, the mean scores on all speech perception tests were higher for the Clarion group. These differences were statistically significant for the pattern perception and monosyllable subtests of the Early Speech Perception battery at 6 months, and for the Glendonald Auditory Screening Procedure at 12 months. Multiple regression analysis revealed that device type accounted for the greatest variance in performance after 12 months of implant use. We conclude that children using the CIS strategy implemented in the Clarion implant may develop better auditory perceptual skills during the first year postimplantation than children using the SPEAK strategy with the Nucleus device.
Zipf's Law in Short-Time Timbral Codings of Speech, Music, and Environmental Sound Signals
Haro, Martín; Serrà, Joan; Herrera, Perfecto; Corral, Álvaro
2012-01-01
Timbre is a key perceptual feature that allows discrimination between different sounds. Timbral sensations are highly dependent on the temporal evolution of the power spectrum of an audio signal. In order to quantitatively characterize such sensations, the shape of the power spectrum has to be encoded in a way that preserves certain physical and perceptual properties. Therefore, it is common practice to encode short-time power spectra using psychoacoustical frequency scales. In this paper, we study and characterize the statistical properties of such encodings, here called timbral code-words. In particular, we report on rank-frequency distributions of timbral code-words extracted from 740 hours of audio coming from disparate sources such as speech, music, and environmental sounds. Analogously to text corpora, we find a heavy-tailed Zipfian distribution with exponent close to one. Importantly, this distribution is found independently of different encoding decisions and regardless of the audio source. Further analysis of the intrinsic characteristics of the most and least frequent code-words reveals that the most frequent code-words tend to have a more homogeneous structure. We also find that speech and music databases have specific, distinctive code-words while, in the case of the environmental sounds, these database-specific code-words are not present. Finally, we find that a Yule-Simon process with memory provides a reasonable quantitative approximation for our data, suggesting the existence of a common simple generative mechanism for all considered sound sources. PMID:22479497
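As a toy illustration of a rank-frequency analysis of code-words (a least-squares fit in log-log space; the paper's own fitting procedure may differ, and maximum-likelihood estimators are usually preferred for heavy-tailed data):

```python
import numpy as np
from collections import Counter

def zipf_exponent(codewords):
    """Rank code-words by frequency and fit log f(r) ~ -alpha * log r."""
    freqs = np.sort(np.array(list(Counter(codewords).values()), float))[::-1]
    ranks = np.arange(1, len(freqs) + 1)
    slope, _ = np.polyfit(np.log(ranks), np.log(freqs), 1)
    return -slope

# Synthetic check: samples from a pmf-exponent-2 Zipf law have a
# rank-frequency exponent close to 1, as reported for the audio data.
rng = np.random.default_rng(2)
print(zipf_exponent(rng.zipf(2.0, size=20000)))
```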
ERIC Educational Resources Information Center
Pattamadilok, Chotiga; Nelis, Aubéline; Kolinsky, Régine
2014-01-01
Studies on proficient readers showed that speech processing is affected by knowledge of the orthographic code. Yet, the automaticity of the orthographic influence depends on task demand. Here, we addressed this automaticity issue in normal and dyslexic adult readers by comparing the orthographic effects obtained in two speech processing tasks that…
ERIC Educational Resources Information Center
Dodd, Barbara; McIntosh, Beth; Erdener, Dogu; Burnham, Denis
2008-01-01
An example of the auditory-visual illusion in speech perception, first described by McGurk and MacDonald, is the perception of [ta] when listeners hear [pa] in synchrony with the lip movements for [ka]. One account of the illusion is that lip-read and heard speech are combined in an articulatory code since people who mispronounce words respond…
Naval Computer-Based Instruction: Cost, Implementation and Effectiveness Issues.
1988-03-01
Phonemes: Lexical access and beyond.
Kazanina, Nina; Bowers, Jeffrey S; Idsardi, William
2018-04-01
Phonemes play a central role in traditional theories as units of speech perception and access codes to lexical representations. Phonemes have two essential properties: they are 'segment-sized' (the size of a consonant or vowel) and abstract (a single phoneme may have different acoustic realisations). Nevertheless, there is a long history of challenging the phoneme hypothesis, with some theorists arguing for differently sized phonological units (e.g. features or syllables) and others rejecting abstract codes in favour of representations that encode detailed acoustic properties of the stimulus. The phoneme hypothesis is the minority view today. We defend the phoneme hypothesis in two complementary ways. First, we show that rejection of phonemes is based on a flawed interpretation of empirical findings. For example, it is commonly argued that the failure to find acoustic invariances for phonemes rules out phonemes. However, the lack of invariance is only a problem on the assumption that speech perception is a bottom-up process. If learned sublexical codes are modified by top-down constraints (which they are), then this argument loses all force. Second, we provide strong positive evidence for phonemes on the basis of linguistic data. Almost all findings that are taken (incorrectly) as evidence against phonemes are based on psycholinguistic studies of single words. However, phonemes were first introduced in linguistics, and the best evidence for phonemes comes from linguistic analyses of complex word forms and sentences. In short, the rejection of phonemes is based on a false analysis and a too-narrow consideration of the relevant data.
Deriving Word Order in Code-Switching: Feature Inheritance and Light Verbs
ERIC Educational Resources Information Center
Shim, Ji Young
2013-01-01
This dissertation investigates code-switching (CS), the concurrent use of more than one language in conversation, commonly observed in bilingual speech. Assuming that code-switching is subject to universal principles, just like monolingual grammar, the dissertation provides a principled account of code-switching, with particular emphasis on OV~VO…
Cracking the Language Code: Neural Mechanisms Underlying Speech Parsing
McNealy, Kristin; Mazziotta, John C.; Dapretto, Mirella
2013-01-01
Word segmentation, detecting word boundaries in continuous speech, is a critical aspect of language learning. Previous research in infants and adults demonstrated that a stream of speech can be readily segmented based solely on the statistical and speech cues afforded by the input. Using functional magnetic resonance imaging (fMRI), the neural substrate of word segmentation was examined on-line as participants listened to three streams of concatenated syllables, containing either statistical regularities alone, statistical regularities and speech cues, or no cues. Despite the participants’ inability to explicitly detect differences between the speech streams, neural activity differed significantly across conditions, with left-lateralized signal increases in temporal cortices observed only when participants listened to streams containing statistical regularities, particularly the stream containing speech cues. In a second fMRI study, designed to verify that word segmentation had implicitly taken place, participants listened to trisyllabic combinations that occurred with different frequencies in the streams of speech they just heard (“words,” 45 times; “partwords,” 15 times; “nonwords,” once). Reliably greater activity in left inferior and middle frontal gyri was observed when comparing words with partwords and, to a lesser extent, when comparing partwords with nonwords. Activity in these regions, taken to index the implicit detection of word boundaries, was positively correlated with participants’ rapid auditory processing skills. These findings provide a neural signature of on-line word segmentation in the mature brain and an initial model with which to study developmental changes in the neural architecture involved in processing speech cues during language learning. PMID:16855090
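A toy illustration of the statistical regularity involved (the syllable inventory and stream construction are invented for the example): forward transitional probabilities are high within "words" and dip at word boundaries, which is the cue a statistical segmentation mechanism can exploit.

```python
import random
from collections import Counter

def transitional_probabilities(syllables):
    """Forward transitional probability TP(B|A) = count(A B) / count(A)."""
    pairs = Counter(zip(syllables, syllables[1:]))
    firsts = Counter(syllables[:-1])
    return {p: c / firsts[p[0]] for p, c in pairs.items()}

# Stream of three trisyllabic 'words' concatenated in random order
random.seed(0)
words = [["go", "la", "bu"], ["tu", "pi", "ro"], ["bi", "da", "ku"]]
stream = [s for _ in range(300) for s in random.choice(words)]
tp = transitional_probabilities(stream)
print(tp[("go", "la")])  # 1.0  : within-word transition
print(tp[("bu", "tu")])  # ~0.33: word-boundary transition
```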
Incorporating Speech Recognition into a Natural User Interface
NASA Technical Reports Server (NTRS)
Chapa, Nicholas
2017-01-01
The Augmented/Virtual Reality (AVR) Lab has been working to study the applicability of recent virtual and augmented reality hardware and software to KSC operations. This includes the Oculus Rift, HTC Vive, Microsoft HoloLens, and the Unity game engine. My project in this lab is to integrate voice recognition and voice commands into an easy-to-modify system that can be added to an existing portion of a Natural User Interface (NUI). A NUI is an intuitive and simple-to-use interface incorporating visual, touch, and speech recognition. The inclusion of speech recognition capability will allow users to perform actions or make inquiries using only their voice. The simplicity of needing only to speak to control an on-screen object or enact some digital action means that any user can quickly become accustomed to using this system. Multiple programs were tested for use in a speech command and recognition system. Sphinx4 translates speech to text using a Hidden Markov Model (HMM) based language model, an acoustic model, and a word dictionary, running on Java. PocketSphinx had similar functionality to Sphinx4 but ran on C. However, neither of these programs was ideal, as building a Java or C wrapper slowed performance. The most suitable speech recognition system tested was the Unity Engine Grammar Recognizer. A Context-Free Grammar (CFG) structure is written in an XML file to specify the structure of phrases and words that will be recognized by the Unity Grammar Recognizer. Using Speech Recognition Grammar Specification (SRGS) 1.0 makes modifying the recognized combinations of words and phrases very simple and quick. With SRGS 1.0, semantic information can also be added to the XML file, which allows for even more control over how spoken words and phrases are interpreted by Unity. Additionally, using a CFG with SRGS 1.0 produces Finite State Machine (FSM) functionality, limiting the potential for incorrectly heard words or phrases. The purpose of my project was to investigate options for a speech recognition system. To that end I attempted to integrate Sphinx4 into a user interface. Sphinx4 had great accuracy and is the only free program tested that is able to perform offline speech dictation. However, it had a limited dictionary of recognizable words, single-syllable words were almost impossible for it to hear, and since it ran on Java it could not be integrated into the Unity-based NUI. PocketSphinx ran much faster than Sphinx4, which would have made it ideal as a plugin to the Unity NUI; unfortunately, creating a C# wrapper for the C code made the program unusable with Unity because the wrapper slowed code execution and class files became unreachable. The Unity Grammar Recognizer is the ideal speech recognition interface: it is flexible in recognizing multiple variations of the same command, and it is also the most accurate program tested, owing to its use of an XML grammar to specify speech structure instead of relying solely on a dictionary and language model. The Unity Grammar Recognizer will be used with the NUI for these reasons, as well as being written in C#, which further simplifies its incorporation.
JND measurements of the speech formants parameters and its implication in the LPC pole quantization
NASA Astrophysics Data System (ADS)
Orgad, Yaakov
1988-08-01
The inherent sensitivity of auditory perception is explicitly used with the objective of designing an efficient speech encoder. Speech can be modelled by a filter, representing the vocal tract shape, that is driven by an excitation signal representing glottal air flow. This work concentrates on the filter encoding problem, assuming that excitation signal encoding is optimal. Linear predictive coding (LPC) techniques were used to model a short speech segment by an all-pole filter; each pole was directly related to the speech formants. Measurements were made of the auditory just noticeable difference (JND) corresponding to the natural speech formants, with the LPC filter poles as the best candidates to represent the speech spectral envelope. The JND is the maximum precision required in speech quantization; it was defined on the basis of the shift in one pole parameter of a single frame of a speech segment necessary to induce subjective perception of the distortion with 0.75 probability. The average JND in LPC filter poles in natural speech was found to increase with increasing pole bandwidth and, to a lesser extent, frequency. The JND measurements showed a large spread of the residuals around the average values, indicating that inter-formant coupling and perhaps other, not yet fully understood, factors were not taken into account at this stage of the research. A future treatment should consider these factors. The average JNDs obtained in this work were used to design pole quantization tables for speech coding and provided a better bit rate than the standard reflection-coefficient quantizer; a 30-bits-per-frame pole quantizer yielded speech quality similar to that obtained with a standard 41-bits-per-frame reflection-coefficient quantizer. Owing to the complexity of the numerical root extraction system, the practical implementation of the pole quantization approach remains to be proved.
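For concreteness, a small sketch of the pole-formant correspondence this work relies on (standard formulas; the JND tables themselves are not reproduced here): the pole angle gives the formant centre frequency, the pole radius gives its 3-dB bandwidth, and the inverse map is what a pole quantizer would apply after rounding the frequency/bandwidth pair with JND-derived step sizes.

```python
import numpy as np

def pole_to_formant(pole, fs):
    """Complex LPC pole -> (centre frequency in Hz, 3-dB bandwidth in Hz)."""
    freq = np.angle(pole) * fs / (2 * np.pi)
    bw = -fs * np.log(np.abs(pole)) / np.pi
    return freq, bw

def formant_to_pole(freq, bw, fs):
    """(frequency, bandwidth) -> complex pole, e.g. after quantization."""
    return np.exp(-np.pi * bw / fs) * np.exp(2j * np.pi * freq / fs)

# Round trip: a 500 Hz formant with 60 Hz bandwidth at fs = 8 kHz
p = formant_to_pole(500.0, 60.0, 8000)
print(pole_to_formant(p, 8000))  # (500.0, 60.0)
```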
Language choice in bimodal bilingual development.
Lillo-Martin, Diane; de Quadros, Ronice M; Chen Pichler, Deborah; Fieldsteel, Zoe
2014-01-01
Bilingual children develop sensitivity to the language used by their interlocutors at an early age, reflected in differential use of each language by the child depending on their interlocutor. Factors such as discourse context and relative language dominance in the community may mediate the degree of language differentiation in preschool age children. Bimodal bilingual children, acquiring both a sign language and a spoken language, have an even more complex situation. Their Deaf parents vary considerably in access to the spoken language. Furthermore, in addition to code-mixing and code-switching, they use code-blending (expressions in both speech and sign simultaneously), an option uniquely available to bimodal bilinguals. Code-blending is analogous to code-switching sociolinguistically, but is also a way to communicate without suppressing one language. For adult bimodal bilinguals, complete suppression of the non-selected language is cognitively demanding. We expect that bimodal bilingual children also find suppression difficult, and use blending rather than suppression in some contexts. We also expect relative community language dominance to be a factor in children's language choices. This study analyzes longitudinal spontaneous production data from four bimodal bilingual children and their Deaf and hearing interlocutors. Even at the earliest observations, the children produced more signed utterances with Deaf interlocutors and more speech with hearing interlocutors. However, while three of the four children produced >75% speech alone in speech target sessions, they produced <25% sign alone in sign target sessions. All four produced bimodal utterances in both, but more frequently in the sign sessions, potentially because they find suppression of the dominant language more difficult. Our results indicate that these children are sensitive to the language used by their interlocutors, while showing considerable influence from the dominant community language.
Effects of prior information on decoding degraded speech: an fMRI study.
Clos, Mareike; Langner, Robert; Meyer, Martin; Oechslin, Mathias S; Zilles, Karl; Eickhoff, Simon B
2014-01-01
Expectations and prior knowledge are thought to support the perceptual analysis of incoming sensory stimuli, as proposed by the predictive-coding framework. The current fMRI study investigated the effect of prior information on brain activity during the decoding of degraded speech stimuli. When prior information enabled the comprehension of the degraded sentences, the left middle temporal gyrus and the left angular gyrus were activated, highlighting a role of these areas in meaning extraction. In contrast, the activation of the left inferior frontal gyrus (area 44/45) appeared to reflect the search for meaningful information in degraded speech material that could not be decoded because of mismatches with the prior information. Our results show that degraded sentences instantaneously evoke different percepts and activation patterns depending on the type of prior information, in line with prediction-based accounts of perception. Copyright © 2012 Wiley Periodicals, Inc.
“Down the Language Rabbit Hole with Alice”: A Case Study of a Deaf Girl with a Cochlear Implant
Andrews, Jean F.; Dionne, Vickie
2011-01-01
Alice, a deaf girl who was implanted after three years of age, was exposed to four weeks of storybook sessions conducted in American Sign Language (ASL) and speech (English). Two research questions were addressed: (1) how did she use her sign bimodal/bilingualism, code-switching, and code-mixing during reading activities, and (2) what sign bilingual code-switching and code-mixing strategies did she use while attending to stories delivered under two treatments: ASL only and speech only. Retelling scores were collected to determine the type and frequency of her code-switching/code-mixing strategies between both languages after Alice was read a story in ASL and in spoken English. Qualitative descriptive methods were utilized. Teacher, clinician, and student transcripts of the reading and retelling sessions were recorded. Results showed Alice frequently used code-switching and code-mixing strategies while retelling the stories under both treatments. Alice increased her speech production in retellings of the stories under both the ASL storyreading and the spoken-English-only reading of the story. The ASL storyreading did not decrease Alice's retelling scores in spoken English. Professionals are encouraged to consider the benefits of early sign bimodal/bilingualism to enhance the overall speech, language, and reading proficiency of deaf children with cochlear implants. PMID:22135677
Ethnography of Communication: Cultural Codes and Norms.
ERIC Educational Resources Information Center
Carbaugh, Donal
The primary tasks of the ethnographic researcher are to discover, describe, and comparatively analyze different speech communities' ways of speaking. Two general abstractions occurring in ethnographic analyses are normative and cultural. Communicative norms are formulated in analyzing and explaining the "patterned use of speech."…
Role of N-Methyl-D-Aspartate Receptors in Action-Based Predictive Coding Deficits in Schizophrenia.
Kort, Naomi S; Ford, Judith M; Roach, Brian J; Gunduz-Bruce, Handan; Krystal, John H; Jaeger, Judith; Reinhart, Robert M G; Mathalon, Daniel H
2017-03-15
Recent theoretical models of schizophrenia posit that dysfunction of the neural mechanisms subserving predictive coding contributes to symptoms and cognitive deficits, and this dysfunction is further posited to result from N-methyl-D-aspartate glutamate receptor (NMDAR) hypofunction. Previously, by examining auditory cortical responses to self-generated speech sounds, we demonstrated that predictive coding during vocalization is disrupted in schizophrenia. To test the hypothesized contribution of NMDAR hypofunction to this disruption, we examined the effects of the NMDAR antagonist, ketamine, on predictive coding during vocalization in healthy volunteers and compared them with the effects of schizophrenia. In two separate studies, the N1 component of the event-related potential elicited by speech sounds during vocalization (talk) and passive playback (listen) were compared to assess the degree of N1 suppression during vocalization, a putative measure of auditory predictive coding. In the crossover study, 31 healthy volunteers completed two randomly ordered test days, a saline day and a ketamine day. Event-related potentials during the talk/listen task were obtained before infusion and during infusion on both days, and N1 amplitudes were compared across days. In the case-control study, N1 amplitudes from 34 schizophrenia patients and 33 healthy control volunteers were compared. N1 suppression to self-produced vocalizations was significantly and similarly diminished by ketamine (Cohen's d = 1.14) and schizophrenia (Cohen's d = .85). Disruption of NMDARs causes dysfunction in predictive coding during vocalization in a manner similar to the dysfunction observed in schizophrenia patients, consistent with the theorized contribution of NMDAR hypofunction to predictive coding deficits in schizophrenia. Copyright © 2016 Society of Biological Psychiatry. Published by Elsevier Inc. All rights reserved.
Ortwein, Heiderose; Benz, Alexander; Carl, Petra; Huwendiek, Sören; Pander, Tanja; Kiessling, Claudia
2017-02-01
To investigate whether the Verona Coding Definitions of Emotional Sequences for coding health providers' responses (VR-CoDES-P) can be used to assess medical students' responses to patients' cues and concerns presented in written case vignettes. Student responses in direct speech to patient cues and concerns were analysed in 21 different case scenarios using VR-CoDES-P. A total of 977 student responses were available for coding, and 857 responses were codable with the VR-CoDES-P. In 74.6% of responses, the students used either a "reducing space" statement only or a "providing space" statement immediately followed by a "reducing space" statement. Overall, the most frequent response was explicit information advice (ERIa), followed by content exploring (EPCEx) and content acknowledgement (EPCAc). VR-CoDES-P were applicable to written responses of medical students when they were phrased in direct speech. The application of VR-CoDES-P is reliable and feasible when using the differentiation of "providing" and "reducing space" responses. Communication strategies described by students in non-direct speech were difficult to code and produced many missing values. VR-CoDES-P are useful for analysing medical students' written responses when focusing on emotional issues. Students need precise instructions for their responses in the given test format. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Leblanc, Linda A; Geiger, Kaneen B; Sautter, Rachael A; Sidener, Tina M
2007-01-01
The Natural Language Paradigm (NLP) has proven effective in increasing spontaneous verbalizations for children with autism. This study investigated the use of NLP with older adults with cognitive impairments served at a leisure-based adult day program for seniors. Three individuals with limited spontaneous use of functional language participated in a multiple baseline design across participants. Data were collected on appropriate and inappropriate vocalizations with appropriate vocalizations coded as prompted or unprompted during baseline and treatment sessions. All participants experienced increases in appropriate speech during NLP with variable response patterns. Additionally, the two participants with substantial inappropriate vocalizations showed decreases in inappropriate speech. Implications for intervention in day programs are discussed.
Ackermann, Hermann; Mathiak, Klaus; Riecker, Axel
2007-01-01
A classical tenet of clinical neurology proposes that cerebellar disorders may give rise to speech motor disorders (ataxic dysarthria), but spare perceptual and cognitive aspects of verbal communication. During the past two decades, however, a variety of higher-order deficits of speech production, e.g., more or less exclusive agrammatism, amnesic or transcortical motor aphasia, have been noted in patients with vascular cerebellar lesions, and transient mutism following resection of posterior fossa tumors in children may develop into similar constellations. Perfusion studies provided evidence for cerebello-cerebral diaschisis as a possible pathomechanism in these instances. Tight functional connectivity between the language-dominant frontal lobe and the contralateral cerebellar hemisphere represents a prerequisite of such long-distance effects. Recent functional imaging data point at a contribution of the right cerebellar hemisphere, concomitant with language-dominant dorsolateral and medial frontal areas, to the temporal organization of a prearticulatory verbal code ('inner speech'), in terms of the sequencing of syllable strings at a speaker's habitual speech rate. Besides motor control, this network also appears to be engaged in executive functions, e.g., subvocal rehearsal mechanisms of verbal working memory, and seems to be recruited during distinct speech perception tasks. Taken together, thus, a prearticulatory verbal code bound to reciprocal right cerebellar/left frontal interactions might represent a common platform for a variety of cerebellar engagements in cognitive functions. The distinct computational operation provided by cerebellar structures within this framework appears to be the concatenation of syllable strings into coarticulated sequences.
Interactive MPEG-4 low-bit-rate speech/audio transmission over the Internet
NASA Astrophysics Data System (ADS)
Liu, Fang; Kim, JongWon; Kuo, C.-C. Jay
1999-11-01
The recently developed MPEG-4 technology enables the coding and transmission of natural and synthetic audio-visual data in the form of objects. In an effort to extend the object-based functionality of MPEG-4 to real-time Internet applications, architectural prototypes of the multiplex layer and transport layer tailored for transmission of MPEG-4 data over IP are under debate within the Internet Engineering Task Force (IETF) and the MPEG-4 Systems Ad Hoc group. In this paper, we present an architecture for an interactive MPEG-4 speech/audio transmission system over the Internet. It utilizes a framework of Real Time Streaming Protocol (RTSP) over Real-time Transport Protocol (RTP) to provide controlled, on-demand delivery of real-time speech/audio data. Based on a client-server model, a pair of low-bit-rate bit streams (real-time speech/audio and pre-encoded speech/audio) are multiplexed and transmitted via a single RTP channel to the receiver. The MPEG-4 Scene Description (SD) and Object Descriptor (OD) bit streams are securely sent through the RTSP control channel. Upon reception, an initial MPEG-4 audio-visual scene is constructed after de-multiplexing, decoding of bit streams, and scene composition. A receiver is allowed to manipulate the initial audio-visual scene presentation locally, or to interactively arrange scene changes by sending requests to the server. A server may also choose to update the client with new streams and a list of contents for user selection.
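As a rough illustration of the transport framework described above, the sketch below packs one coded speech/audio frame into an RTP packet with the standard 12-byte fixed header. The dynamic payload type (97) and SSRC value are illustrative assumptions, not values taken from the paper.

```python
# Minimal sketch of wrapping a coded speech/audio frame in an RTP packet.
# Payload type and SSRC are illustrative assumptions.
import struct

def rtp_packet(payload: bytes, seq: int, timestamp: int,
               ssrc: int = 0x12345678, payload_type: int = 97) -> bytes:
    """Build an RTP packet (fixed 12-byte header + payload)."""
    byte0 = 2 << 6                      # version=2, no padding/extension, CC=0
    byte1 = payload_type & 0x7F         # marker bit clear
    header = struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF, ssrc)
    return header + payload

# Example: one 20-ms coded frame per packet (160 samples at 8 kHz)
packet = rtp_packet(b"\x00" * 32, seq=1, timestamp=160)
```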
Voice-processing technologies--their application in telecommunications.
Wilpon, J G
1995-01-01
As the telecommunications industry evolves over the next decade to provide the products and services that people will desire, several key technologies will become commonplace. Two of these, automatic speech recognition and text-to-speech synthesis, will provide users with more freedom on when, where, and how they access information. While these technologies are currently in their infancy, their capabilities are rapidly increasing and their deployment in today's telephone network is expanding. The economic impact of just one application, the automation of operator services, is well over $100 million per year. Yet there still are many technical challenges that must be resolved before these technologies can be deployed ubiquitously in products and services throughout the worldwide telephone network. These challenges include: (i) High level of accuracy. The technology must be perceived by the user as highly accurate, robust, and reliable. (ii) Easy to use. Speech is only one of several possible input/output modalities for conveying information between a human and a machine, much like a computer terminal or Touch-Tone pad on a telephone. It is not the final product. Therefore, speech technologies must be hidden from the user. That is, the burden of using the technology must be on the technology itself. (iii) Quick prototyping and development of new products and services. The technology must support the creation of new products and services based on speech in an efficient and timely fashion. In this paper I present a vision of the voice-processing industry with a focus on the areas with the broadest base of user penetration: speech recognition, text-to-speech synthesis, natural language processing, and speaker recognition technologies. The current and future applications of these technologies in the telecommunications industry will be examined in terms of their strengths, limitations, and the degree to which user needs have been or have yet to be met. Although noteworthy gains have been made in areas with potentially small user bases and in the more mature speech-coding technologies, these subjects are outside the scope of this paper. PMID:7479815
Speech transport for packet telephony and voice over IP
NASA Astrophysics Data System (ADS)
Baker, Maurice R.
1999-11-01
Recent advances in packet switching, internetworking, and digital signal processing technologies have converged to allow realizable practical implementations of packet telephony systems. This paper provides a tutorial on transmission engineering for packet telephony covering the topics of speech coding/decoding, speech packetization, packet data network transport, and impairments which may negatively impact end-to-end system quality. Particular emphasis is placed upon Voice over Internet Protocol given the current popularity and ubiquity of IP transport.
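To make the end-to-end trade-offs concrete, here is a small back-of-the-envelope sketch of packetization delay and on-the-wire bandwidth for a low-rate coder. The coder rate, frame size, and header overheads below are assumed example values, not figures from the paper.

```python
# Back-of-the-envelope VoIP transmission arithmetic (illustrative values).
def voip_budget(bitrate_bps: float, frame_ms: float, frames_per_packet: int,
                overhead_bytes: int = 40):  # IP(20) + UDP(8) + RTP(12) headers
    payload_bits = bitrate_bps * frame_ms / 1000 * frames_per_packet
    packet_bits = payload_bits + overhead_bytes * 8
    packets_per_s = 1000 / (frame_ms * frames_per_packet)
    return {
        "packetization_delay_ms": frame_ms * frames_per_packet,
        "wire_rate_bps": packet_bits * packets_per_s,
    }

# An 8 kb/s coder with 10-ms frames, two frames per packet:
print(voip_budget(8000, 10, 2))
# -> 20 ms packetization delay, 24 kb/s on the wire (3x the coder rate)
```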
Strahl, Stefan; Mertins, Alfred
2008-07-18
Evidence that neurosensory systems use sparse signal representations, as well as the improved performance of signal processing algorithms using sparse signal models, has raised interest in sparse signal coding in recent years. For natural audio signals like speech and environmental sounds, gammatone atoms have been derived as expansion functions that generate a nearly optimal sparse signal model (Smith, E., Lewicki, M., 2006. Efficient auditory coding. Nature 439, 978-982). Furthermore, gammatone functions are established models for the human auditory filters. Thus far, a practical application of a sparse gammatone signal model has been prevented by the fact that deriving the sparsest representation is, in general, computationally intractable. In this paper, we applied an accelerated version of the matching pursuit algorithm for gammatone dictionaries, allowing real-time and large-data-set applications. We show that a sparse signal model in general has advantages in audio coding and that a sparse gammatone signal model encodes speech more efficiently, in terms of sparseness, than a sparse modified discrete cosine transform (MDCT) signal model. We also show that the optimal gammatone parameters derived for English speech do not match the human auditory filters, suggesting that signal processing applications should derive the parameters individually for each signal class rather than use psychometrically derived parameters. For brain research, this means that care should be taken when directly transferring findings of optimality from technical to biological systems.
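The core of the approach is a greedy matching-pursuit decomposition over a gammatone dictionary. The following is a minimal, unaccelerated sketch of that idea; the atom parameters (center frequencies, bandwidth, duration) are illustrative assumptions and not the optimized values derived in the paper.

```python
# Minimal sketch of matching pursuit with a gammatone dictionary
# (not the accelerated algorithm of the paper; parameters are illustrative).
import numpy as np

def gammatone(fc, fs=16000, dur=0.02, order=4, b=100.0):
    t = np.arange(int(dur * fs)) / fs
    g = t**(order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return g / np.linalg.norm(g)          # unit-norm atom

def matching_pursuit(signal, atoms, n_iter=50):
    residual = signal.copy()
    code = []                              # sparse code: (atom index, shift, gain)
    for _ in range(n_iter):
        # correlate every atom with the residual at every shift, keep the best
        best = max(((k, np.correlate(residual, a, mode="valid"))
                    for k, a in enumerate(atoms)),
                   key=lambda kc: np.abs(kc[1]).max())
        k, corr = best
        shift = int(np.abs(corr).argmax())
        gain = corr[shift]
        residual[shift:shift + len(atoms[k])] -= gain * atoms[k]
        code.append((k, shift, gain))
    return code, residual

fs = 16000
atoms = [gammatone(fc, fs) for fc in (200, 400, 800, 1600, 3200)]
x = np.random.randn(fs // 10)              # stand-in for a speech frame
code, res = matching_pursuit(x, atoms)
```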
Neuroscience-inspired computational systems for speech recognition under noisy conditions
NASA Astrophysics Data System (ADS)
Schafer, Phillip B.
Humans routinely recognize speech in challenging acoustic environments with background music, engine sounds, competing talkers, and other acoustic noise. However, today's automatic speech recognition (ASR) systems perform poorly in such environments. In this dissertation, I present novel methods for ASR designed to approach human-level performance by emulating the brain's processing of sounds. I exploit recent advances in auditory neuroscience to compute neuron-based representations of speech, and design novel methods for decoding these representations to produce word transcriptions. I begin by considering speech representations modeled on the spectrotemporal receptive fields of auditory neurons. These representations can be tuned to optimize a variety of objective functions, which characterize the response properties of a neural population. I propose an objective function that explicitly optimizes the noise invariance of the neural responses, and find that it gives improved performance on an ASR task in noise compared to other objectives. The method as a whole, however, fails to significantly close the performance gap with humans. I next consider speech representations that make use of spiking model neurons. The neurons in this method are feature detectors that selectively respond to spectrotemporal patterns within short time windows in speech. I consider a number of methods for training the response properties of the neurons. In particular, I present a method using linear support vector machines (SVMs) and show that this method produces spikes that are robust to additive noise. I compute the spectrotemporal receptive fields of the neurons for comparison with previous physiological results. To decode the spike-based speech representations, I propose two methods designed to work on isolated word recordings. The first method uses a classical ASR technique based on the hidden Markov model. The second method is a novel template-based recognition scheme that takes advantage of the neural representation's invariance in noise. The scheme centers on a speech similarity measure based on the longest common subsequence between spike sequences. The combined encoding and decoding scheme outperforms a benchmark system in extremely noisy acoustic conditions. Finally, I consider methods for decoding spike representations of continuous speech. To help guide the alignment of templates to words, I design a syllable detection scheme that robustly marks the locations of syllabic nuclei. The scheme combines SVM-based training with a peak selection algorithm designed to improve noise tolerance. By incorporating syllable information into the ASR system, I obtain strong recognition results in noisy conditions, although the performance in noiseless conditions is below the state of the art. The work presented here constitutes a novel approach to the problem of ASR that can be applied in the many challenging acoustic environments in which we use computer technologies today. The proposed spike-based processing methods can potentially be exploited in efficient hardware implementations and could significantly reduce the computational costs of ASR. The work also provides a framework for understanding the advantages of spike-based acoustic coding in the human brain.
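The template-matching step rests on a longest-common-subsequence (LCS) similarity between spike sequences. A minimal sketch of that measure follows; representing spikes as sequences of neuron labels and the toy word templates are assumptions for illustration only.

```python
# Sketch of LCS similarity between spike sequences, the idea behind the
# template matcher described above. Neuron-label sequences are assumed.
def lcs_length(a, b):
    """Classic O(len(a)*len(b)) dynamic program for the LCS length."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i-1][j-1] + 1 if x == y else max(dp[i-1][j], dp[i][j-1])
    return dp[len(a)][len(b)]

def spike_similarity(seq, template):
    # normalize so longer sequences are not trivially favored
    return lcs_length(seq, template) / max(len(seq), len(template))

# The word whose template best matches the observed spike sequence wins:
templates = {"yes": "ABBCAD", "no": "CCADBB"}   # hypothetical templates
observed = "ABCAD"
print(max(templates, key=lambda w: spike_similarity(observed, templates[w])))
```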
Teaching Speech Organization and Outlining Using a Color-Coded Approach.
ERIC Educational Resources Information Center
Hearn, Ralene
The organization/outlining unit in the basic Public Speaking course can be made more interesting by using a color-coded instructional method that captivates students, facilitates understanding, and provides the opportunity for interesting reinforcement activities. The two part lesson includes a mini-lecture with a color-coded outline and a two…
Will Microfilm and Computers Replace Clippings?
ERIC Educational Resources Information Center
Oppendahl, Alison; And Others
Four speeches are presented, each of which deals with the use of computers to organize and retrieve news stories. The first speech relates in detail the step-by-step process devised by the "Free Press" in Detroit to analyze, categorize, code, film, process, and retrieve news stories through the use of the electronic film retrieval…
Comparisons of Young Children's Private Speech Profiles: Analogical Versus Nonanalogical Reasoners.
ERIC Educational Resources Information Center
Manning, Brenda H.; White, C. Stephen
The primary intention of this study was to compare private speech profiles of young children classified as analogical reasoners (AR) with young children classified as nonanalogical reasoners (NAR). The secondary purpose was to investigate Berk's (1986) research methodology and categorical scheme for the collection and coding of private speech…
Cultivating American- and Japanese-Style Relatedness through Mother-Child Conversation
ERIC Educational Resources Information Center
Crane, Lauren Shapiro; Fernald, Anne
2017-01-01
This study investigated whether European American and Japanese mothers' speech to preschoolers contained exchange- and alignment-oriented structures that reflect and possibly support culture-specific models of self-other relatedness. In each country 12 mothers were observed in free play with their 3-year-olds. Maternal speech was coded for…
Freedom of Speech Wins in Wisconsin
ERIC Educational Resources Information Center
Downs, Donald Alexander
2006-01-01
One might derive, from the eradication of a particularly heinous speech code, some encouragement that all is not lost in the culture wars. A core of dedicated scholars, working from within, made it obvious, to all but the most radical left, that imposing social justice by restricting thought and expression was a recipe for tyranny. Donald…
Preliminary Analysis of Automatic Speech Recognition and Synthesis Technology.
1983-05-01
INDUSTRIAL/MILITARY SPEECH SYNTHESIS PRODUCTS ... The SC-01 Speech Synthesizer contains 64 different phonemes which are accessed by a 6-bit code; with the proper sequential combinations of these phonemes ... connected speech input with widely differing emotional states, diverse accents, and substantial nonperiodic background noise input. As noted previously
NASA Astrophysics Data System (ADS)
The present conference on the development status of communications systems in the context of electronic warfare gives attention to topics in spread spectrum code acquisition, digital speech technology, fiber-optics communications, free space optical communications, the networking of HF systems, and applications and evaluation methods for digital speech. Also treated are issues in local area network system design, coding techniques and applications, technology applications for HF systems, receiver technologies, software development status, channel simulation/prediction methods, C3 networking, spread spectrum networks, the improvement of communication efficiency and reliability through technical control methods, mobile radio systems, and adaptive antenna arrays. Finally, communications system cost analyses, spread spectrum performance, voice and image coding, switched networks, and microwave GaAs ICs are considered.
Examining the relationship between comprehension and production processes in code-switched language
Guzzardo Tamargo, Rosa E.; Valdés Kroff, Jorge R.; Dussias, Paola E.
2016-01-01
We employ code-switching (the alternation of two languages in bilingual communication) to test the hypothesis, derived from experience-based models of processing (e.g., Boland, Tanenhaus, Carlson, & Garnsey, 1989; Gennari & MacDonald, 2009), that bilinguals are sensitive to the combinatorial distributional patterns derived from production and that they use this information to guide processing during the comprehension of code-switched sentences. An analysis of spontaneous bilingual speech confirmed the existence of production asymmetries involving two auxiliary + participle phrases in Spanish–English code-switches. A subsequent eye-tracking study with two groups of bilingual code-switchers examined the consequences of the differences in distributional patterns found in the corpus study for comprehension. Participants’ comprehension costs mirrored the production patterns found in the corpus study. Findings are discussed in terms of the constraints that may be responsible for the distributional patterns in code-switching production and are situated within recent proposals of the links between production and comprehension. PMID:28670049
IEP goals for school-age children with speech sound disorders.
Farquharson, Kelly; Tambyraja, Sherine R; Justice, Laura M; Redle, Erin E
2014-01-01
The purpose of the current study was to describe the current state of practice for writing Individualized Education Program (IEP) goals for children with speech sound disorders (SSDs). IEP goals for 146 children receiving services for SSDs within public school systems across two states were coded for their dominant theoretical framework and overall quality. A dichotomous scheme was used for theoretical framework coding: cognitive-linguistic or sensory-motor. Goal quality was determined by examining 7 specific indicators outlined by an empirically tested rating tool. In total, 147 long-term and 490 short-term goals were coded. The results revealed no dominant theoretical framework for long-term goals, whereas short-term goals largely reflected a sensory-motor framework. In terms of quality, the majority of speech production goals were functional and generalizable in nature, but were not able to be easily targeted during common daily tasks or by other members of the IEP team. Short-term goals were consistently rated higher in quality domains when compared to long-term goals. The current state of practice for writing IEP goals for children with SSDs indicates that theoretical framework may be eclectic in nature and likely written to support the individual needs of children with speech sound disorders. Further investigation is warranted to determine the relations between goal quality and child outcomes. (1) Identify two predominant theoretical frameworks and discuss how they apply to IEP goal writing. (2) Discuss quality indicators as they relate to IEP goals for children with speech sound disorders. (3) Discuss the relationship between long-term goals level of quality and related theoretical frameworks. (4) Identify the areas in which business-as-usual IEP goals exhibit strong quality.
Evaluation of inner-outer space distinction and verbal hallucinations in schizophrenia.
Stephane, Massoud; Kuskowski, Michael; McClannahan, Kate; Surerus, Christa; Nelson, Katie
2010-09-01
Verbal hallucinations could result from attributing one's own inner speech to another. Inner speech is usually experienced in inner space, whereas hallucinations are often experienced in outer space. To clarify this paradox, we investigated schizophrenia patients' ability to distinguish between speech experienced in inner space, and speech experienced in outer space. 32 schizophrenia patients and 26 matched healthy controls underwent a two-stage experiment. First, they read sentences aloud or silently. Afterwards, they were required to distinguish between the sentences read aloud (experienced in outer space), the sentences read silently (experienced in inner space), and new sentences not previously read (no space coding). The sentences were in the first, second, or third person in equal proportions. Linear mixed models were used to investigate the effects of group, sentence location, pronoun, and hallucinations status. Schizophrenia patients were similar to controls in recognition capacity of sentences without space coding. They exhibited both inner-outer and outer-inner space confusion (they confused silently read sentences for sentences read aloud, and vice versa). Patients who experienced hallucinations inside their head were more likely to have outer-inner space bias. For speech generated by one's own brain, schizophrenia patients have bidirectional failure of inner-outer space distinction (inner-outer and outer-inner space biases); this might explain why hallucinations (abnormal inner speech) could be experienced in outer space. Furthermore, the direction of inner-outer space indistinction could determine the spatial location of the experienced hallucinations (inside or outside the head).
NASA Astrophysics Data System (ADS)
Mapp, Peter
2002-11-01
Although RaSTI is a good indicator of the speech intelligibility capability of auditoria and similar spaces, during the past 2-3 years it has been shown that RaSTI is not a robust predictor of sound system intelligibility performance. Instead, it is now recommended, within both national and international codes and standards, that full STI measurement and analysis be employed. However, new research is reported that indicates that STI is not as flawless, nor as robust, as many believe. The paper highlights a number of potential error mechanisms. It is shown that the measurement technique and signal excitation stimulus can have a significant effect on the overall result and accuracy, particularly where DSP-based equipment is employed. It is also shown that in its current state of development, STI is not capable of appropriately accounting for a number of fundamental speech and system attributes, including typical sound system frequency response variations and anomalies. This is particularly shown to be the case when a system is operating under reverberant conditions. Comparisons between actual system measurements and corresponding word score data are reported, where errors of up to 50% have been found. The implications for VA and PA system performance verification will be discussed.
From In-Session Behaviors to Drinking Outcomes: A Causal Chain for Motivational Interviewing
ERIC Educational Resources Information Center
Moyers, Theresa B.; Martin, Tim; Houck, Jon M.; Christopher, Paulette J.; Tonigan, J. Scott
2009-01-01
Client speech in favor of change within motivational interviewing sessions has been linked to treatment outcomes, but a causal chain has not yet been demonstrated. Using a sequential behavioral coding system for client speech, the authors found that, at both the session and utterance levels, specific therapist behaviors predict client change talk.…
ERIC Educational Resources Information Center
Shriberg, Lawrence D.; Paul, Rhea; McSweeny, Jane L.; Klin, Ami; Cohen, Donald J.; Volkmar, Fred R.
2001-01-01
This study compared the speech and prosody-voice profiles for 30 male speakers with either high-functioning autism (HFA) or Asperger syndrome (AS), and 53 typically developing male speakers. Both HFA and AS groups had more residual articulation distortion errors and utterances coded as inappropriate for phrasing, stress, and resonance. AS speakers…
Xiao, Bo; Imel, Zac E.; Georgiou, Panayiotis G.; Atkins, David C.; Narayanan, Shrikanth S.
2015-01-01
The technology for evaluating patient-provider interactions in psychotherapy (observational coding) has not changed in 70 years. It is labor-intensive, error-prone, and expensive, limiting its use in evaluating psychotherapy in the real world. Engineering solutions from speech and language processing provide new methods for the automatic evaluation of provider ratings from session recordings. The primary data are 200 Motivational Interviewing (MI) sessions from a study on MI training methods with observer ratings of counselor empathy. Automatic Speech Recognition (ASR) was used to transcribe sessions, and the resulting words were used in a text-based predictive model of empathy. Two supporting datasets trained the speech processing tasks including ASR (1200 transcripts from heterogeneous psychotherapy sessions and 153 transcripts and session recordings from 5 MI clinical trials). The accuracy of computationally-derived empathy ratings was evaluated against human ratings for each provider. Computationally-derived empathy scores and classifications (high vs. low) were highly accurate against human-based codes and classifications, with a correlation of 0.65 and F-score (a weighted average of sensitivity and specificity) of 0.86, respectively. Empathy prediction using human transcription as input (as opposed to ASR) resulted in a slight increase in prediction accuracies, suggesting that the fully automatic system with ASR is relatively robust. Using speech and language processing methods, it is possible to generate accurate predictions of provider performance in psychotherapy from audio recordings alone. This technology can support large-scale evaluation of psychotherapy for dissemination and process studies. PMID:26630392
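A minimal sketch of the text-based prediction step is shown below using scikit-learn: transcript words are mapped to TF-IDF features and regressed onto session-level empathy codes. The toy transcripts and score scale are hypothetical placeholders; the study's actual feature set and model may differ.

```python
# Illustrative sketch: predict a session-level empathy score from
# transcript text. Data and score scale are hypothetical placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

transcripts = ["you mentioned wanting to cut back tell me more",
               "you just need to stop drinking period"]      # toy examples
empathy_scores = [6.0, 2.0]                                  # human codes

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), Ridge(alpha=1.0))
model.fit(transcripts, empathy_scores)
print(model.predict(["tell me more about what worries you"]))
```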
Modem design for a MOBILESAT terminal
NASA Technical Reports Server (NTRS)
Rice, M.; Miller, M. J.; Cowley, W. G.; Rowe, D.
1990-01-01
The implementation is described of a programmable digital signal processor based system, designed for use as a test bed in the development of a digital modem, codec, and channel simulator. Code was written to configure the system as a 5600 bps or 6600 bps QPSK modem. The test bed is currently being used in an experiment to evaluate the performance of digital speech over shadowed channels in the Australian mobile satellite (MOBILESAT) project.
NASA Astrophysics Data System (ADS)
Kardava, Irakli; Tadyszak, Krzysztof; Gulua, Nana; Jurga, Stefan
2017-02-01
For more flexible environmental perception by artificial intelligence, supporting software modules are needed that can automate the creation of specific language syntax and perform further analysis for relevant decisions based on semantic functions. According to our proposed approach, it is possible to create pairs of formal rules from given sentences (in the case of natural languages) or statements (in the case of special languages) with the help of computer vision, speech recognition, or an editable text conversion system, for further automatic improvement. In other words, we have developed an approach by which the automation of the training process of artificial intelligence can be significantly improved, which as a result will yield a higher level of self-developing skills independent of us (the users). Based on our approach, we have developed a software demo version, which includes the algorithm and software code for the implementation of all the above-mentioned components (computer vision, speech recognition, and an editable text conversion system). The program is able to work in multi-stream mode and simultaneously create a syntax based on information received from several sources.
Neural mechanisms underlying auditory feedback control of speech
Reilly, Kevin J.; Guenther, Frank H.
2013-01-01
The neural substrates underlying auditory feedback control of speech were investigated using a combination of functional magnetic resonance imaging (fMRI) and computational modeling. Neural responses were measured while subjects spoke monosyllabic words under two conditions: (i) normal auditory feedback of their speech, and (ii) auditory feedback in which the first formant frequency of their speech was unexpectedly shifted in real time. Acoustic measurements showed compensation to the shift within approximately 135 ms of onset. Neuroimaging revealed increased activity in bilateral superior temporal cortex during shifted feedback, indicative of neurons coding mismatches between expected and actual auditory signals, as well as right prefrontal and Rolandic cortical activity. Structural equation modeling revealed increased influence of bilateral auditory cortical areas on right frontal areas during shifted speech, indicating that projections from auditory error cells in posterior superior temporal cortex to motor correction cells in right frontal cortex mediate auditory feedback control of speech. PMID:18035557
Decoding Articulatory Features from fMRI Responses in Dorsal Speech Regions.
Correia, Joao M; Jansma, Bernadette M B; Bonte, Milene
2015-11-11
The brain's circuitry for perceiving and producing speech may show a notable level of overlap that is crucial for normal development and behavior. The extent to which sensorimotor integration plays a role in speech perception remains highly controversial, however. Methodological constraints related to experimental designs and analysis methods have so far prevented the disentanglement of neural responses to acoustic versus articulatory speech features. Using a passive listening paradigm and multivariate decoding of single-trial fMRI responses to spoken syllables, we investigated brain-based generalization of articulatory features (place and manner of articulation, and voicing) beyond their acoustic (surface) form in adult human listeners. For example, we trained a classifier to discriminate place of articulation within stop syllables (e.g., /pa/ vs /ta/) and tested whether this training generalizes to fricatives (e.g., /fa/ vs /sa/). This novel approach revealed generalization of place and manner of articulation at multiple cortical levels within the dorsal auditory pathway, including auditory, sensorimotor, motor, and somatosensory regions, suggesting the representation of sensorimotor information. Additionally, generalization of voicing included the right anterior superior temporal sulcus associated with the perception of human voices as well as somatosensory regions bilaterally. Our findings highlight the close connection between brain systems for speech perception and production, and in particular, indicate the availability of articulatory codes during passive speech perception. Sensorimotor integration is central to verbal communication and provides a link between auditory signals of speech perception and motor programs of speech production. It remains highly controversial, however, to what extent the brain's speech perception system actively uses articulatory (motor), in addition to acoustic/phonetic, representations. In this study, we examine the role of articulatory representations during passive listening using carefully controlled stimuli (spoken syllables) in combination with multivariate fMRI decoding. Our approach enabled us to disentangle brain responses to acoustic and articulatory speech properties. In particular, it revealed articulatory-specific brain responses of speech at multiple cortical levels, including auditory, sensorimotor, and motor regions, suggesting the representation of sensorimotor information during passive speech perception. Copyright © 2015 the authors.
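The generalization logic of the analysis can be sketched as follows: train a classifier on one syllable class and test it on another, so that above-chance accuracy reflects features shared across acoustic forms. The random feature matrices below are stand-ins for single-trial fMRI patterns; only the train/test structure mirrors the study.

```python
# Sketch of cross-class generalization decoding: train on stop syllables,
# test on fricatives. Random matrices stand in for fMRI voxel patterns.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n_trials, n_voxels = 40, 200
X_stops = rng.standard_normal((n_trials, n_voxels))   # /pa/ vs /ta/ trials
y_stops = rng.integers(0, 2, n_trials)                # place of articulation
X_fric = rng.standard_normal((n_trials, n_voxels))    # /fa/ vs /sa/ trials
y_fric = rng.integers(0, 2, n_trials)

clf = LinearSVC().fit(X_stops, y_stops)               # train within stops
acc = clf.score(X_fric, y_fric)                       # test on fricatives
print(f"cross-class decoding accuracy: {acc:.2f}")    # ~0.5 for random data
```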
Anthropomorphic Coding of Speech and Audio: A Model Inversion Approach
NASA Astrophysics Data System (ADS)
Feldbauer, Christian; Kubin, Gernot; Kleijn, W. Bastiaan
2005-12-01
Auditory modeling is a well-established methodology that provides insight into human perception and that facilitates the extraction of signal features that are most relevant to the listener. The aim of this paper is to provide a tutorial on perceptual speech and audio coding using an invertible auditory model. In this approach, the audio signal is converted into an auditory representation using an invertible auditory model. The auditory representation is quantized and coded. Upon decoding, it is then transformed back into the acoustic domain. This transformation converts a complex distortion criterion into a simple one, thus facilitating quantization with low complexity. We briefly review past work on auditory models and describe in more detail the components of our invertible model and its inversion procedure, that is, the method to reconstruct the signal from the output of the auditory model. We summarize attempts to use the auditory representation for low-bit-rate coding. Our approach also allows the exploitation of the inherent redundancy of the human auditory system for the purpose of multiple description (joint source-channel) coding.
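The overall coder structure (analyze with an invertible transform, quantize the representation, invert to reconstruct) can be sketched as below, with an STFT standing in for the auditory model. The frame size and quantizer step are arbitrary assumptions; the paper's actual auditory model and quantization differ.

```python
# Toy sketch of the coder's structure: invertible analysis, quantization,
# inversion. An STFT stands in for the paper's invertible auditory model.
import numpy as np
from scipy.signal import stft, istft

fs = 16000
x = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)   # stand-in for speech

f, t, X = stft(x, fs=fs, nperseg=512)              # "auditory" analysis
step = 0.05
Xq = step * np.round(X / step)                     # crude uniform quantizer
_, x_hat = istft(Xq, fs=fs, nperseg=512)           # inversion / decoding

n = min(len(x), len(x_hat))
snr = 10 * np.log10(np.sum(x[:n]**2) / np.sum((x[:n] - x_hat[:n])**2))
print(f"reconstruction SNR: {snr:.1f} dB")
```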
NASA Astrophysics Data System (ADS)
Liberman, A. M.
1982-03-01
This report is one of a regular series on the status and progress of studies on the nature of speech, instrumentation for its investigation and practical applications. Manuscripts cover the following topics: Speech perception and memory coding in relation to reading ability; The use of orthographic structure by deaf adults: Recognition of finger-spelled letters; Exploring the information support for speech; The stream of speech; Using the acoustic signal to make inferences about place and duration of tongue-palate contact; Patterns of human interlimb coordination emerge from the properties of nonlinear limit cycle oscillatory processes: Theory and data; Motor control: Which themes do we orchestrate? Exploring the nature of motor control in Down's syndrome; Periodicity and auditory memory: A pilot study; Reading skill and language skill: On the role of sign order and morphological structure in memory for American Sign Language sentences; Perception of nasal consonants with special reference to Catalan; and Speech production characteristics of the hearing impaired.
Smart command recognizer (SCR) - For development, test, and implementation of speech commands
NASA Technical Reports Server (NTRS)
Simpson, Carol A.; Bunnell, John W.; Krones, Robert R.
1988-01-01
The SCR, a rapid prototyping system for the development, testing, and implementation of speech commands in a flight simulator or test aircraft, is described. A single unit performs all functions needed during these three phases of system development, while the use of common software and speech command data structure files greatly reduces the preparation time for successive development phases. As a smart peripheral to a simulation or flight host computer, the SCR interprets the pilot's spoken input and passes command codes to the simulation or flight computer.
Levels of Code Switching on EFL Student's Daily Language; Study of Language Production
ERIC Educational Resources Information Center
Zainuddin
2016-01-01
This study is aimed at describing the levels of code switching in EFL students' daily conversation. The topic was chosen due to the fact that code-switching phenomena are commonly found in the daily speech of the Indonesian community, such as in teenagers' talk, television serial dialogues, and mass media. Therefore, qualitative data were collected by using…
Design and Evaluation of a Cochlear Implant Strategy Based on a “Phantom” Channel
Nogueira, Waldo; Litvak, Leonid M.; Saoji, Aniket A.; Büchner, Andreas
2015-01-01
Unbalanced bipolar stimulation, delivered using charge-balanced pulses, was used to produce “Phantom stimulation”, stimulation beyond the most apical contact of a cochlear implant’s electrode array. The Phantom channel was allocated audio frequencies below 300 Hz in a speech coding strategy, conveying energy some two octaves lower than the clinical strategy and hence delivering the fundamental frequency of speech and of many musical tones. A group of 12 Advanced Bionics cochlear implant recipients took part in a chronic study investigating the fitting of the Phantom strategy and speech and music perception when using Phantom. The evaluation of speech in noise was performed immediately after fitting Phantom for the first time (Session 1) and after one month of take-home experience (Session 2). A repeated-measures analysis of variance (ANOVA) with the within-subject factors strategy (Clinical, Phantom) and time (Session 1, Session 2) revealed a significant interaction of time and strategy: Phantom obtained a significant improvement in speech intelligibility after one month of use. Furthermore, a trend towards better performance with Phantom (48%) than with F120 (37%) after 1 month of use failed to reach significance after type 1 error correction. Questionnaire results show a preference for Phantom when listening to music, likely driven by an improved balance between high and low frequencies. PMID:25806818
ERIC Educational Resources Information Center
Galvin, Kathleen M.
This paper focuses on certain approaches which an urban speech department can use as it contributes to the preparation of urban school teachers to communicate effectively with their students. The contents include: "Verbal and Nonverbal Codes," which discusses the teacher as an encoder of verbal messages and emphasizes that teachers must learn to…
Hate Speech, the First Amendment, and Professional Codes of Conduct: Where to Draw the Line?
ERIC Educational Resources Information Center
Mello, Jeffrey A.
2008-01-01
This article presents a teaching case that involves the presentation of an actual incident in which a state commission on judicial performance had to balance a judge's First Amendment rights to protected free speech against his public statements about a societal class/group that were deemed to be derogatory and inflammatory and, hence, cast…
NASA Astrophysics Data System (ADS)
Studdert-Kennedy, M.; Obrien, N.
1983-05-01
This report is one of a regular series on the status and progress of studies on the nature of speech, instrumentation for its investigation, and practical applications. Manuscripts cover the following topics: The influence of subcategorical mismatches on lexical access; The Serbo-Croatian orthography constrains the reader to a phonologically analytic strategy; Grammatical priming effects between pronouns and inflected verb forms; Misreadings by beginning readers of Serbo-Croatian; Bi-alphabetism and word recognition; Orthographic and phonemic coding for word identification: Evidence for Hebrew; Stress and vowel duration effects on syllable recognition; Phonetic and auditory trading relations between acoustic cues in speech perception: Further results; Linguistic coding by deaf children in relation to beginning reading success; Determinants of spelling ability in deaf and hearing adults: Access to linguistic structures; A dynamical basis for action systems; On the space-time structure of human interlimb coordination; Some acoustic and physiological observations on diphthongs; Relationship between pitch control and vowel articulation; Laryngeal vibrations: A comparison between high-speed filming and glottographic techniques; Compensatory articulation in hearing impaired speakers: A cinefluorographic study; and Review (Pierre Delattre: Studies in comparative phonetics).
Fifty years of progress in speech coding standards
NASA Astrophysics Data System (ADS)
Cox, Richard
2004-10-01
Over the past 50 years, speech coding has taken root worldwide. Early applications were for the military and transmission for telephone networks. The military gave equal priority to intelligibility and low bit rate. The telephone network gave priority to high quality and low delay. These illustrate three of the four areas in which requirements must be set for any speech coder application: bit rate, quality, delay, and complexity. While the military could afford relatively expensive terminal equipment for secure communications, the telephone network needed low cost for massive deployment in switches and transmission equipment worldwide. Today speech coders are at the heart of the wireless phones and telephone answering systems we use every day. In addition to the technology and technical invention that has occurred, standards make it possible for all these different systems to interoperate. The primary areas of standardization are the public switched telephone network, wireless telephony, and secure telephony for government and military applications. With the advent of IP telephony there are additional standardization efforts and challenges. In this talk the progress in all areas is reviewed as well as a reflection on Jim Flanagan's impact on this field during the past half century.
Influence of musical training on understanding voiced and whispered speech in noise.
Ruggles, Dorea R; Freyman, Richard L; Oxenham, Andrew J
2014-01-01
This study tested the hypothesis that the previously reported advantage of musicians over non-musicians in understanding speech in noise arises from more efficient or robust coding of periodic voiced speech, particularly in fluctuating backgrounds. Speech intelligibility was measured in listeners with extensive musical training, and in those with very little musical training or experience, using normal (voiced) or whispered (unvoiced) grammatically correct nonsense sentences in noise that was spectrally shaped to match the long-term spectrum of the speech, and was either continuous or gated with a 16-Hz square wave. Performance was also measured in clinical speech-in-noise tests and in pitch discrimination. Musicians exhibited enhanced pitch discrimination, as expected. However, no systematic or statistically significant advantage for musicians over non-musicians was found in understanding either voiced or whispered sentences in either continuous or gated noise. Musicians also showed no statistically significant advantage in the clinical speech-in-noise tests. Overall, the results provide no evidence for a significant difference between young adult musicians and non-musicians in their ability to understand speech in noise.
Lei, Huimeng; Yan, Zhangming; Sun, Xiaohong; Zhang, Yue; Wang, Jianhong; Ma, Caihong; Xu, Qunyuan; Wang, Rui; Jarvis, Erich D; Sun, Zhirong
2017-11-01
Humans and several nonhuman species share the rare ability of modifying acoustic and/or syntactic features of the sounds they produce, i.e. vocal learning, which is an important neurobiological and behavioral substrate of human speech/language. This convergent trait was suggested to be associated with significant genomic convergence, best manifested at the ROBO-SLIT axon guidance pathway. Here we verified the significance of such genomic convergence and assessed its functional relevance to human speech/language using human genetic variation data. In normal human populations, we found the affected amino acid sites were well fixed and accompanied by significantly more associated protein-coding SNPs in the same genes than in the remaining genes. Diseased individuals with speech/language disorders have significantly more low-frequency protein-coding SNPs, but these preferentially occurred outside the affected genes. Such patients' SNPs were enriched in several functional categories, including two axon guidance pathways (mediated by netrin and semaphorin) that interact with ROBO-SLITs. Four of the six patients have homozygous missense SNPs in the PRAME gene family, one of the youngest gene families in the human lineage, which possibly acts upon retinoic acid receptor signaling, similarly to FOXP2, to modulate axon guidance. Taken together, we suggest that these axon guidance pathways (e.g. ROBO-SLIT, PRAME gene family) served as common targets for human speech/language evolution and related disorders. Copyright © 2017 Elsevier Inc. All rights reserved.
A recursive linear predictive vocoder
NASA Astrophysics Data System (ADS)
Janssen, W. A.
1983-12-01
A non-real-time, 10-pole recursive autocorrelation linear predictive coding vocoder was created for use in studying effects of recursive autocorrelation on speech. The vocoder is composed of two interchangeable pitch detectors, a speech analyzer, and a speech synthesizer. The time between updating filter coefficients is allowed to vary from .125 msec to 20 msec. The best quality was found using .125 msec between each update. The greatest change in quality was noted when changing from 20 msec/update to 10 msec/update. Pitch period plots for the center clipping autocorrelation pitch detector and the simplified inverse filtering technique are provided. Plots of speech into and out of the vocoder are given. Formant versus time three-dimensional plots are shown. Effects of noise on pitch detection and formants are shown. Noise affects the voiced/unvoiced decision process, causing voiced speech to be reconstructed as unvoiced.
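The analysis core of such a vocoder is the autocorrelation method with the Levinson-Durbin recursion, sketched below for a 10-pole model. The frame length is an assumed example; the report's recursive variant updates the coefficients far more often (down to .125 msec between updates).

```python
# Sketch of the core analysis step of an autocorrelation LPC vocoder:
# estimate 10 predictor coefficients per frame via Levinson-Durbin.
import numpy as np

def lpc(frame, order=10):
    """Autocorrelation method + Levinson-Durbin recursion."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:][:order + 1]
    a = np.zeros(order + 1); a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i-1:0:-1])) / err   # reflection coefficient
        a[1:i] = a[1:i] + k * a[i-1:0:-1]                 # update inner coefficients
        a[i] = k
        err *= (1 - k * k)                                # shrink prediction error
    return a, err                                         # A(z) coeffs, residual energy

frame = np.random.randn(160)      # one 20-ms frame at 8 kHz (stand-in)
a, err = lpc(frame)
```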
Degraded neural and behavioral processing of speech sounds in a rat model of Rett syndrome
Engineer, Crystal T.; Rahebi, Kimiya C.; Borland, Michael S.; Buell, Elizabeth P.; Centanni, Tracy M.; Fink, Melyssa K.; Im, Kwok W.; Wilson, Linda G.; Kilgard, Michael P.
2015-01-01
Individuals with Rett syndrome have greatly impaired speech and language abilities. Auditory brainstem responses to sounds are normal, but cortical responses are highly abnormal. In this study, we used the novel rat Mecp2 knockout model of Rett syndrome to document the neural and behavioral processing of speech sounds. We hypothesized that both speech discrimination ability and the neural response to speech sounds would be impaired in Mecp2 rats. We expected that extensive speech training would improve speech discrimination ability and the cortical response to speech sounds. Our results reveal that speech responses across all four auditory cortex fields of Mecp2 rats were hyperexcitable, responded slower, and were less able to follow rapidly presented sounds. While Mecp2 rats could accurately perform consonant and vowel discrimination tasks in quiet, they were significantly impaired at speech sound discrimination in background noise. Extensive speech training improved discrimination ability. Training shifted cortical responses in both Mecp2 and control rats to favor the onset of speech sounds. While training increased the response to low frequency sounds in control rats, the opposite occurred in Mecp2 rats. Although neural coding and plasticity are abnormal in the rat model of Rett syndrome, extensive therapy appears to be effective. These findings may help to explain some aspects of communication deficits in Rett syndrome and suggest that extensive rehabilitation therapy might prove beneficial. PMID:26321676
An articulatorily constrained, maximum entropy approach to speech recognition and speech coding
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hogden, J.
Hidden Markov models (HMM's) are among the most popular tools for performing computer speech recognition. One of the primary reasons that HMM's typically outperform other speech recognition techniques is that the parameters used for recognition are determined by the data, not by preconceived notions of what the parameters should be. This makes HMM's better able to deal with intra- and inter-speaker variability despite the limited knowledge of how speech signals vary and despite the often limited ability to correctly formulate rules describing variability and invariance in speech. In fact, it is often the case that when HMM parameter values are constrained using the limited knowledge of speech, recognition performance decreases. However, the structure of an HMM has little in common with the mechanisms underlying speech production. Here, the author argues that by using probabilistic models that more accurately embody the process of speech production, he can create models that have all the advantages of HMM's, but that should more accurately capture the statistical properties of real speech samples--presumably leading to more accurate speech recognition. The model he will discuss uses the fact that speech articulators move smoothly and continuously. Before discussing how to use articulatory constraints, he will give a brief description of HMM's. This will allow him to highlight the similarities and differences between HMM's and the proposed technique.
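For reference, the brief HMM description above can be made concrete with the forward algorithm, which computes the likelihood of an observation sequence under a given model. The two-state transition, emission, and initial probabilities below are toy values, not parameters from the report.

```python
# Minimal sketch of the HMM forward algorithm: likelihood of an observation
# sequence given transition matrix A, emission matrix B, and initial pi.
import numpy as np

def forward_likelihood(obs, A, B, pi):
    alpha = pi * B[:, obs[0]]                 # initialize with first symbol
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]         # propagate states, then emit
    return alpha.sum()

A = np.array([[0.7, 0.3], [0.2, 0.8]])        # 2 hidden states (toy values)
B = np.array([[0.9, 0.1], [0.3, 0.7]])        # 2 observation symbols
pi = np.array([0.6, 0.4])
print(forward_likelihood([0, 1, 1, 0], A, B, pi))
```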
Can bilingual two-year-olds code-switch?
Lanza, E
1992-10-01
Sociolinguists have investigated language mixing as code-switching in the speech of bilingual children three years old and older. Language mixing by bilingual two-year-olds, however, has generally been interpreted in the child language literature as a sign of the child's lack of language differentiation. The present study applies perspectives from sociolinguistics to investigate the language mixing of a bilingual two-year-old acquiring Norwegian and English simultaneously in Norway. Monthly recordings of the child's spontaneous speech in interactions with her parents were made from the age of 2;0 to 2;7. An investigation into the formal aspects of the child's mixing and the context of the mixing reveals that she does differentiate her language use in contextually sensitive ways, hence that she can code-switch. This investigation stresses the need to examine more carefully the roles of dominance and context in the language mixing of young bilingual children.
Revisiting place and temporal theories of pitch
2014-01-01
The nature of pitch and its neural coding have been studied for over a century. A popular debate has revolved around the question of whether pitch is coded via “place” cues in the cochlea, or via timing cues in the auditory nerve. In the most recent incarnation of this debate, the role of temporal fine structure has been emphasized in conveying important pitch and speech information, particularly because the lack of temporal fine structure coding in cochlear implants might explain some of the difficulties faced by cochlear implant users in perceiving music and pitch contours in speech. In addition, some studies have postulated that hearing-impaired listeners may have a specific deficit related to processing temporal fine structure. This article reviews some of the recent literature surrounding the debate, and argues that much of the recent evidence suggesting the importance of temporal fine structure processing can also be accounted for using spectral (place) or temporal-envelope cues. PMID:25364292
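A purely temporal pitch estimate of the kind at issue in this debate can be sketched with a waveform autocorrelation peak-picker; the F0 search range and test signal below are assumed illustration values, not stimuli from the review.

```python
# Sketch of a purely temporal (timing-based) pitch estimate: pick the
# autocorrelation peak within a plausible F0 search range.
import numpy as np

def acf_pitch(x, fs, fmin=50.0, fmax=500.0):
    r = np.correlate(x, x, mode="full")[len(x) - 1:]   # one-sided ACF
    lo, hi = int(fs / fmax), int(fs / fmin)            # lag search range
    lag = lo + int(np.argmax(r[lo:hi]))
    return fs / lag

fs = 16000
t = np.arange(fs // 10) / fs
x = np.sin(2 * np.pi * 220 * t) + 0.5 * np.sin(2 * np.pi * 440 * t)
print(acf_pitch(x, fs))   # ~220 Hz, recovered from timing cues alone
```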
A 4.8 kbps code-excited linear predictive coder
NASA Technical Reports Server (NTRS)
Tremain, Thomas E.; Campbell, Joseph P., Jr.; Welch, Vanoy C.
1988-01-01
A secure voice system, STU-3, capable of providing end-to-end secure voice communications was developed in 1984. The terminal for the new system will be built around the standard LPC-10 voice processor algorithm. While the performance of the present STU-3 processor is considered to be good, its response to nonspeech sounds such as whistles, coughs, and impulse-like noises may not be completely acceptable. Speech in noisy environments also causes problems with the LPC-10 voice algorithm. In addition, there is always a demand for something better. It is hoped that LPC-10's 2.4 kbps voice performance will be complemented with a very high quality speech coder operating at a higher data rate. This new coder is one of a number of candidate algorithms being considered for an upgraded version of the STU-3 in late 1989. The problems of designing a code-excited linear predictive (CELP) coder to provide very high quality speech at a 4.8 kbps data rate that can be implemented on today's hardware are considered.
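The defining step of a CELP coder is the analysis-by-synthesis codebook search: each candidate excitation is passed through the LPC synthesis filter, and the codevector (with its optimal gain) minimizing the error against the target is selected. The sketch below shows this loop with a toy random codebook and filter; it omits the adaptive codebook, perceptual weighting, and other details of the actual 4.8 kbps candidate.

```python
# Conceptual sketch of the CELP analysis-by-synthesis codebook search.
# The random codebook, LPC filter, and target are toy stand-ins.
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(1)
codebook = rng.standard_normal((128, 40))     # 128 candidate excitations
a = np.array([1.0, -0.9])                     # toy LPC synthesis filter 1/A(z)
target = rng.standard_normal(40)              # perceptually weighted target

best_err, best = np.inf, None
for idx, cv in enumerate(codebook):
    synth = lfilter([1.0], a, cv)             # pass excitation through 1/A(z)
    gain = synth @ target / (synth @ synth)   # optimal gain per codevector
    err = np.sum((target - gain * synth) ** 2)
    if err < best_err:
        best_err, best = err, (idx, gain)
print("selected codevector, gain:", best)
```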
Digital Coding and the Self-Proving Message
ERIC Educational Resources Information Center
Dettering, Richard
1971-01-01
Author suggests that "digital communication", which relies on arbitrary coding elements like the phones of speech, overshadows the importance of the analogic symbolism people use more extensively than realized. Non-verbal messages can be more convincing than verbal and can be used to predict patterns of future behavior. (Author/PD)
ERIC Educational Resources Information Center
Lechler, Suzanne; Hare, Dougal Julian
2015-01-01
A naturalistic observational single case study was carried out to investigate the form and function of private speech (PS) in a young man with Dandy-Walker variant syndrome and trisomy 22. Video recordings were observed, transcribed and coded to identify all combinations of type and form of PS. Through comparison between theories of PS and the…
1988-05-01
[Extraction residue; recoverable figure captions:] Figure 2, "Original limited-capacity channel model (From Broadbent, 1958)"; Figure 3, "Experimental…". [Readable fragment:] …unlimited variety of human voices for digital recording sources. Synthesis by Analysis: analysis-synthesis methods electronically model the human voice.
Sensory Information Processing
1975-12-31
[Extraction residue; recoverable fragments:] …system noise. To see how this is avoided, note that zeroes in the blur spectrum become sharp, spike-like negative impulses when the… [Table-of-contents fragments:] Synthetic Speech Quality Using Binaural Reverberation (Boll); Section 4, Noise Suppression with Linear Prediction Filtering (Peterson); Section 5, Speech Processing to Reduce Noise and Improve Intelligibility (Callahan); Section 6, Linear Predictive Coding with a Glottal…; Section 7…
Multiparticipant Chat Analysis: A Survey
2013-02-26
[Search-snippet fragments:] …language variation (e.g., regional speech in Germany [6]; code-switching in German-speaking regions of Switzerland [84] and Indian IRC channels [77])…; …messages which may be missed in high-tempo situations [19], and automated analysis of chat messages [13]. Finally, the high number of chat messages can…; J. Androutsopoulos, E. Ziegler, Exploring language variation on the internet: Regional speech in a chat community, in: Proceedings of the Second International…
Cross-language Activation and the Phonetics of Code-switching
NASA Astrophysics Data System (ADS)
Piccinini, Page Elizabeth
It is now well established that bilinguals have both languages activated to some degree at all times. This cross-language activation has been documented in several research paradigms, including picture naming, reading, and electrophysiological studies. What is less well understood is how the degree a language is activated can vary in different language environments or contexts. Furthermore, when investigating effects of order of acquisition and language dominance, past research has been mixed, as the two variables are often conflated. In this dissertation, I test how degree of cross-language activation can vary according to context by examining phonetic productions in code-switching speech. Both spontaneous speech and scripted speech are analyzed. Follow-up perception experiments are conducted to see if listeners are able to anticipate language switches, potentially due to the phonetic cues in the signal. Additionally, by focusing on early bilinguals who are L1 Spanish but English dominant, I am able to see what plays a greater role in cross-language activation, order of acquisition or language dominance. I find that speakers do have intermediate phonetic productions in code-switching contexts relative to monolingual contexts. Effects are larger and more consistent in English than Spanish. Similar effects are found in speech perception. Listeners are able to anticipate language switches from English to Spanish but not Spanish to English. Together these results suggest that language dominance is a more important factor than order of acquisition in cross-language activation for early bilinguals. Future models on bilingual language organization and access should take into account both context and language dominance when modeling degrees of cross-language activation.
Results using the OPAL strategy in Mandarin speaking cochlear implant recipients.
Vandali, Andrew E; Dawson, Pam W; Arora, Komal
2017-01-01
To evaluate the effectiveness of an experimental pitch-coding strategy for improving recognition of Mandarin lexical tone in cochlear implant (CI) recipients. Adult CI recipients were tested on recognition of Mandarin tones in quiet and speech-shaped noise at a signal-to-noise ratio of +10 dB; Mandarin sentence speech-reception threshold (SRT) in speech-shaped noise; and pitch discrimination of synthetic complex-harmonic tones in quiet. Two versions of the experimental strategy were examined: (OPAL) linear (1:1) mapping of fundamental frequency (F0) to the coded modulation rate; and (OPAL+) transposed mapping of high F0s to a lower coded rate. Outcomes were compared to results using the clinical ACE™ strategy. Five Mandarin speaking users of Nucleus® cochlear implants. A small but significant benefit in recognition of lexical tones was observed using OPAL compared to ACE in noise, but not in quiet, and not for OPAL+ compared to ACE or OPAL in quiet or noise. Sentence SRTs were significantly better using OPAL+ and comparable using OPAL to those using ACE. No differences in pitch discrimination thresholds were observed across strategies. OPAL can provide benefits to Mandarin lexical tone recognition in moderately noisy conditions and preserve perception of Mandarin sentences in challenging noise conditions.
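To make the difference between the two strategy variants concrete, here is a minimal sketch of the F0-to-modulation-rate mapping described above; the 300 Hz upper limit and the octave transposition are illustrative assumptions, not the parameters used in the study:

```python
def coded_modulation_rate(f0_hz, transpose=False, upper_limit=300.0):
    """Map an estimated F0 to the modulation rate coded on the electrodes.

    OPAL:  linear (1:1) mapping, rate = F0.
    OPAL+: F0s above an upper limit are transposed down so the coded rate
           stays in a range implant users can follow. The 300 Hz limit and
           octave (divide-by-two) transposition are assumptions for this sketch.
    """
    if transpose and f0_hz > upper_limit:
        return f0_hz / 2.0  # hypothetical octave transposition
    return f0_hz

print(coded_modulation_rate(220.0))                  # OPAL:  220 Hz
print(coded_modulation_rate(380.0, transpose=True))  # OPAL+: 190 Hz
```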
Psychoacoustic cues to emotion in speech prosody and music.
Coutinho, Eduardo; Dibben, Nicola
2013-01-01
There is strong evidence of shared acoustic profiles common to the expression of emotions in music and speech, yet relatively limited understanding of the specific psychoacoustic features involved. This study combined a controlled experiment and computational modelling to investigate the perceptual codes associated with the expression of emotion in the acoustic domain. The empirical stage of the study provided continuous human ratings of emotions perceived in excerpts of film music and natural speech samples. The computational stage created a computer model that retrieves the relevant information from the acoustic stimuli and makes predictions about the emotional expressiveness of speech and music close to the responses of human subjects. We show that a significant part of the listeners' second-by-second reported emotions to music and speech prosody can be predicted from a set of seven psychoacoustic features: loudness, tempo/speech rate, melody/prosody contour, spectral centroid, spectral flux, sharpness, and roughness. The implications of these results are discussed in the context of cross-modal similarities in the communication of emotion in the acoustic domain.
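A minimal sketch of the feature-to-rating modelling idea (the study's actual computational model was more elaborate); the feature matrix and ratings below are random stand-ins for real per-second tracks of the seven features named above:

```python
import numpy as np

rng = np.random.default_rng(0)
# 120 s x [loudness, tempo/rate, contour, centroid, flux, sharpness, roughness]
features = rng.normal(size=(120, 7))
ratings = rng.normal(size=120)   # continuous second-by-second emotion rating

# Ordinary least squares with an intercept: predict ratings from features.
X = np.hstack([features, np.ones((120, 1))])
w, *_ = np.linalg.lstsq(X, ratings, rcond=None)
predicted = X @ w
print(np.corrcoef(predicted, ratings)[0, 1])  # fit quality on toy data
```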
ERIC Educational Resources Information Center
Kolehmainen, Leena; Skaffari, Janne
2016-01-01
This article serves as an introduction to a collection of four articles on multilingual practices in speech and writing, exploring both contemporary and historical sources. It not only introduces the articles but also discusses the scope and definitions of code-switching, attitudes towards multilingual interaction and, most pertinently, the…
ERIC Educational Resources Information Center
Bauminger-Zviely, Nirit; Golan-Itshaky, Adi; Tubul-Lavy, Gila
2017-01-01
In this study, we videotaped two 10-min. free-play interactions and coded speech acts (SAs) in peer talk of 51 preschoolers (21 ASD, 30 typical), interacting with friend versus non-friend partners. Groups were matched for maternal education, IQ (verbal/nonverbal), and CA. We compared SAs by group (ASD/typical), by partner's friendship status…
Perception and Neural Coding of Harmonic Fusion in Ferrets
2004-01-01
[Search-snippet fragments:] …distinct percepts that come under the rubric of pitch, because periodicity pitch underlies speakers' voices and speech prosody, as well as musical…; …spectral fusion is unclear for sounds having predominantly low-frequency spectra such as speech, music, and many animal vocalizations. In summary…; …84, 560–565; von Helmholtz, H. (1863). Die Lehre von den Tonempfindungen als physiologische Grundlage für die Theorie der Musik. Vieweg und Sohn…
The Matrix Pencil and its Applications to Speech Processing
2007-03-01
ABSTRACT: Matrix pencils facilitate the study of differential equations resulting from oscillating systems. Certain problems in linear ordinary… [Bibliography residue: "Elementary Linear Algebra", 8th edition, p. 278, 2000, John Wiley & Sons, New York; [37] Wai C. Chu, "Speech Coding Algorithms", New Jersey: John…; …Ben; Daniel, James W., "Applied Linear Algebra", pp. 342-345, 1988, Prentice Hall, Englewood Cliffs, NJ; [35] Haykin, Simon, "Applied Linear Adaptive…"]
Simplified APC for Space Shuttle applications. [Adaptive Predictive Coding for speech transmission
NASA Technical Reports Server (NTRS)
Hutchins, S. E.; Batson, B. H.
1975-01-01
This paper describes an 8 kbps adaptive predictive digital speech transmission system which was designed for potential use in the Space Shuttle Program. The system was designed to provide good voice quality in the presence of both cabin noise on board the Shuttle and the anticipated bursty channel. Minimal increase in size, weight, and power over the current high data rate system was also a design objective.
Iles, Jane; Spiby, Helen; Slade, Pauline
2014-10-01
Little is known about what constitutes key components of partner support during the childbirth experience. This study modified the five minute speech sample, a measure of expressed emotion (EE), for use with new parents in the immediate postpartum. A coding framework was developed to rate the speech samples on dimensions of couple support. Associations were explored between these codes and subsequent symptoms of postnatal depression and posttraumatic stress. 372 couples were recruited in the early postpartum and individually provided short speech samples. Posttraumatic stress and postnatal depression symptoms were assessed via questionnaire measures at six and thirteen weeks. Two hundred and twelve couples completed all time-points. Key elements of supportive interactions were identified and reliably categorised. Mothers' posttraumatic stress was associated with criticisms of the partner during childbirth, general relationship criticisms and men's perception of helplessness. Postnatal depression was associated with absence of partner empathy and any positive comments regarding the partner's support. The content of new parents' descriptions of labour and childbirth, their partner during labour and birth and their relationship within the immediate postpartum may have significant implications for later psychological functioning. Interventions to enhance specific supportive elements between couples during the antenatal period merit development and evaluation.
Arsenault, Jessica S; Buchsbaum, Bradley R
2016-08-01
The motor theory of speech perception has experienced a recent revival due to a number of studies implicating the motor system during speech perception. In a key study, Pulvermüller et al. (2006) showed that premotor/motor cortex differentially responds to the passive auditory perception of lip and tongue speech sounds. However, no study has yet attempted to replicate this important finding from nearly a decade ago. The objective of the current study was to replicate the principal finding of Pulvermüller et al. (2006) and generalize it to a larger set of speech tokens while applying a more powerful statistical approach using multivariate pattern analysis (MVPA). Participants performed an articulatory localizer as well as a speech perception task where they passively listened to a set of eight syllables while undergoing fMRI. Both univariate and multivariate analyses failed to find evidence for somatotopic coding in motor or premotor cortex during speech perception. Positive evidence for the null hypothesis was further confirmed by Bayesian analyses. Results consistently show that while the lip and tongue areas of the motor cortex are sensitive to movements of the articulators, they do not appear to preferentially respond to labial and alveolar speech sounds during passive speech perception.
Carroll, Jeff; Zeng, Fan-Gang
2007-01-01
Increasing the number of channels at low frequencies improves discrimination of fundamental frequency (F0) in cochlear implants [Geurts and Wouters 2004]. We conducted three experiments to test whether improved F0 discrimination can be translated into increased speech intelligibility in noise in a cochlear implant simulation. The first experiment measured F0 discrimination and speech intelligibility in quiet as a function of channel density over different frequency regions. The results from this experiment showed a tradeoff in performance between F0 discrimination and speech intelligibility with a limited number of channels. The second experiment tested whether improved F0 discrimination and optimizing this tradeoff could improve speech performance with a competing talker. However, improved F0 discrimination did not improve speech intelligibility in noise. The third experiment identified the critical number of channels needed at low frequencies to improve speech intelligibility in noise. The result showed that, while 16 channels below 500 Hz were needed to observe any improvement in speech intelligibility in noise, even 32 channels did not achieve normal performance. Theoretically, these results suggest that without accurate spectral coding, F0 discrimination and speech perception in noise are two independent processes. Practically, the present results illustrate the need to increase the number of independent channels in cochlear implants. PMID:17604581
Woodruff Carr, Kali; Fitzroy, Ahren B; Tierney, Adam; White-Schwoch, Travis; Kraus, Nina
2017-01-01
Speech communication involves integration and coordination of sensory perception and motor production, requiring precise temporal coupling. Beat synchronization, the coordination of movement with a pacing sound, can be used as an index of this sensorimotor timing. We assessed adolescents' synchronization and capacity to correct asynchronies when given online visual feedback. Variability of synchronization while receiving feedback predicted phonological memory and reading sub-skills, as well as maturation of cortical auditory processing; less variable synchronization during the presence of feedback tracked with maturation of cortical processing of sound onsets and resting gamma activity. We suggest the ability to incorporate feedback during synchronization is an index of intentional, multimodal timing-based integration in the maturing adolescent brain. Precision of temporal coding across modalities is important for speech processing and literacy skills that rely on dynamic interactions with sound. Synchronization employing feedback may prove useful as a remedial strategy for individuals who struggle with timing-based language learning impairments. Copyright © 2016 Elsevier Inc. All rights reserved.
Using the structure of natural scenes and sounds to predict neural response properties in the brain
NASA Astrophysics Data System (ADS)
Deweese, Michael
2014-03-01
The natural scenes and sounds we encounter in the world are highly structured. The fact that animals and humans are so efficient at processing these sensory signals compared with the latest algorithms running on the fastest modern computers suggests that our brains can exploit this structure. We have developed a sparse mathematical representation of speech that minimizes the number of active model neurons needed to represent typical speech sounds. The model learns several well-known acoustic features of speech such as harmonic stacks, formants, onsets and terminations, but we also find more exotic structures in the spectrogram representation of sound such as localized checkerboard patterns and frequency-modulated excitatory subregions flanked by suppressive sidebands. Moreover, several of these novel features resemble neuronal receptive fields reported in the Inferior Colliculus (IC), as well as auditory thalamus (MGBv) and primary auditory cortex (A1), and our model neurons exhibit the same tradeoff in spectrotemporal resolution as has been observed in IC. To our knowledge, this is the first demonstration that receptive fields of neurons in the ascending mammalian auditory pathway beyond the auditory nerve can be predicted based on coding principles and the statistical properties of recorded sounds. We have also developed a biologically-inspired neural network model of primary visual cortex (V1) that can learn a sparse representation of natural scenes using spiking neurons and strictly local plasticity rules. The representation learned by our model is in good agreement with measured receptive fields in V1, demonstrating that sparse sensory coding can be achieved in a realistic biological setting.
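A minimal sketch of the kind of sparse coding described above, using greedy matching pursuit for inference and a gradient step on the reconstruction error for dictionary learning; the patch size, sparsity level, and learning rate are assumptions, and random vectors stand in for spectrogram patches:

```python
import numpy as np

rng = np.random.default_rng(1)
D = rng.normal(size=(64, 32))             # dictionary: 64-dim patches, 32 atoms
D /= np.linalg.norm(D, axis=0)

def matching_pursuit(x, D, k=4):
    """Greedy inference: pick the k atoms that best explain the patch."""
    r, coeffs = x.copy(), np.zeros(D.shape[1])
    for _ in range(k):
        j = np.argmax(np.abs(D.T @ r))    # most correlated atom
        a = D[:, j] @ r
        coeffs[j] += a
        r -= a * D[:, j]                  # remove its contribution
    return coeffs, r

for _ in range(200):                      # toy training loop
    x = rng.normal(size=64)               # stand-in for a spectrogram patch
    coeffs, r = matching_pursuit(x, D)
    D += 0.01 * np.outer(r, coeffs)       # gradient step reducing ||x - Dc||^2
    D /= np.linalg.norm(D, axis=0)        # keep atoms unit-norm
```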
Performance of concatenated Reed-Solomon trellis-coded modulation over Rician fading channels
NASA Technical Reports Server (NTRS)
Moher, Michael L.; Lodge, John H.
1990-01-01
A concatenated coding scheme for providing very reliable data over mobile-satellite channels at power levels similar to those used for vocoded speech is described. The outer code is a shortened Reed-Solomon code which provides error detection as well as error correction capabilities. The inner code is a 1-D 8-state trellis code applied independently to both the inphase and quadrature channels. To achieve the full error correction potential of this inner code, the code symbols are multiplexed with a pilot sequence which is used to provide dynamic channel estimation and coherent detection. The implementation structure of this scheme is discussed and its performance is estimated.
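A minimal sketch of the pilot-aided channel estimation idea described above; the pilot spacing, pilot value, and flat-fading channel model are illustrative assumptions, not the paper's parameters:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 7                                                   # data symbols per pilot (assumed)
data = np.exp(1j * 2 * np.pi * rng.integers(0, 8, 70) / 8)  # toy 8PSK symbols

tx = []
for i, s in enumerate(data):
    if i % N == 0:
        tx.append(1 + 0j)                               # known pilot symbol
    tx.append(s)
tx = np.array(tx)

channel = 0.9 * np.exp(1j * 0.3)                        # toy flat-fading coefficient
noise = 0.05 * (rng.normal(size=tx.size) + 1j * rng.normal(size=tx.size))
rx = channel * tx + noise

pilot_idx = np.arange(0, tx.size, N + 1)
h_hat = rx[pilot_idx].mean()                            # estimate from pilots (pilot = 1)
equalized = rx / h_hat                                  # enables coherent detection
print(abs(h_hat - channel))                             # estimation error on toy data
```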
Bailey, Dallin J; Blomgren, Michael; DeLong, Catharine; Berggren, Kiera; Wambaugh, Julie L
2017-06-22
The purpose of this article is to quantify and describe stuttering-like disfluencies in speakers with acquired apraxia of speech (AOS), utilizing the Lidcombe Behavioural Data Language (LBDL). Additional purposes include measuring test-retest reliability and examining the effect of speech sample type on disfluency rates. Two types of speech samples were elicited from 20 persons with AOS and aphasia: repetition of mono- and multisyllabic words from a protocol for assessing AOS (Duffy, 2013), and connected speech tasks (Nicholas & Brookshire, 1993). Sampling was repeated at 1 and 4 weeks following initial sampling. Stuttering-like disfluencies were coded using the LBDL, which is a taxonomy that focuses on motoric aspects of stuttering. Disfluency rates ranged from 0% to 13.1% for the connected speech task and from 0% to 17% for the word repetition task. There was no significant effect of speech sampling time on disfluency rate in the connected speech task, but there was a significant effect of time for the word repetition task. There was no significant effect of speech sample type. Speakers demonstrated both major types of stuttering-like disfluencies as categorized by the LBDL (fixed postures and repeated movements). Connected speech samples yielded more reliable tallies over repeated measurements. Suggestions are made for modifying the LBDL for use in AOS in order to further add to systematic descriptions of motoric disfluencies in this disorder.
Hengst, Julie A; Frame, Simone R; Neuman-Stritzel, Tiffany; Gannaway, Rachel
2005-02-01
Reported speech, wherein one quotes or paraphrases the speech of another, has been studied extensively as a set of linguistic and discourse practices. Researchers agree that reported speech is pervasive, found across languages, and used in diverse contexts. However, to date, there have been no studies of the use of reported speech among individuals with aphasia. Grounded in an interactional sociolinguistic perspective, the study presented here documents and analyzes the use of reported speech by 7 adults with mild to moderately severe aphasia and their routine communication partners. Each of the 7 pairs was videotaped in 4 everyday activities at home or around the community, yielding over 27 hr of conversational interaction for analysis. A coding scheme was developed that identified 5 types of explicitly marked reported speech: direct, indirect, projected, indexed, and undecided. Analysis of the data documented reported speech as a common discourse practice used successfully by the individuals with aphasia and their communication partners. All participants produced reported speech at least once, and across all observations the target pairs produced 400 reported speech episodes (RSEs), 149 by individuals with aphasia and 251 by their communication partners. For all participants, direct and indirect forms were the most prevalent (70% of RSEs). Situated discourse analysis of specific episodes of reported speech used by 3 of the pairs provides detailed portraits of the diverse interactional, referential, social, and discourse functions of reported speech and explores ways that the pairs used reported speech to successfully frame talk despite their ongoing management of aphasia.
Tona, Risa; Naito, Yasushi; Moroto, Saburo; Yamamoto, Rinko; Fujiwara, Keizo; Yamazaki, Hiroshi; Shinohara, Shogo; Kikuchi, Masahiro
2015-12-01
To investigate the McGurk effect in profoundly deafened Japanese children with cochlear implants (CI) and in normal-hearing children. This was done to identify how children with profound deafness using CI established audiovisual integration during the speech acquisition period. Twenty-four prelingually deafened children with CI and 12 age-matched normal-hearing children participated in this study. Responses to audiovisual stimuli were compared between deafened and normal-hearing controls. Additionally, responses of the children with CI younger than 6 years of age were compared with those of the children with CI at least 6 years of age at the time of the test. Responses to stimuli combining auditory labials and visual non-labials were significantly different between deafened children with CI and normal-hearing controls (p<0.05). Additionally, the McGurk effect tended to be more induced in deafened children older than 6 years of age than in their younger counterparts. The McGurk effect was more significantly induced in prelingually deafened Japanese children with CI than in normal-hearing, age-matched Japanese children. Despite having good speech-perception skills and auditory input through their CI, from early childhood, deafened children may use more visual information in speech perception than normal-hearing children. As children using CI need to communicate based on insufficient speech signals coded by CI, additional activities of higher-order brain function may be necessary to compensate for the incomplete auditory input. This study provided information on the influence of deafness on the development of audiovisual integration related to speech, which could contribute to our further understanding of the strategies used in spoken language communication by prelingually deafened children. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
An Adaptive Approach to a 2.4 kb/s LPC Speech Coding System.
1985-07-01
[OCR-damaged abstract; recoverable fragment:] …laryngeal cancer). Spectral estimation is at the foundation of speech analysis for all these goals, and accurate AR model estimation in noise is… [remainder unrecoverable]
Geo-Coding for the Mapping of Documents and Social Media Messages
2013-08-22
[Search-snippet fragments:] …O.L. (2007). UBC-ALM: Combining KNN with SVD for WSD. Proceedings of the 4th International Workshop on Semantic Evaluations (SemEval-2007), Prague…; …and Yarowsky, D. (1992). One sense per discourse. In Proceedings of the 4th DARPA Speech and Natural Language Workshop, pp. 233-237, 1992…; …Part-of-Speech Tagging for Twitter: Annotation, Features, and Experiments. Proceedings of the Annual Meeting of the Association for Computational…
Structured codebook design in CELP
NASA Technical Reports Server (NTRS)
Leblanc, W. P.; Mahmoud, S. A.
1990-01-01
Code-Excited Linear Prediction (CELP) is a popular analysis-by-synthesis technique for quantizing speech at bit rates from 4 to 6 kbps. Codebook design techniques to date have been largely based on either random (often Gaussian) codebooks, or on known binary or ternary codes which efficiently map the space of (assumed white) excitation codevectors. It has been shown that by introducing symmetries into the codebook, good complexity reduction can be realized with only marginal decrease in performance. Codebook design algorithms are considered for a wide range of structured codebooks.
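For orientation, a minimal sketch of the analysis-by-synthesis codebook search that CELP coders perform; the first-order LPC filter, codebook size, and subframe length are toy assumptions, and real coders add a perceptual weighting filter and an adaptive-codebook contribution:

```python
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(3)
lpc = np.array([1.0, -0.9])                 # toy 1st-order LPC analysis filter A(z)
codebook = rng.normal(size=(128, 40))       # 128 stochastic codevectors, 40 samples
target = lfilter([1.0], lpc, rng.normal(size=40))  # stand-in speech subframe

best_idx, best_err = -1, np.inf
for i, cv in enumerate(codebook):
    synth = lfilter([1.0], lpc, cv)              # synthesize 1/A(z) applied to excitation
    gain = (synth @ target) / (synth @ synth)    # optimal scalar gain for this codevector
    err = np.sum((target - gain * synth) ** 2)   # (unweighted) synthesis error
    if err < best_err:
        best_idx, best_err = i, err
print(best_idx, best_err)                        # index transmitted to the decoder
```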
DEBLICOM: Deaf-Blind Communication & Control Systems: First Quarterly Progress Report.
ERIC Educational Resources Information Center
Kafafian, Haig
Reported on is the first phase of development of DEBLICOM, a code for a two-way communication system for deaf-blind individuals who may be speech-impaired. Brief sections cover the following topics: alternatives to and considerations for the development of cutaneous codes for deaf-blind people; the DEBLICOM system which provides a means of…
Jiang, Chenghui; Whitehill, Tara L
2014-04-01
Speech errors associated with cleft palate are well established for English and several other Indo-European languages. Few articles describing the speech of Putonghua (standard Mandarin Chinese) speakers with cleft palate have been published in English language journals. Although methodological guidelines have been published for the perceptual speech evaluation of individuals with cleft palate, there has been no critical review of methodological issues in studies of Putonghua speakers with cleft palate. A literature search was conducted to identify relevant studies published over the past 30 years in Chinese language journals. Only studies incorporating perceptual analysis of speech were included. Thirty-seven articles which met inclusion criteria were analyzed and coded on a number of methodological variables. Reliability was established by having all variables recoded for all studies. This critical review identified many methodological issues. These design flaws make it difficult to draw reliable conclusions about characteristic speech errors in this group of speakers. Specific recommendations are made to improve the reliability and validity of future studies, as well to facilitate cross-center comparisons.
Phonology and Vocal Behavior in Toddlers with Autism Spectrum Disorders
Schoen, Elizabeth; Paul, Rhea; Chawarska, Katarzyna
2011-01-01
The purpose of this study is to examine the phonological and other vocal productions of children, 18-36 months, with autism spectrum disorder (ASD) and to compare these productions to those of age-matched and language-matched controls. Speech samples were obtained from 30 toddlers with ASD, 11 age-matched toddlers and 23 language-matched toddlers during either parent-child or clinician-child play sessions. Samples were coded for a variety of speech-like and non-speech vocalization productions. Toddlers with ASD produced speech-like vocalizations similar to those of language-matched peers, but produced significantly more atypical non-speech vocalizations when compared to both control groups. Toddlers with ASD show speech-like sound production that is linked to their language level, in a manner similar to that seen in typical development. The main area of difference in vocal development in this population is in the production of atypical vocalizations. Findings suggest that toddlers with autism spectrum disorders might not tune into the language model of their environment. Failure to attend to the ambient language environment negatively impacts the ability to acquire spoken language. PMID:21308998
Gesture and speech during shared book reading with preschoolers with specific language impairment.
Lavelli, Manuela; Barachetti, Chiara; Florit, Elena
2015-11-01
This study examined (a) the relationship between gesture and speech produced by children with specific language impairment (SLI) and typically developing (TD) children, and their mothers, during shared book-reading, and (b) the potential effectiveness of gestures accompanying maternal speech on the conversational responsiveness of children. Fifteen preschoolers with expressive SLI were compared with fifteen age-matched and fifteen language-matched TD children. Child and maternal utterances were coded for modality, gesture type, gesture-speech informational relationship, and communicative function. Relative to TD peers, children with SLI used more bimodal utterances and gestures adding unique information to co-occurring speech. Some differences were mirrored in maternal communication. Sequential analysis revealed that only in the SLI group maternal reading accompanied by gestures was significantly followed by child's initiatives, and when maternal non-informative repairs were accompanied by gestures, they were more likely to elicit adequate answers from children. These findings support the 'gesture advantage' hypothesis in children with SLI, and have implications for educational and clinical practice.
The minor third communicates sadness in speech, mirroring its use in music.
Curtis, Meagan E; Bharucha, Jamshed J
2010-06-01
There is a long history of attempts to explain why music is perceived as expressing emotion. The relationship between pitches serves as an important cue for conveying emotion in music. The musical interval referred to as the minor third is generally thought to convey sadness. We reveal that the minor third also occurs in the pitch contour of speech conveying sadness. Bisyllabic speech samples conveying four emotions were recorded by 9 actresses. Acoustic analyses revealed that the relationship between the 2 salient pitches of the sad speech samples tended to approximate a minor third. Participants rated the speech samples for perceived emotion, and the use of numerous acoustic parameters as cues for emotional identification was modeled using regression analysis. The minor third was the most reliable cue for identifying sadness. Additional participants rated musical intervals for emotion, and their ratings verified the historical association between the musical minor third and sadness. These findings support the theory that human vocal expressions and music share an acoustic code for communicating sadness.
NASA Technical Reports Server (NTRS)
Gray, Robert M.
1989-01-01
During the past ten years Vector Quantization (VQ) has developed from a theoretical possibility promised by Shannon's source coding theorems into a powerful and competitive technique for speech and image coding and compression at medium to low bit rates. In this survey, the basic ideas behind the design of vector quantizers are sketched and some comments made on the state-of-the-art and current research efforts.
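A minimal sketch of the generalized Lloyd (LBG) iteration at the core of vector quantizer design, on toy 2-D data; the codebook size and training set are arbitrary choices for the demonstration:

```python
import numpy as np

rng = np.random.default_rng(4)
training = rng.normal(size=(1000, 2))            # toy 2-D training vectors
codebook = training[rng.choice(1000, 8, replace=False)].copy()

for _ in range(20):
    # Nearest-neighbor partition: assign each vector to its closest codevector.
    d = ((training[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    nearest = d.argmin(axis=1)
    # Centroid update: each codevector becomes the mean of its cell.
    for j in range(len(codebook)):
        cell = training[nearest == j]
        if len(cell):
            codebook[j] = cell.mean(axis=0)
print(codebook)                                   # trained 8-entry codebook
```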
Persistent Use of Mixed Code: An Exploration of Its Functions in Hong Kong Schools
ERIC Educational Resources Information Center
Low, Winnie W. M.; Lu, Dan
2006-01-01
Codemixing of Cantonese Chinese and English is a common speech behaviour used by bilingual people in Hong Kong. Though codemixing is repeatedly criticised as a cause of the decline of students' language competence, there is little hard evidence to indicate its detrimental effects. This study examines the use of mixed code in the context of the…
Speech coding at low to medium bit rates
NASA Astrophysics Data System (ADS)
Leblanc, Wilfred Paul
1992-09-01
Improved search techniques coupled with improved codebook design methodologies are proposed to improve the performance of conventional code-excited linear predictive coders for speech. Improved methods for quantizing the short term filter are developed by employing a tree search algorithm and joint codebook design to multistage vector quantization. Joint codebook design procedures are developed to design locally optimal multistage codebooks. Weighting during centroid computation is introduced to improve the outlier performance of the multistage vector quantizer. Multistage vector quantization is shown to be both robust against input characteristics and in the presence of channel errors. Spectral distortions of about 1 dB are obtained at rates of 22-28 bits/frame. Structured codebook design procedures for excitation in code-excited linear predictive coders are compared to general codebook design procedures. Little is lost using significant structure in the excitation codebooks while greatly reducing the search complexity. Sparse multistage configurations are proposed for reducing computational complexity and memory size. Improved search procedures are applied to code-excited linear prediction which attempt joint optimization of the short term filter, the adaptive codebook, and the excitation. Improvements in signal to noise ratio of 1-2 dB are realized in practice.
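A minimal sketch of multistage VQ encoding with a greedy sequential search (the tree search described above would instead carry the M best partial solutions forward at each stage); codebook sizes and contents are random stand-ins:

```python
import numpy as np

rng = np.random.default_rng(5)
stages = [rng.normal(size=(16, 10)) for _ in range(3)]  # three 16-entry stage codebooks

def msvq_encode(x, stages):
    """Each stage quantizes the residual left by the previous stage."""
    residual, indices = x.copy(), []
    for cb in stages:
        err = ((residual - cb) ** 2).sum(axis=1)  # distortion of every entry
        j = int(err.argmin())                     # greedy: keep only the best
        indices.append(j)
        residual -= cb[j]
    return indices, residual

x = rng.normal(size=10)                           # stand-in spectral (e.g., LSF) vector
idx, res = msvq_encode(x, stages)
print(idx, float((res ** 2).sum()))               # transmitted indices, final distortion
```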
Near-toll quality digital speech transmission in the mobile satellite service
NASA Technical Reports Server (NTRS)
Townes, S. A.; Divsalar, D.
1986-01-01
This paper discusses system considerations for near-toll quality digital speech transmission in a 5 kHz mobile satellite system channel. Tradeoffs are shown for power performance versus delay for a 4800 bps speech compression system in conjunction with a 16 state rate 2/3 trellis coded 8PSK modulation system. The suggested system has an additional 150 ms of delay beyond the propagation delay and requires an E(b)/N(0) of about 7 dB for a Ricean channel assumption with line-of-sight to diffuse component ratio of 10 assuming ideal synchronization. An additional loss of 2 to 3 dB is expected for synchronization in fading environment.
McBee, Morgan P; Laor, Tal; Pryor, Rebecca M; Smith, Rachel; Hardin, Judy; Ulland, Lisa; May, Sally; Zhang, Bin; Towbin, Alexander J
2018-02-01
The purpose of this study was to adapt our radiology reports to provide the documentation required for specific International Classification of Diseases, 10th Revision (ICD-10) diagnosis coding. Baseline data were analyzed to identify the reports with the greatest number of unspecified ICD-10 codes assigned by computer-assisted coding software. A two-part quality improvement initiative was subsequently implemented. The first component involved improving clinical histories by utilizing technologists to obtain information directly from the patients or caregivers, which was then imported into the radiologist's report within the speech recognition software. The second component involved standardization of report terminology and creation of four different structured report templates to determine which yielded the fewest reports with an unspecified ICD-10 code assigned by an automated coding engine. In all, 12,077 reports were included in the baseline analysis. Of these, 5,151 (43%) had an unspecified ICD-10 code. The majority of deficient reports were for radiographs (n = 3,197; 62%). Inadequacies included insufficient clinical history provided and lack of detailed fracture descriptions. Therefore, the focus was standardizing terminology and testing different structured reports for radiographs obtained for fractures. At baseline, 58% of radiography reports contained a complete clinical history with improvement to >95% 8 months later. The total number of reports that contained an unspecified ICD-10 code improved from 43% at baseline to 27% at completion of this study (P < .0001). The number of radiology studies with a specific ICD-10 code can be improved through quality improvement methodology, specifically through the use of technologist-acquired clinical histories and structured reporting. Copyright © 2017 American College of Radiology. Published by Elsevier Inc. All rights reserved.
De Jonge-Hoekstra, Lisette; Van der Steen, Steffie; Van Geert, Paul; Cox, Ralf F A
2016-01-01
As children learn they use their speech to express words and their hands to gesture. This study investigates the interplay between real-time gestures and speech as children construct cognitive understanding during a hands-on science task. 12 children (M = 6, F = 6) from Kindergarten (n = 5) and first grade (n = 7) participated in this study. Each verbal utterance and gesture during the task were coded, on a complexity scale derived from dynamic skill theory. To explore the interplay between speech and gestures, we applied a cross recurrence quantification analysis (CRQA) to the two coupled time series of the skill levels of verbalizations and gestures. The analysis focused on (1) the temporal relation between gestures and speech, (2) the relative strength and direction of the interaction between gestures and speech, (3) the relative strength and direction between gestures and speech for different levels of understanding, and (4) relations between CRQA measures and other child characteristics. The results show that older and younger children differ in the (temporal) asymmetry in the gestures-speech interaction. For younger children, the balance leans more toward gestures leading speech in time, while the balance leans more toward speech leading gestures for older children. Secondly, at the group level, speech attracts gestures in a more dynamically stable fashion than vice versa, and this asymmetry in gestures and speech extends to lower and higher understanding levels. Yet, for older children, the mutual coupling between gestures and speech is more dynamically stable regarding the higher understanding levels. Gestures and speech are more synchronized in time as children are older. A higher score on schools' language tests is related to speech attracting gestures more rigidly and more asymmetry between gestures and speech, only for the less difficult understanding levels. A higher score on math or past science tasks is related to less asymmetry between gestures and speech. The picture that emerges from our analyses suggests that the relation between gestures, speech and cognition is more complex than previously thought. We suggest that temporal differences and asymmetry in influence between gestures and speech arise from simultaneous coordination of synergies.
29 CFR 1401.21 - Information policy.
Code of Federal Regulations, 2011 CFR
2011-07-01
... excluded by subsection 552(b) of title 5, United States Code, matters covered by the Privacy Act, or other... routine public distribution, e.g., pamphlets, speeches, and educational or training materials, will be...
Measuring Speech Comprehensibility in Students with Down Syndrome
Woynaroski, Tiffany; Camarata, Stephen
2016-01-01
Purpose There is an ongoing need to develop assessments of spontaneous speech that focus on whether the child's utterances are comprehensible to listeners. This study sought to identify the attributes of a stable ratings-based measure of speech comprehensibility, which enabled examining the criterion-related validity of an orthography-based measure of the comprehensibility of conversational speech in students with Down syndrome. Method Participants were 10 elementary school students with Down syndrome and 4 unfamiliar adult raters. Averaged across-observer Likert ratings of speech comprehensibility were called a ratings-based measure of speech comprehensibility. The proportion of utterance attempts fully glossed constituted an orthography-based measure of speech comprehensibility. Results Averaging across 4 raters on four 5-min segments produced a reliable (G = .83) ratings-based measure of speech comprehensibility. The ratings-based measure was strongly (r > .80) correlated with the orthography-based measure for both the same and different conversational samples. Conclusion Reliable and valid measures of speech comprehensibility are achievable with the resources available to many researchers and some clinicians. PMID:27299989
Schwartz, Jean-Luc; Savariaux, Christophe
2014-01-01
An increasing number of neuroscience papers capitalize on the assumption published in this journal that visual speech would be typically 150 ms ahead of auditory speech. It happens that the estimation of audiovisual asynchrony in the reference paper is valid only in very specific cases, for isolated consonant-vowel syllables or at the beginning of a speech utterance, in what we call “preparatory gestures”. However, when syllables are chained in sequences, as they are typically in most parts of a natural speech utterance, asynchrony should be defined in a different way. This is what we call “comodulatory gestures” providing auditory and visual events more or less in synchrony. We provide audiovisual data on sequences of plosive-vowel syllables (pa, ta, ka, ba, da, ga, ma, na) showing that audiovisual synchrony is actually rather precise, varying between 20 ms audio lead and 70 ms audio lag. We show how more complex speech material should result in a range typically varying between 40 ms audio lead and 200 ms audio lag, and we discuss how this natural coordination is reflected in the so-called temporal integration window for audiovisual speech perception. Finally we present a toy model of auditory and audiovisual predictive coding, showing that visual lead is actually not necessary for visual prediction. PMID:25079216
Johari, Karim; Behroozmand, Roozbeh
2017-08-01
Skilled movement is mediated by motor commands executed with extremely fine temporal precision. The question of how the brain incorporates temporal information to perform motor actions has remained unanswered. This study investigated the effect of stimulus temporal predictability on response timing of speech and hand movement. Subjects performed a randomized vowel vocalization or button press task in two counterbalanced blocks in response to temporally-predictable and unpredictable visual cues. Results indicated that speech and hand reaction time was decreased for predictable compared with unpredictable stimuli. This finding suggests that a temporal predictive code is established to capture temporal dynamics of sensory cues in order to produce faster movements in responses to predictable stimuli. In addition, results revealed a main effect of modality, indicating faster hand movement compared with speech. We suggest that this effect is accounted for by the inherent complexity of speech production compared with hand movement. Lastly, we found that movement inhibition was faster than initiation for both hand and speech, suggesting that movement initiation requires a longer processing time to coordinate activities across multiple regions in the brain. These findings provide new insights into the mechanisms of temporal information processing during initiation and inhibition of speech and hand movement. Copyright © 2017 Elsevier B.V. All rights reserved.
New Perspectives on Assessing Amplification Effects
Souza, Pamela E.; Tremblay, Kelly L.
2006-01-01
Clinicians have long been aware of the range of performance variability with hearing aids. Despite improvements in technology, there remain many instances of well-selected and appropriately fitted hearing aids whereby the user reports minimal improvement in speech understanding. This review presents a multistage framework for understanding how a hearing aid affects performance. Six stages are considered: (1) acoustic content of the signal, (2) modification of the signal by the hearing aid, (3) interaction between sound at the output of the hearing aid and the listener's ear, (4) integrity of the auditory system, (5) coding of available acoustic cues by the listener's auditory system, and (6) correct identification of the speech sound. Within this framework, this review describes methodology and research on 2 new assessment techniques: acoustic analysis of speech measured at the output of the hearing aid and auditory evoked potentials recorded while the listener wears hearing aids. Acoustic analysis topics include the relationship between conventional probe microphone tests and probe microphone measurements using speech, appropriate procedures for such tests, and assessment of signal-processing effects on speech acoustics and recognition. Auditory evoked potential topics include an overview of physiologic measures of speech processing and the effect of hearing loss and hearing aids on cortical auditory evoked potential measurements in response to speech. Finally, the clinical utility of these procedures is discussed. PMID:16959734
Musicians change their tune: how hearing loss alters the neural code.
Parbery-Clark, Alexandra; Anderson, Samira; Kraus, Nina
2013-08-01
Individuals with sensorineural hearing loss have difficulty understanding speech, especially in background noise. This deficit remains even when audibility is restored through amplification, suggesting that mechanisms beyond a reduction in peripheral sensitivity contribute to the perceptual difficulties associated with hearing loss. Given that normal-hearing musicians have enhanced auditory perceptual skills, including speech-in-noise perception, coupled with heightened subcortical responses to speech, we aimed to determine whether similar advantages could be observed in middle-aged adults with hearing loss. Results indicate that musicians with hearing loss, despite self-perceptions of average performance for understanding speech in noise, have a greater ability to hear in noise relative to nonmusicians. This is accompanied by more robust subcortical encoding of sound (e.g., stimulus-to-response correlations and response consistency) as well as more resilient neural responses to speech in the presence of background noise (e.g., neural timing). Musicians with hearing loss also demonstrate unique neural signatures of spectral encoding relative to nonmusicians: enhanced neural encoding of the speech-sound's fundamental frequency but not of its upper harmonics. This stands in contrast to previous outcomes in normal-hearing musicians, who have enhanced encoding of the harmonics but not the fundamental frequency. Taken together, our data suggest that although hearing loss modifies a musician's spectral encoding of speech, the musician advantage for perceiving speech in noise persists in a hearing-impaired population by adaptively strengthening underlying neural mechanisms for speech-in-noise perception. Copyright © 2013 Elsevier B.V. All rights reserved.
Erfanian Saeedi, Nafise; Blamey, Peter J; Burkitt, Anthony N; Grayden, David B
2016-04-01
Pitch perception is important for understanding speech prosody, music perception, recognizing tones in tonal languages, and perceiving speech in noisy environments. The two principal pitch perception theories consider the place of maximum neural excitation along the auditory nerve and the temporal pattern of the auditory neurons' action potentials (spikes) as pitch cues. This paper describes a biophysical mechanism by which fine-structure temporal information can be extracted from the spikes generated at the auditory periphery. Deriving meaningful pitch-related information from spike times requires neural structures specialized in capturing synchronous or correlated activity from amongst neural events. The emergence of such pitch-processing neural mechanisms is described through a computational model of auditory processing. Simulation results show that a correlation-based, unsupervised, spike-based form of Hebbian learning can explain the development of neural structures required for recognizing the pitch of simple and complex tones, with or without the fundamental frequency. The temporal code is robust to variations in the spectral shape of the signal and thus can explain the phenomenon of pitch constancy.
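A minimal sketch of the correlation-based Hebbian idea (not the paper's model): connections from inputs that spike synchronously with the postsynaptic unit's firing are strengthened; the spike rates, threshold, and normalization rule are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)
T, n_inputs = 2000, 20
spikes = rng.random((T, n_inputs)) < 0.05        # independent Poisson-like inputs
spikes[:, :5] = rng.random((T, 1)) < 0.05        # first 5 inputs fire in synchrony

w = np.full(n_inputs, 0.5)                       # initial synaptic weights
for t in range(T):
    if (spikes[t] @ w) > 1.0:                    # postsynaptic unit fires
        w += 0.01 * spikes[t]                    # Hebbian: strengthen coactive inputs
        w *= 0.5 / w.mean()                      # normalization keeps weights bounded
print(w.round(2))  # weights from the correlated group grow relative to the rest
```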
Neural coding of sound envelope in reverberant environments.
Slama, Michaël C C; Delgutte, Bertrand
2015-03-11
Speech reception depends critically on temporal modulations in the amplitude envelope of the speech signal. Reverberation encountered in everyday environments can substantially attenuate these modulations. To assess the effect of reverberation on the neural coding of amplitude envelope, we recorded from single units in the inferior colliculus (IC) of unanesthetized rabbit using sinusoidally amplitude modulated (AM) broadband noise stimuli presented in simulated anechoic and reverberant environments. Although reverberation degraded both rate and temporal coding of AM in IC neurons, in most neurons, the degradation in temporal coding was smaller than the AM attenuation in the stimulus. This compensation could largely be accounted for by the compressive shape of the modulation input-output function (MIOF), which describes the nonlinear transformation of modulation depth from acoustic stimuli into neural responses. Additionally, in a subset of neurons, the temporal coding of AM was better for reverberant stimuli than for anechoic stimuli having the same modulation depth at the ear. Using hybrid anechoic stimuli that selectively possess certain properties of reverberant sounds, we show that this reverberant advantage is not caused by envelope distortion, static interaural decorrelation, or spectral coloration. Overall, our results suggest that the auditory system may possess dual mechanisms that make the coding of amplitude envelope relatively robust in reverberation: one general mechanism operating for all stimuli with small modulation depths, and another mechanism dependent on very specific properties of reverberant stimuli, possibly the periodic fluctuations in interaural correlation at the modulation frequency. Copyright © 2015 the authors.
Automatic speech recognition (ASR) based approach for speech therapy of aphasic patients: A review
NASA Astrophysics Data System (ADS)
Jamal, Norezmi; Shanta, Shahnoor; Mahmud, Farhanahani; Sha'abani, MNAH
2017-09-01
This paper reviews state-of-the-art automatic speech recognition (ASR) based approaches for speech therapy of aphasic patients. Aphasia is a condition in which the affected person suffers from speech and language disorder resulting from a stroke or brain injury. Since there is a growing body of evidence indicating the possibility of improving the symptoms at an early stage, ASR based solutions are increasingly being researched for speech and language therapy. ASR is a technology that transcribes human speech into text by matching it against the system's library. This is particularly useful in speech rehabilitation therapies as it provides accurate, real-time evaluation of speech input from an individual with a speech disorder. ASR based approaches for speech therapy recognize the speech input from the aphasic patient and provide real-time feedback on their mistakes. However, the accuracy of ASR is dependent on many factors such as phoneme recognition, speech continuity, speaker and environmental differences, as well as our depth of knowledge on human language understanding. Hence, the review examines recent developments in ASR technologies and their performance for individuals with speech and language disorders.
Zhong, Ziwei; Henry, Kenneth S.; Heinz, Michael G.
2014-01-01
People with sensorineural hearing loss often have substantial difficulty understanding speech under challenging listening conditions. Behavioral studies suggest that reduced sensitivity to the temporal structure of sound may be responsible, but underlying neurophysiological pathologies are incompletely understood. Here, we investigate the effects of noise-induced hearing loss on coding of envelope (ENV) structure in the central auditory system of anesthetized chinchillas. ENV coding was evaluated noninvasively using auditory evoked potentials recorded from the scalp surface in response to sinusoidally amplitude modulated tones with carrier frequencies of 1, 2, 4, and 8 kHz and a modulation frequency of 140 Hz. Stimuli were presented in quiet and in three levels of white background noise. The latency of scalp-recorded ENV responses was consistent with generation in the auditory midbrain. Hearing loss amplified neural coding of ENV at carrier frequencies of 2 kHz and above. This result may reflect enhanced ENV coding from the periphery and/or an increase in the gain of central auditory neurons. In contrast to expectations, hearing loss was not associated with a stronger adverse effect of increasing masker intensity on ENV coding. The exaggerated neural representation of ENV information shown here at the level of the auditory midbrain helps to explain previous findings of enhanced sensitivity to amplitude modulation in people with hearing loss under some conditions. Furthermore, amplified ENV coding may potentially contribute to speech perception problems in people with cochlear hearing loss by acting as a distraction from more salient acoustic cues, particularly in fluctuating backgrounds. PMID:24315815
Improved Frame Mode Selection for AMR-WB+ Based on Decision Tree
NASA Astrophysics Data System (ADS)
Kim, Jong Kyu; Kim, Nam Soo
In this letter, we propose a coding mode selection method for the AMR-WB+ audio coder based on a decision tree. In order to reduce computation while maintaining good performance, a decision tree classifier is adopted, with the closed-loop mode selection results as the target classification labels. The size of the decision tree is controlled by pruning, so the proposed method does not increase the memory requirement significantly. Through an evaluation test on a database covering both speech and music materials, the proposed method is found to achieve a much better mode selection accuracy compared with the open-loop mode selection module in the AMR-WB+.
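A minimal sketch of the letter's general recipe, assuming scikit-learn's cost-complexity pruning as the size-control mechanism; the frame features, the two-way ACELP/TCX labels, and the pruning strength are stand-ins, not the letter's actual feature set or parameters:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(6)
# Random stand-ins for per-frame features (e.g., energy, spectral tilt, pitch
# gain) and for the closed-loop mode decisions used as training labels.
frame_features = rng.normal(size=(5000, 12))
closed_loop_mode = rng.integers(0, 2, 5000)       # toy labels: 0 = ACELP, 1 = TCX

tree = DecisionTreeClassifier(ccp_alpha=0.002)    # pruning bounds the tree size
tree.fit(frame_features, closed_loop_mode)
print(tree.tree_.node_count)                      # memory footprint after pruning
print(tree.predict(frame_features[:4]))           # open-loop-style mode decisions
```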
NASA's mobile satellite development program
NASA Technical Reports Server (NTRS)
Rafferty, William; Dessouky, Khaled; Sue, Miles
1988-01-01
A Mobile Satellite System (MSS) will provide data and voice communications over a vast geographical area to a large population of mobile users. A technical overview is given of the extensive research and development performed under NASA's mobile satellite program (MSAT-X) in support of the introduction of a U.S. MSS. The critical technologies necessary to enable such a system are emphasized: vehicle antennas, modulation and coding, speech coders, networking, and propagation characterization. Also proposed are first- and future-generation MSS architectures based upon realized ground segment equipment and advanced space segment studies.
Bendixen, Alexandra; Scharinger, Mathias; Strauß, Antje; Obleser, Jonas
2014-04-01
Speech signals are often compromised by disruptions originating from external (e.g., masking noise) or internal (e.g., inaccurate articulation) sources. Speech comprehension thus entails detecting and replacing missing information based on predictive and restorative neural mechanisms. The present study targets predictive mechanisms by investigating the influence of a speech segment's predictability on early, modality-specific electrophysiological responses to this segment's omission. Predictability was manipulated in simple physical terms in a single-word framework (Experiment 1) or in more complex semantic terms in a sentence framework (Experiment 2). In both experiments, final consonants of the German words Lachs ([laks], salmon) or Latz ([lats], bib) were occasionally omitted, resulting in the syllable La ([la], no semantic meaning), while brain responses were measured with multi-channel electroencephalography (EEG). In both experiments, the occasional presentation of the fragment La elicited a larger omission response when the final speech segment had been predictable. The omission response occurred ∼125-165 msec after the expected onset of the final segment and showed characteristics of the omission mismatch negativity (MMN), with generators in auditory cortical areas. Suggestive of a general auditory predictive mechanism at work, this main observation was robust against varying source of predictive information or attentional allocation, differing between the two experiments. Source localization further suggested the omission response enhancement by predictability to emerge from left superior temporal gyrus and left angular gyrus in both experiments, with additional experiment-specific contributions. These results are consistent with the existence of predictive coding mechanisms in the central auditory system, and suggestive of the general predictive properties of the auditory system to support spoken word recognition. Copyright © 2014 Elsevier Ltd. All rights reserved.
Rith-Najarian, Leslie R.; McLaughlin, Katie A.; Sheridan, Margaret A.; Nock, Matthew K.
2014-01-01
Extensive research among adults supports the biopsychosocial (BPS) model of challenge and threat, which describes relationships among stress appraisals, physiological stress reactivity, and performance; however, no previous studies have examined these relationships in adolescents. Perceptions of stressors as well as physiological reactivity to stress increase during adolescence, highlighting the importance of understanding the relationships among stress appraisals, physiological reactivity, and performance during this developmental period. In this study, 79 adolescent participants reported on stress appraisals before and after a Trier Social Stress Test in which they performed a speech task. Physiological stress reactivity was defined by changes in cardiac output and total peripheral resistance from a baseline rest period to the speech task, and performance on the speech was coded using an objective rating system. We observed in adolescents only two relationships found in past adult research on the BPS model variables: (1) pre-task stress appraisal predicted post-task stress appraisal and (2) performance predicted post-task stress appraisal. Physiological reactivity during the speech was unrelated to pre- and post-task stress appraisals and to performance. We conclude that the lack of association between post-task stress appraisal and physiological stress reactivity suggests that adolescents might have low self-awareness of physiological emotional arousal. Our findings further suggest that adolescent stress appraisals are based largely on their performance during stressful situations. Developmental implications of this potential lack of awareness of one’s physiological and emotional state during adolescence are discussed. PMID:24491123
Low-dimensional recurrent neural network-based Kalman filter for speech enhancement.
Xia, Youshen; Wang, Jun
2015-07-01
This paper proposes a new recurrent neural network-based Kalman filter for speech enhancement, based on a noise-constrained least squares estimate. The parameters of the speech signal, modeled as an autoregressive process, are first estimated by the proposed recurrent neural network, and the speech signal is then recovered by Kalman filtering. The proposed recurrent neural network is globally asymptotically stable at the noise-constrained estimate. Because the noise-constrained estimate is robust against non-Gaussian noise, the proposed recurrent neural network-based speech enhancement algorithm can minimize the estimation error of the Kalman filter parameters in non-Gaussian noise. Furthermore, owing to its low-dimensional model, the proposed neural network-based speech enhancement algorithm is much faster than two existing recurrent neural network-based speech enhancement algorithms. Simulation results show that the proposed algorithm achieves good performance with fast computation and effective noise reduction. Copyright © 2015 Elsevier Ltd. All rights reserved.
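For readers unfamiliar with the state-space formulation, a minimal Kalman filter for an AR-modeled speech signal looks like the sketch below. Here the AR coefficients and noise variances are assumed to be already estimated (in the paper they come from the proposed recurrent network); the companion-matrix setup is a standard textbook construction, not the authors' code.

    # Minimal sketch of AR-model Kalman filtering for speech denoising,
    # assuming AR parameters a1..ap and noise variances are known.
    import numpy as np

    def kalman_denoise(y, a, q, r):
        """y: noisy samples; a: AR coefficients a1..ap; q: process noise var; r: obs noise var."""
        p = len(a)
        F = np.zeros((p, p)); F[0, :] = a; F[1:, :-1] = np.eye(p - 1)   # companion matrix
        H = np.zeros(p); H[0] = 1.0                                     # observe current sample
        Q = np.zeros((p, p)); Q[0, 0] = q
        x, P = np.zeros(p), np.eye(p)
        out = np.empty(len(y))
        for t, yt in enumerate(y):
            x, P = F @ x, F @ P @ F.T + Q        # time update (AR prediction)
            S = H @ P @ H + r                    # innovation variance
            K = P @ H / S                        # Kalman gain
            x = x + K * (yt - H @ x)             # measurement update
            P = P - np.outer(K, H @ P)
            out[t] = x[0]                        # current filtered speech sample
        return out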
Filtering, Coding, and Compression with Malvar Wavelets
1993-12-01
speech coding techniques being investigated by the military (38). Imagery: Space imagery often requires adaptive restoration to deblur out-of-focus...and blurred image, find an estimate of the ideal image using a priori information about the blur, noise, and the ideal image" (12). The research for...recording can be described as the original signal convolved with impulses, which appear as echoes in the seismic event. The term deconvolution indicates
Reading your own lips: common-coding theory and visual speech perception.
Tye-Murray, Nancy; Spehar, Brent P; Myerson, Joel; Hale, Sandra; Sommers, Mitchell S
2013-02-01
Common-coding theory posits that (1) perceiving an action activates the same representations of motor plans that are activated by actually performing that action, and (2) because of individual differences in the ways that actions are performed, observing recordings of one's own previous behavior activates motor plans to an even greater degree than does observing someone else's behavior. We hypothesized that if observing oneself activates motor plans to a greater degree than does observing others, and if these activated plans contribute to perception, then people should be able to lipread silent video clips of their own previous utterances more accurately than they can lipread video clips of other talkers. As predicted, two groups of participants were able to lipread video clips of themselves, recorded more than two weeks earlier, significantly more accurately than video clips of others. These results suggest that visual input activates speech motor activity that links to word representations in the mental lexicon.
Do perceived context pictures automatically activate their phonological code?
Jescheniak, Jörg D; Oppermann, Frank; Hantsch, Ansgar; Wagner, Valentin; Mädebach, Andreas; Schriefers, Herbert
2009-01-01
Morsella and Miozzo (Morsella, E., & Miozzo, M. (2002). Evidence for a cascade model of lexical access in speech production. Journal of Experimental Psychology: Learning, Memory, and Cognition, 28, 555-563) have reported that the to-be-ignored context pictures become phonologically activated when participants name a target picture, and took this finding as support for cascaded models of lexical retrieval in speech production. In a replication and extension of their experiment in German, we failed to obtain priming effects from context pictures phonologically related to a to-be-named target picture. By contrast, corresponding context words (i.e., the names of the respective pictures) and the same context pictures, when used in an identity condition, did reliably facilitate the naming process. This pattern calls into question the generality of the claim advanced by Morsella and Miozzo that perceptual processing of pictures in the context of a naming task automatically leads to the activation of corresponding lexical-phonological codes.
Small intragenic deletion in FOXP2 associated with childhood apraxia of speech and dysarthria.
Turner, Samantha J; Hildebrand, Michael S; Block, Susan; Damiano, John; Fahey, Michael; Reilly, Sheena; Bahlo, Melanie; Scheffer, Ingrid E; Morgan, Angela T
2013-09-01
Relatively little is known about the neurobiological basis of speech disorders although genetic determinants are increasingly recognized. The first gene for primary speech disorder was FOXP2, identified in a large, informative family with verbal and oral dyspraxia. Subsequently, many de novo and familial cases with a severe speech disorder associated with FOXP2 mutations have been reported. These mutations include sequence alterations, translocations, uniparental disomy, and genomic copy number variants. We studied eight probands with speech disorder and their families. Family members were phenotyped using a comprehensive assessment of speech, oral motor function, language, literacy skills, and cognition. Coding regions of FOXP2 were screened to identify novel variants. Segregation of the variant was determined in the probands' families. Variants were identified in two probands. One child with severe motor speech disorder had a small de novo intragenic FOXP2 deletion. His phenotype included features of childhood apraxia of speech and dysarthria, oral motor dyspraxia, receptive and expressive language disorder, and literacy difficulties. The other variant was found in a family in two of three family members with stuttering, and also in the mother with oral motor impairment. This variant was considered a benign polymorphism as it was predicted to be non-pathogenic with in silico tools and found in database controls. This is the first report of a small intragenic deletion of FOXP2 that is likely to be the cause of severe motor speech disorder associated with language and literacy problems. Copyright © 2013 Wiley Periodicals, Inc.
Action planning and predictive coding when speaking
Wang, Jun; Mathalon, Daniel H.; Roach, Brian J.; Reilly, James; Keedy, Sarah; Sweeney, John A.; Ford, Judith M.
2014-01-01
Across the animal kingdom, sensations resulting from an animal's own actions are processed differently from sensations resulting from external sources, with self-generated sensations being suppressed. A forward model has been proposed to explain this process across sensorimotor domains. During vocalization, reduced processing of one's own speech is believed to result from a comparison of speech sounds to corollary discharges of intended speech production generated from efference copies of commands to speak. Until now, anatomical and functional evidence validating this model in humans has been indirect. Using EEG with anatomical MRI to facilitate source localization, we demonstrate that inferior frontal gyrus activity during the 300 ms before speaking was associated with suppressed processing of speech sounds in auditory cortex around 100 ms after speech onset (N1). These findings indicate that an efference copy from speech areas in prefrontal cortex is transmitted to auditory cortex, where it is used to suppress processing of anticipated speech sounds. About 100 ms after N1, a subsequent auditory cortical component (P2) was not suppressed during talking. The combined N1 and P2 effects suggest that although sensory processing is suppressed as reflected in N1, perceptual gaps are filled as reflected in the lack of P2 suppression, explaining the discrepancy between sensory suppression and preserved sensory experiences. These findings, coupled with the coherence between relevant brain regions before and during speech, provide new mechanistic understanding of the complex interactions between action planning and sensory processing that provide for differentiated tagging and monitoring of one's own speech, processes disrupted in neuropsychiatric disorders. PMID:24423729
NASA Astrophysics Data System (ADS)
Kayasith, Prakasith; Theeramunkong, Thanaruk
It is a tedious and subjective task to measure the severity of dysarthria by manually evaluating a speaker's speech using available standard assessment methods based on human perception. This paper presents an automated approach to assessing the speech quality of a dysarthric speaker with cerebral palsy. Considering two complementary factors, speech consistency and speech distinction, a speech quality indicator called the speech clarity index (Ψ) is proposed as a measure of the speaker's ability to produce a consistent speech signal for a given word and distinguishable speech signals for different words. As an application, it can be used to assess speech quality and forecast the speech recognition rate for an individual dysarthric speaker before the exhaustive implementation of an automatic speech recognition system for that speaker. The effectiveness of Ψ as a predictor of speech recognition rate is evaluated by rank-order inconsistency, correlation coefficient, and root-mean-square difference. The evaluations were done by comparing its predicted recognition rates with those predicted by the standard methods, the articulatory and intelligibility tests, based on two recognition systems (HMM and ANN). The results show that Ψ is a promising indicator for predicting the recognition rate of dysarthric speech. All experiments were done on a speech corpus composed of speech data from eight normal speakers and eight dysarthric speakers.
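The paper's exact definition of Ψ is not reproduced in this abstract; the sketch below only illustrates the general idea of contrasting within-word consistency against between-word distinction over per-utterance feature vectors. The normalized ratio used here is an assumption about the index's general form, not the published formula.

    # Illustrative clarity-style index: high when repetitions of a word are
    # consistent and different words are well separated (assumed formulation).
    import numpy as np

    def clarity_index(features):
        """features: dict word -> array (n_tokens, dim) of per-utterance feature vectors."""
        words = list(features)
        # consistency: average spread of each word's tokens around its centroid
        intra = np.mean([np.linalg.norm(f - f.mean(0), axis=1).mean()
                         for f in features.values()])
        # distinction: average distance between word centroids
        centroids = np.stack([features[w].mean(0) for w in words])
        d = np.linalg.norm(centroids[:, None] - centroids[None, :], axis=-1)
        inter = d[np.triu_indices(len(words), k=1)].mean()
        return (inter - intra) / (inter + intra)   # higher: consistent and distinct speech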
Rumbach, Anna F; Rose, Tanya A; Cheah, Mynn
2018-01-29
To explore Australian speech-language pathologists' use of non-speech oral motor exercises, and rationales for using/not using non-speech oral motor exercises in clinical practice. A total of 124 speech-language pathologists practising in Australia, working with paediatric and/or adult clients with speech sound difficulties, completed an online survey. The majority of speech-language pathologists reported that they did not use non-speech oral motor exercises when working with paediatric or adult clients with speech sound difficulties. However, more than half of the speech-language pathologists working with adult clients who have dysarthria reported using non-speech oral motor exercises with this population. The most frequently reported rationale for using non-speech oral motor exercises in speech sound difficulty management was to improve awareness/placement of articulators. The majority of speech-language pathologists agreed there is no clear clinical or research evidence base to support non-speech oral motor exercise use with clients who have speech sound difficulties. This study provides an overview of Australian speech-language pathologists' reported use and perceptions of non-speech oral motor exercises' applicability and efficacy in treating paediatric and adult clients who have speech sound difficulties. The research findings provide speech-language pathologists with insight into how and why non-speech oral motor exercises are currently used, and add to the knowledge base regarding Australian speech-language pathology practice of non-speech oral motor exercises in the treatment of speech sound difficulties. Implications for Rehabilitation Non-speech oral motor exercises refer to oral motor activities which do not involve speech, but involve the manipulation or stimulation of oral structures including the lips, tongue, jaw, and soft palate. Non-speech oral motor exercises are intended to improve the function (e.g., movement, strength) of oral structures. The majority of speech-language pathologists agreed there is no clear clinical or research evidence base to support non-speech oral motor exercise use with clients who have speech sound disorders. Non-speech oral motor exercise use was most frequently reported in the treatment of dysarthria. Non-speech oral motor exercise use when targeting speech sound disorders is not widely endorsed in the literature.
Detection of laryngeal function using speech and electroglottographic data.
Childers, D G; Bae, K S
1992-01-01
The purpose of this research was to develop quantitative measures for the assessment of laryngeal function using speech and electroglottographic (EGG) data. We developed two procedures for the detection of laryngeal pathology: 1) a spectral distortion measure using pitch-synchronous and asynchronous methods with linear predictive coding (LPC) vectors and vector quantization (VQ) and 2) analysis of the EGG signal using time interval and amplitude difference measures. The VQ procedure was conjectured to offer the possibility of circumventing the need to estimate the glottal volume velocity waveform by inverse filtering techniques. The EGG procedure evaluated data that is "nearly" a direct measure of vocal fold vibratory motion and was thus conjectured to offer the potential for an excellent assessment of laryngeal function. A threshold-based procedure gave 75.9% and 69.0% probabilities of pathology detection using procedures 1) and 2), respectively, for 29 patients with pathological voices and 52 normal subjects. The false alarm probability was 9.6% for the normal subjects.
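Procedure 1) can be approximated along the following lines: fit a VQ codebook to LPC vectors pooled from normal voices, then score a test voice by its mean quantization distortion and compare against a threshold. Frame length, LPC order, and codebook size below are illustrative assumptions, not the study's settings.

    # Sketch of a VQ-based spectral distortion screen for laryngeal pathology.
    import numpy as np
    from scipy.cluster.vq import kmeans, vq
    from scipy.linalg import solve_toeplitz

    def lpc(frame, order=12):
        """Yule-Walker LPC coefficients a1..ap from windowed samples."""
        r = np.correlate(frame, frame, "full")[len(frame) - 1:len(frame) + order]
        return solve_toeplitz(r[:order], r[1:order + 1])

    def frames(x, size=400, hop=200):
        return [x[i:i + size] * np.hanning(size) for i in range(0, len(x) - size, hop)]

    def train_codebook(normal_signals, k=64):
        vecs = np.array([lpc(f) for x in normal_signals for f in frames(x)])
        codebook, _ = kmeans(vecs, k)            # VQ codebook of normal LPC vectors
        return codebook

    def distortion_score(signal, codebook):
        vecs = np.array([lpc(f) for f in frames(signal)])
        _, dists = vq(vecs, codebook)            # distance to nearest codeword
        return dists.mean()                      # compare against a tuned threshold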
[Non-speech oral motor treatment efficacy for children with developmental speech sound disorders].
Ygual-Fernandez, A; Cervera-Merida, J F
2016-01-01
In the treatment of speech disorders by means of speech therapy two antagonistic methodological approaches are applied: non-verbal ones, based on oral motor exercises (OME), and verbal ones, which are based on speech processing tasks with syllables, phonemes and words. In Spain, OME programmes are called 'programas de praxias', and are widely used and valued by speech therapists. To review the studies conducted on the effectiveness of OME-based treatments applied to children with speech disorders and the theoretical arguments that could justify, or not, their usefulness. Over the last few decades evidence has been gathered about the lack of efficacy of this approach to treat developmental speech disorders and pronunciation problems in populations without any neurological alteration of motor functioning. The American Speech-Language-Hearing Association has advised against its use taking into account the principles of evidence-based practice. The knowledge gathered to date on motor control shows that the pattern of mobility and its corresponding organisation in the brain are different in speech and other non-verbal functions linked to nutrition and breathing. Neither the studies on their effectiveness nor the arguments based on motor control studies recommend the use of OME-based programmes for the treatment of pronunciation problems in children with developmental language disorders.
Predicting couple therapy outcomes based on speech acoustic features
Nasir, Md; Baucom, Brian Robert; Narayanan, Shrikanth
2017-01-01
Automated assessment and prediction of marital outcome in couples therapy is a challenging task but promises to be a potentially useful tool for clinical psychologists. Computational approaches for inferring therapy outcomes using observable behavioral information obtained from conversations between spouses offer objective means for understanding relationship dynamics. In this work, we explore whether the acoustics of the spoken interactions of clinically distressed spouses provide information towards assessment of therapy outcomes. The therapy outcome prediction task in this work includes detecting whether there was a relationship improvement or not (posed as a binary classification) as well as discerning varying levels of improvement or decline in the relationship status (posed as a multiclass recognition task). We use each interlocutor's acoustic speech signal characteristics such as vocal intonation and intensity, both independently and in relation to one another, as cues for predicting the therapy outcome. We also compare prediction performance with that obtained via standardized behavioral codes characterizing the relationship dynamics, provided by human experts, as features for automated classification. Our experiments, using data from a longitudinal clinical study of couples in distressed relations, showed that predictions of relationship outcomes obtained directly from vocal acoustics are comparable or superior to those obtained using human-rated behavioral codes as prediction features. In addition, combining direct signal-derived features with manually coded behavioral features improved the prediction performance in most cases, indicating the complementarity of relevant information captured by humans and machine algorithms. Additionally, considering the vocal properties of the interlocutors in relation to one another, rather than in isolation, proved important for improving the automatic prediction. This finding supports the notion that behavioral outcome, like many other behavioral aspects, is closely related to the dynamics and mutual influence of the interlocutors during their interaction and their resulting behavioral patterns. PMID:28934302
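A minimal sketch of this kind of pipeline, assuming per-spouse pitch and intensity tracks are already extracted, computes session-level statistics plus relational (cross-speaker) features and feeds them to an off-the-shelf classifier. The feature set is an assumption loosely following the intonation/intensity cues described, not the study's actual implementation.

    # Sketch: binary therapy-outcome prediction from per-spouse vocal statistics.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    def session_features(f0_w, f0_h, rms_w, rms_h):
        """f0_*: per-frame pitch tracks; rms_*: per-frame intensity, one spouse each."""
        stats = lambda x: [np.mean(x), np.std(x), np.ptp(x)]
        base = stats(f0_w) + stats(f0_h) + stats(rms_w) + stats(rms_h)
        # relational cues: one speaker's vocal behavior relative to the other's
        relational = [np.mean(f0_w) / np.mean(f0_h), np.std(rms_w) - np.std(rms_h)]
        return base + relational

    rng = np.random.default_rng(1)                   # synthetic stand-in data
    X = np.array([session_features(*rng.normal(1, 0.1, (4, 500))) for _ in range(100)])
    y = rng.integers(0, 2, 100)                      # 1 = relationship improved
    print(cross_val_score(RandomForestClassifier(random_state=0), X, y).mean())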
Zirn, Stefan; Arndt, Susan; Aschendorff, Antje; Laszig, Roland; Wesarg, Thomas
2016-09-22
The ability to detect a target signal masked by noise is improved in normal-hearing listeners when interaural phase differences (IPDs) between the ear signals exist either in the masker or in the signal. To improve binaural hearing in bilaterally implanted cochlear implant (BiCI) users, a coding strategy providing the best possible access to IPD is highly desirable. In this study, we compared two coding strategies in BiCI users provided with CI systems from MED-EL (Innsbruck, Austria). The CI systems were bilaterally programmed either with the fine structure processing strategy FS4 or with the constant rate strategy high definition continuous interleaved sampling (HDCIS). Familiarization periods between 6 and 12 weeks were considered. The effect of IPD was measured in two types of experiments: (a) IPD detection thresholds with tonal signals addressing mainly one apical interaural electrode pair and (b) with speech in noise in terms of binaural speech intelligibility level differences (BILD) addressing multiple electrodes bilaterally. The results in (a) showed improved IPD detection thresholds with FS4 compared with HDCIS in four out of the seven BiCI users. In contrast, 12 BiCI users in (b) showed similar BILD with FS4 (0.6 ± 1.9 dB) and HDCIS (0.5 ± 2.0 dB). However, no correlation between results in (a) and (b) both obtained with FS4 was found. In conclusion, the degree of IPD sensitivity determined on an apical interaural electrode pair was not an indicator for BILD based on bilateral multielectrode stimulation. © The Author(s) 2016.
DOE Office of Scientific and Technical Information (OSTI.GOV)
King, S
The highlights of the many public programs are described and summaries of plenary session speeches are included. Names, addresses, and solar interest codes of conference registrants are included. Eleven technical papers or summaries are included. A separate citation was prepared for each one. (MHR)
Development of a coding form for approach control/pilot voice communications.
DOT National Transportation Integrated Search
1995-05-01
The Aviation Topics Speech Acts Taxonomy (ATSAT) is a tool for categorizing pilot/controller communications according to their purpose and for classifying communication errors. Air traffic controller communications that deviate from FAA Air Traffic C...
Mapping the Speech Code: Cortical Responses Linking the Perception and Production of Vowels
Schuerman, William L.; Meyer, Antje S.; McQueen, James M.
2017-01-01
The acoustic realization of speech is constrained by the physical mechanisms by which it is produced. Yet for speech perception, the degree to which listeners utilize experience derived from speech production has long been debated. In the present study, we examined how sensorimotor adaptation during production may affect perception, and how this relationship may be reflected in early vs. late electrophysiological responses. Participants first performed a baseline speech production task, followed by a vowel categorization task during which EEG responses were recorded. In a subsequent speech production task, half the participants received shifted auditory feedback, leading most to alter their articulations. This was followed by a second, post-training vowel categorization task. We compared changes in vowel production to both behavioral and electrophysiological changes in vowel perception. No differences in phonetic categorization were observed between groups receiving altered or unaltered feedback. However, exploratory analyses revealed correlations between vocal motor behavior and phonetic categorization. EEG analyses revealed correlations between vocal motor behavior and cortical responses in both early and late time windows. These results suggest that participants' recent production behavior influenced subsequent vowel perception. We suggest that the change in perception can be best characterized as a mapping of acoustics onto articulation. PMID:28439232
Generating and Describing Affective Eye Behaviors
NASA Astrophysics Data System (ADS)
Mao, Xia; Li, Zheng
The manner of a person's eye movement conveys much nonverbal information and emotional intent beyond speech. This paper describes work on expressing emotion through eye behaviors in virtual agents, based on parameters selected from the AU-coded facial expression database and real-time eye movement data (pupil size, blink rate, and saccades). A rule-based approach that generates primary (joyful, sad, angry, afraid, disgusted, and surprised) and intermediate emotions (emotions that can be represented as mixtures of two primary emotions) using MPEG-4 FAPs (facial animation parameters) is introduced. Meanwhile, based on our research, a scripting tool named EEMML (Emotional Eye Movement Markup Language) that enables authors to describe and generate the emotional eye movements of virtual agents is proposed.
Müller, Joachim
2005-01-01
Over the past two decades, the fascinating possibilities of cochlear implants for congenitally deaf or deafened children and adults have developed tremendously and created a rapidly developing interdisciplinary research field. The main advancements of cochlear implantation in the past decade are marked by significant improvement of hearing and speech understanding in CI users. These improvements are attributed to the enhancement of speech coding strategies. The implantation of more (and increasingly younger) children as well as the restoration of binaural hearing abilities with cochlear implants reflect the high standards reached by this development. Despite this progress, modern cochlear implants do not yet enable normal speech understanding, not even for the best patients. In particular, speech understanding in noise remains problematic [1]. Until the mid-1990s, research concentrated on unilateral implantation. Remarkable and effective improvements have been made with bilateral implantation since 1996. Nowadays an increasing number of patients enjoy these benefits. PMID:22073052
Lee, J D; Caven, B; Haake, S; Brown, T L
2001-01-01
As computer applications for cars emerge, a speech-based interface offers an appealing alternative to the visually demanding direct manipulation interface. However, speech-based systems may pose cognitive demands that could undermine driving safety. This study used a car-following task to evaluate how a speech-based e-mail system affects drivers' response to the periodic braking of a lead vehicle. The study included 24 drivers between the ages of 18 and 24 years. A baseline condition with no e-mail system was compared with a simple and a complex e-mail system in both simple and complex driving environments. The results show a 30% (310 ms) increase in reaction time when the speech-based system is used. Subjective workload ratings and probe questions also indicate that speech-based interaction introduces a significant cognitive load, which was highest for the complex e-mail system. These data show that a speech-based interface is not a panacea that eliminates the potential distraction of in-vehicle computers. Actual or potential applications of this research include design of in-vehicle information systems and evaluation of their contributions to driver distraction.
Fine-coarse semantic processing in schizophrenia: a reversed pattern of hemispheric dominance.
Zeev-Wolf, Maor; Goldstein, Abraham; Levkovitz, Yechiel; Faust, Miriam
2014-04-01
Left lateralization for language processing is a feature of neurotypical brains. In individuals with schizophrenia, lack of left lateralization is associated with the language impairments manifested in this population. Beeman's fine-coarse semantic coding model asserts left hemisphere specialization in fine (i.e., conventionalized) semantic coding and right hemisphere specialization in coarse (i.e., non-conventionalized) semantic coding. Applying this model to schizophrenia would suggest that language impairments in this population are a result of greater reliance on coarse semantic coding. We investigated this hypothesis and examined whether a reversed pattern of hemispheric involvement in fine-coarse semantic coding along the time course of activation could be detected in individuals with schizophrenia. Seventeen individuals with schizophrenia and 30 neurotypical participants were presented with two-word expressions of four types: literal, conventional metaphoric, unrelated (exemplars of fine semantic coding) and novel metaphoric (an exemplar of coarse semantic coding). Expressions were separated by either a short (250 ms) or long (750 ms) delay. Findings indicate that whereas during novel metaphor processing, controls displayed a left hemisphere advantage at the 250 ms delay and a right hemisphere advantage at 750 ms, individuals with schizophrenia displayed the opposite. For conventional metaphoric and unrelated expressions, controls showed a left hemisphere advantage across times, while individuals with schizophrenia showed a right hemisphere advantage. Furthermore, whereas individuals with schizophrenia were less accurate than controls at judging literal, conventional metaphoric and unrelated expressions, they were more accurate when judging novel metaphors. Results suggest that individuals with schizophrenia display a reversed pattern of lateralization for semantic coding which causes them to rely more heavily on coarse semantic coding. Thus, for individuals with schizophrenia, speech situations are always non-conventional, compelling them to constantly seek meanings and biasing them toward novel or atypical speech acts. This, in turn, may disadvantage them in conventionalized communication and result in language impairment. Copyright © 2014 Elsevier Ltd. All rights reserved.
Evidence-Based Systematic Review: Effects of Nonspeech Oral Motor Exercises on Speech
ERIC Educational Resources Information Center
McCauley, Rebecca J.; Strand, Edythe; Lof, Gregory L.; Schooling, Tracy; Frymark, Tobi
2009-01-01
Purpose: The purpose of this systematic review was to examine the current evidence for the use of oral motor exercises (OMEs) on speech (i.e., speech physiology, speech production, and functional speech outcomes) as a means of supporting further research and clinicians' use of evidence-based practice. Method: The peer-reviewed literature from 1960…
A Networking of Community-Based Speech Therapy: Borabue District, Maha Sarakham.
Pumnum, Tawitree; Kum-ud, Weawta; Prathanee, Benjamas
2015-08-01
Most children with cleft lip and palate have articulation problems because of compensatory articulation disorders arising from velopharyngeal insufficiency. Theoretically, children should receive speech therapy from a speech and language pathologist (SLP) for 1-2 sessions per week. In developing countries, particularly Thailand, most of them cannot reach standard speech services because of the limited availability of speech services and SLPs. Networking of a Community-Based Speech Model might be an appropriate way to solve this problem. To study the effectiveness of a networking of the Khon Kaen University (KKU) Community-Based Speech Model, Non Thong Tambon Health Promotion Hospital, Borabue, Maha Sarakham, in decreasing the number of articulation errors for children with cleft lip and palate (CLP). Six children with CLP who lived in Borabue and the surrounding district, Maha Sarakham, and had medical records in Srinagarind Hospital were included. They were assessed for pre- and post-treatment articulation errors and provided with speech therapy by an SLP via in-service teaching of speech assistants (SA). Then, the children with CLP received speech correction (SC) by the SA based on assignment, and caregivers practiced a home program for a year. The networking of Non Thong Tambon Health Promotion Hospital, Borabue, Maha Sarakham significantly reduced the number of post-treatment articulation errors for 3 children with CLP. Factors affecting the treatment results of the other children were as follows: delayed speech and language development, hypernasality, and the consistency of SC at the local hospital and at home. A networking of the KKU Community-Based Speech Model with Non Thong Tambon Health Promotion Hospital, Borabue, Maha Sarakham was a good way to enhance speech therapy in Thailand and other developing countries where speech services are limited or professionals are lacking.
Proceedings of the Mobile Satellite Conference
NASA Technical Reports Server (NTRS)
Rafferty, William
1988-01-01
A satellite-based mobile communications system provides voice and data communications to mobile users over a vast geographic area. The technical and service characteristics of mobile satellite systems (MSSs) are presented and form an in-depth view of the current MSS status at the system and subsystem levels. Major emphasis is placed on developments, current and future, in the following critical MSS technology areas: vehicle antennas, networking, modulation and coding, speech compression, channel characterization, space segment technology and MSS experiments. Also, the mobile satellite communications needs of government agencies are addressed, as is the MSS potential to fulfill them.
From Phonemes to Articulatory Codes: An fMRI Study of the Role of Broca's Area in Speech Production
de Zwart, Jacco A.; Jansma, J. Martijn; Pickering, Martin J.; Bednar, James A.; Horwitz, Barry
2009-01-01
We used event-related functional magnetic resonance imaging to investigate the neuroanatomical substrates of phonetic encoding and the generation of articulatory codes from phonological representations. Our focus was on the role of the left inferior frontal gyrus (LIFG) and in particular whether the LIFG plays a role in sublexical phonological processing such as syllabification or whether it is directly involved in phonetic encoding and the generation of articulatory codes. To answer this question, we contrasted the brain activation patterns elicited by pseudowords with high- or low-sublexical frequency components, which we expected would reveal areas related to the generation of articulatory codes but not areas related to phonological encoding. We found significant activation of a premotor network consisting of the dorsal precentral gyrus, the inferior frontal gyrus bilaterally, and the supplementary motor area for low- versus high-sublexical frequency pseudowords. Based on our hypothesis, we concluded that these areas and in particular the LIFG are involved in phonetic and not phonological encoding. We further discuss our findings with respect to the mechanisms of phonetic encoding and provide evidence in support of a functional segregation of the posterior part of Broca's area, the pars opercularis. PMID:19181696
Functional Characterization of the Human Speech Articulation Network.
Basilakos, Alexandra; Smith, Kimberly G; Fillmore, Paul; Fridriksson, Julius; Fedorenko, Evelina
2018-05-01
A number of brain regions have been implicated in articulation, but their precise computations remain debated. Using functional magnetic resonance imaging, we examine the degree of functional specificity of articulation-responsive brain regions to constrain hypotheses about their contributions to speech production. We find that articulation-responsive regions (1) are sensitive to articulatory complexity, but (2) are largely nonoverlapping with nearby domain-general regions that support diverse goal-directed behaviors. Furthermore, premotor articulation regions show selectivity for speech production over some related tasks (respiration control), but not others (nonspeech oral-motor [NSO] movements). This overlap between speech and nonspeech movements concords with electrocorticographic evidence that these regions encode articulators and their states, and with patient evidence whereby articulatory deficits are often accompanied by oral-motor deficits. In contrast, the superior temporal regions show strong selectivity for articulation relative to nonspeech movements, suggesting that these regions play a specific role in speech planning/production. Finally, articulation-responsive portions of posterior inferior frontal gyrus show some selectivity for articulation, in line with the hypothesis that this region prepares an articulatory code that is passed to the premotor cortex. Taken together, these results inform the architecture of the human articulation system.
Yorkston, Kathryn; Baylor, Carolyn; Britton, Deanna
2017-06-22
In this project, we explore the experiences of people who report speech changes associated with Parkinson's disease as they describe taking part in everyday communication situations and report impressions related to speech treatment. Twenty-four community-dwelling adults with Parkinson's disease took part in face-to-face, semistructured interviews. Qualitative research methods were used to code and develop themes related to the interviews. Two major themes emerged. The first, called "speaking," included several subthemes: thinking about speaking, weighing value versus effort, feelings associated with speaking, the environmental context of speaking, and the impact of Parkinson's disease on speaking. The second theme involved "treatment experiences" and included subthemes: choosing not to have treatment, the clinician, drills and exercise, and suggestions for change. From the perspective of participants with Parkinson's disease, speaking is an activity requiring both physical and cognitive effort that takes place in a social context. Although many report positive experiences with speech treatment, some reported dissatisfaction with speech drills and exercises and a lack of focus on the social aspects of communication. Suggestions for improvement include increased focus on the cognitive demands of speaking and on the psychosocial aspects of communication.
ERIC Educational Resources Information Center
Oliveira, Carla; Lousada, Marisa; Jesus, Luis M. T.
2015-01-01
Children with speech sound disorders (SSD) represent a large number of speech and language therapists' caseloads. The intervention with children who have SSD can involve different therapy approaches, and these may be articulatory or phonologically based. Some international studies reveal a widespread application of articulatory based approaches in…
NASA Astrophysics Data System (ADS)
Jelinek, H. J.
1986-01-01
This is the Final Report of Electronic Design Associates on its Phase I SBIR project. The purpose of this project is to develop a method for correcting helium speech, as experienced in diver-surface communication. The goal of the Phase I study was to design, prototype, and evaluate a real-time helium speech corrector system based upon digital signal processing techniques. The general approach was to develop hardware (an IBM PC board) to digitize helium speech and software (a LAMBDA computer-based simulation) to translate the speech. As planned in the study proposal, this initial prototype may now be used to assess the expected performance of a self-contained real-time system which uses an identical algorithm. The Final Report details the work carried out to produce the prototype system. The four major project tasks were: (1) a signal processing scheme for converting helium speech to normal-sounding speech was devised; (2) the signal processing scheme was simulated on a general-purpose (LAMBDA) computer, actual helium speech was supplied to the simulation, and the converted speech was generated; (3) an IBM PC-based 14-bit data input/output board was designed and built; and (4) a bibliography of references on speech processing was generated.
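One classical signal processing route to helium speech correction, consistent with (though not necessarily identical to) the scheme described, is to compress the short-time spectral envelope along the frequency axis to undo the formant upshift caused by the helium atmosphere. The warp factor and STFT settings below are illustrative assumptions.

    # Sketch: frequency-axis warping of the short-time spectrum to lower formants.
    import numpy as np
    from scipy.signal import stft, istft

    def correct_helium(x, fs, warp=0.6, nperseg=512):
        f, t, Z = stft(x, fs, nperseg=nperseg)
        mag, phase = np.abs(Z), np.angle(Z)
        src = np.arange(len(f)) / warp            # output bin k reads input at k/warp
        warped = np.array([np.interp(src, np.arange(len(f)), m, right=0.0)
                           for m in mag.T]).T     # pull high-frequency energy downward
        _, y = istft(warped * np.exp(1j * phase), fs, nperseg=nperseg)
        return y

A real-time version would perform the same per-frame warping in fixed-point hardware; a full corrector would also need pitch handling, which this sketch ignores.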
A weighted reliability measure for phonetic transcription.
Oller, D Kimbrough; Ramsdell, Heather L
2006-12-01
The purpose of the present work is to describe and illustrate the utility of a new tool for assessment of transcription agreement. Traditional measures have not characterized overall transcription agreement with sufficient resolution, specifically because they have often treated all phonetic differences between segments in transcriptions as equivalent, thus constituting an unweighted approach to agreement assessment. The measure the authors have developed calculates a weighted transcription agreement value based on principles derived from widely accepted tenets of phonological theory. To investigate the utility of the new measure, 8 coders transcribed samples of speech and infant vocalizations. Comparing the transcriptions through a computer-based implementation of the new weighted and the traditional unweighted measures, they investigated the scaling properties of both. The results illustrate better scaling with the weighted measure, in particular because the weighted measure is not subject to the floor effects that occur with the traditional measure when applied to samples that are difficult to transcribe. Furthermore, the new weighted measure shows orderly relations in degree of agreement across coded samples of early canonical-stage babbling, early meaningful speech in English, and 3 adult languages. The authors conclude that the weighted measure may provide improved foundations for research on phonetic transcription and for monitoring of transcription reliability.
Gauvin, Hanna S; De Baene, Wouter; Brass, Marcel; Hartsuiker, Robert J
2016-02-01
To minimize the number of errors in speech, and thereby facilitate communication, speech is monitored before articulation. It is, however, unclear at which level during speech production monitoring takes place, and what mechanisms are used to detect and correct errors. The present study investigated whether internal verbal monitoring takes place through the speech perception system, as proposed by perception-based theories of speech monitoring, or whether mechanisms independent of perception are applied, as proposed by production-based theories of speech monitoring. With the use of fMRI during a tongue twister task we observed that error detection in internal speech during noise-masked overt speech production and error detection in speech perception both recruit the same neural network, which includes pre-supplementary motor area (pre-SMA), dorsal anterior cingulate cortex (dACC), anterior insula (AI), and inferior frontal gyrus (IFG). Although production and perception recruit similar areas, as proposed by perception-based accounts, we did not find activation in superior temporal areas (which are typically associated with speech perception) during internal speech monitoring in speech production as hypothesized by these accounts. On the contrary, results are highly compatible with a domain general approach to speech monitoring, by which internal speech monitoring takes place through detection of conflict between response options, which is subsequently resolved by a domain general executive center (e.g., the ACC). Copyright © 2015 Elsevier Inc. All rights reserved.
Lopez-Poveda, Enrique A; Eustaquio-Martín, Almudena; Stohl, Joshua S; Wolford, Robert D; Schatzer, Reinhold; Gorospe, José M; Ruiz, Santiago Santa Cruz; Benito, Fernando; Wilson, Blake S
2017-05-01
We have recently proposed a binaural cochlear implant (CI) sound processing strategy inspired by the contralateral medial olivocochlear reflex (the MOC strategy) and shown that it improves intelligibility in steady-state noise (Lopez-Poveda et al., 2016, Ear Hear 37:e138-e148). The aim here was to evaluate possible speech-reception benefits of the MOC strategy for speech maskers, a more natural type of interferer. Speech reception thresholds (SRTs) were measured in six bilateral and two single-sided deaf CI users with the MOC strategy and with a standard (STD) strategy. SRTs were measured in unilateral and bilateral listening conditions, and for target and masker stimuli located at azimuthal angles of (0°, 0°), (-15°, +15°), and (-90°, +90°). Mean SRTs were 2-5 dB better with the MOC than with the STD strategy for spatially separated target and masker sources. For bilateral CI users, the MOC strategy (1) facilitated the intelligibility of speech in competition with spatially separated speech maskers in both unilateral and bilateral listening conditions; and (2) led to an overall improvement in spatial release from masking in the two listening conditions. Insofar as speech is a more natural type of interferer than steady-state noise, the present results suggest that the MOC strategy holds potential for promising outcomes for CI users. Copyright © 2017. Published by Elsevier B.V.
Therapist and Client Interactions in Motivational Interviewing for Social Anxiety Disorder.
Romano, Mia; Arambasic, Jelena; Peters, Lorna
2017-07-01
The aim of the present study is to assess the bidirectional associations between therapist and client speech during a treatment based on motivational interviewing (MI) for social anxiety disorder. Participants were 85 adults diagnosed with social anxiety who received MI prior to entering cognitive behavioral therapy. MI sessions were sequentially coded using the Motivational Interviewing Skill Code 2.5. Therapist MI-consistent behaviors, including open questions as well as positive and negative reflections, were more likely to be followed by client change exploration (change talk and counter-change talk). Therapist MI-inconsistent behaviors were more likely to precede client neutral language. Client language was also found to influence therapist likelihood of responding in an MI-consistent manner. The findings support the first step of the MI causal model in the context of social anxiety and direct future research into the effect of therapist and client behaviors on MI treatment outcome. © 2016 Wiley Periodicals, Inc.
Peaches for Lunch: Creating and Using Visual Variables.
Cartwright, Elizabeth; Clegg, Adam LaVar
2017-01-01
In this article, I describe the process of systematically including nonverbal data in medical anthropology research. I demonstrate the process of visualizing and coding videotaped moments of life and show how we can analyze what is being done along with what is being said. I ground my discussion in toddler language socialization and then expand my observations to the realm of language pathologies. Aphasia from strokes, speech difficulties in neurologically based illnesses like Lou Gehrig's disease, and the variety of communication challenges that face those on the autism spectrum can all be studied in interesting ways by including precise descriptions of nonverbal actions. I discuss the process of recording and coding the data with the software Observer XT 11.5 by Noldus. This method of collecting and analyzing video data can be used for many anthropological questions, in addition to those concerned with communication.
Lai, Ying-Hui; Chen, Fei; Wang, Syu-Siang; Lu, Xugang; Tsao, Yu; Lee, Chin-Hui
2017-07-01
In a cochlear implant (CI) speech processor, noise reduction (NR) is a critical component for enabling CI users to attain improved speech perception under noisy conditions. Identifying an effective NR approach has long been a key topic in CI research. Recently, a deep denoising autoencoder (DDAE) based NR approach was proposed and shown to be effective in restoring clean speech from noisy observations. It was also shown that DDAE could provide better performance than several existing NR methods in standardized objective evaluations. Following this success with normal speech, this paper further investigated the performance of DDAE-based NR to improve the intelligibility of envelope-based vocoded speech, which simulates speech signal processing in existing CI devices. We compared the performance of speech intelligibility between DDAE-based NR and conventional single-microphone NR approaches using the noise vocoder simulation. The results of both objective evaluations and listening test showed that, under the conditions of nonstationary noise distortion, DDAE-based NR yielded higher intelligibility scores than conventional NR approaches. This study confirmed that DDAE-based NR could potentially be integrated into a CI processor to provide more benefits to CI users under noisy conditions.
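A minimal DDAE of the kind described, mapping noisy log-magnitude spectra to clean targets, can be sketched as follows; the layer sizes, 257-bin spectra, and toy training loop are illustrative assumptions rather than the paper's configuration.

    # Minimal denoising-autoencoder sketch for spectral noise reduction.
    import torch
    import torch.nn as nn

    class DDAE(nn.Module):
        def __init__(self, n_bins=257, hidden=512):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(n_bins, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, n_bins),           # estimate of the clean spectrum
            )
        def forward(self, x):
            return self.net(x)

    model, loss_fn = DDAE(), nn.MSELoss()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    noisy = torch.randn(32, 257)                     # stand-in noisy log-spectra
    clean = torch.randn(32, 257)                     # paired clean targets
    for _ in range(10):                              # toy training loop
        opt.zero_grad()
        loss = loss_fn(model(noisy), clean)
        loss.backward()
        opt.step()

In a CI processing chain, the denoised spectra would then drive the envelope extraction stage of the vocoder, which is what the paper's intelligibility simulations evaluate.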
ERIC Educational Resources Information Center
Adank, Patti
2012-01-01
The role of speech production mechanisms in difficult speech comprehension is the subject of on-going debate in speech science. Two Activation Likelihood Estimation (ALE) analyses were conducted on neuroimaging studies investigating difficult speech comprehension or speech production. Meta-analysis 1 included 10 studies contrasting comprehension…
Nawaz, Tabassam; Mehmood, Zahid; Rashid, Muhammad; Habib, Hafiz Adnan
2018-01-01
Recent research on speech segregation and music fingerprinting has led to improvements in speech segregation and music identification algorithms. Speech and music segregation generally involves the identification of music followed by speech segregation. However, music segregation becomes a challenging task in the presence of noise. This paper proposes a novel method of speech segregation for unlabelled stationary noisy audio signals using the deep belief network (DBN) model. The proposed method successfully segregates a music signal from noisy audio streams. A recurrent neural network (RNN)-based hidden layer segregation model is applied to remove stationary noise. Dictionary-based Fisher algorithms are employed for speech classification. The proposed method is tested on three datasets (TIMIT, MIR-1K, and MusicBrainz), and the results indicate the robustness of the proposed method for speech segregation. The qualitative and quantitative analyses carried out on the three datasets demonstrate the efficiency of the proposed method compared to state-of-the-art speech segregation and classification-based methods. PMID:29558485
Non-US data compression and coding research. FASAC Technical Assessment Report
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gray, R.M.; Cohn, M.; Craver, L.W.
1993-11-01
This assessment of recent data compression and coding research outside the United States examines fundamental and applied work in the basic areas of signal decomposition, quantization, lossless compression, and error control, as well as application development efforts in image/video compression and speech/audio compression. Seven computer scientists and engineers who are active in development of these technologies in US academia, government, and industry carried out the assessment. Strong industrial and academic research groups in Western Europe, Israel, and the Pacific Rim are active in the worldwide search for compression algorithms that provide good tradeoffs among fidelity, bit rate, and computational complexity, though the theoretical roots and virtually all of the classical compression algorithms were developed in the United States. Certain areas, such as segmentation coding, model-based coding, and trellis-coded modulation, have developed earlier or in more depth outside the United States, though the United States has maintained its early lead in most areas of theory and algorithm development. Researchers abroad are active in other currently popular areas, such as quantizer design techniques based on neural networks and signal decompositions based on fractals and wavelets, but, in most cases, either similar research is or has been going on in the United States, or the work has not led to useful improvements in compression performance. Because there is a high degree of international cooperation and interaction in this field, good ideas spread rapidly across borders (both ways) through international conferences, journals, and technical exchanges. Though there have been no fundamental data compression breakthroughs in the past five years--outside or inside the United States--there have been an enormous number of significant improvements in both places in the tradeoffs among fidelity, bit rate, and computational complexity.
The Speech multi features fusion perceptual hash algorithm based on tensor decomposition
NASA Astrophysics Data System (ADS)
Huang, Y. B.; Fan, M. H.; Zhang, Q. Y.
2018-03-01
With constant progress in modern speech communication technologies, speech data are prone to corruption by noise or malicious tampering. To give the speech perceptual hash algorithm strong robustness and high efficiency, this paper proposes a speech perceptual hash algorithm based on tensor decomposition and multiple features. The algorithm applies wavelet packet decomposition to obtain the speech components, and the LPCC, LSP, and ISP features of each component are extracted to constitute the speech feature tensor. Speech authentication is done by generating hash values through quantification of the feature matrix against its mid-value. Experimental results show that the proposed algorithm is robust to content-preserving operations compared with similar algorithms and is able to resist attacks from common background noise. The algorithm is also computationally efficient, meeting the real-time requirements of speech communication and completing speech authentication quickly.
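As a rough illustration of the hash-and-compare step, the sketch below binarizes frame-level spectral features against their per-coefficient median (a stand-in for the paper's mid-value quantification) and compares two hashes by bit error rate. The MFCC features and the threshold are assumptions for illustration, not the paper's LPCC/LSP/ISP tensor pipeline.

```python
# Hypothetical hash-and-compare sketch; MFCCs stand in for the paper's features.
import numpy as np
import librosa

def perceptual_hash(wav_path, n_coef=12):
    y, sr = librosa.load(wav_path, sr=16000)
    feats = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_coef)  # (n_coef, n_frames)
    mid = np.median(feats, axis=1, keepdims=True)            # per-coefficient "mid-value"
    return (feats > mid).astype(np.uint8)                    # binary hash matrix

def bit_error_rate(h1, h2):
    n = min(h1.shape[1], h2.shape[1])                        # align frame counts
    return float(np.mean(h1[:, :n] != h2[:, :n]))

# Authentication decision: a low BER (e.g., below an empirically chosen
# threshold such as 0.2) suggests the two recordings share the same content.
```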
[Verbal and gestural communication in interpersonal interaction with Alzheimer's disease patients].
Schiaratura, Loris Tamara; Di Pastena, Angela; Askevis-Leherpeux, Françoise; Clément, Sylvain
2015-03-01
Communication can be defined as a verbal and non-verbal exchange of thoughts and emotions. While the verbal communication deficit in Alzheimer's disease is well documented, very little is known about gestural communication, especially in interpersonal situations. This study examines the production of gestures and its relations with verbal aspects of communication. Three patients suffering from moderately severe Alzheimer's disease were compared to three healthy adults. Each was given a series of pictures and asked to explain which one she preferred and why. The interpersonal interaction was video recorded. Analyses concerned verbal production (quantity and quality) and gestures. Gestures were either non-representational (i.e., gestures of small amplitude punctuating speech or accentuating some parts of the utterance) or representational (i.e., referring to the object of the speech). Representational gestures were coded as iconic (depicting concrete aspects), metaphoric (depicting abstract meaning) or deictic (pointing toward an object). In comparison with healthy participants, patients showed a decrease in the quantity and quality of speech. Nevertheless, their production of gestures was always present. This pattern is in line with the conception that gestures and speech depend on different communication systems and is inconsistent with the assumption of a parallel dissolution of gesture and speech. Moreover, analyzing the articulation between the verbal and gestural dimensions suggests that representational gestures may compensate for speech deficits. This underlines the important role of gestures in maintaining interpersonal communication.
Tao, Duoduo; Deng, Rui; Jiang, Ye; Galvin, John J; Fu, Qian-Jie; Chen, Bing
2014-01-01
To investigate how auditory working memory relates to speech perception performance in Mandarin-speaking cochlear implant (CI) users. Auditory working memory and speech perception were measured in Mandarin-speaking CI and normal-hearing (NH) participants. Working memory capacity was measured using forward digit span and backward digit span; working memory efficiency was measured using articulation rate. Speech perception was assessed with: (a) word-in-sentence recognition in quiet, (b) word-in-sentence recognition in speech-shaped steady noise at +5 dB signal-to-noise ratio, (c) Chinese disyllable recognition in quiet, and (d) Chinese lexical tone recognition in quiet. Self-reported school rank regarding performance in schoolwork was also collected. There was large inter-subject variability in auditory working memory and speech performance for CI participants. Working memory and speech performance were significantly poorer for CI than for NH participants. All three working memory measures were strongly correlated with each other for both CI and NH participants. Partial correlation analyses were performed on the CI data while controlling for demographic variables. Working memory efficiency was significantly correlated only with sentence recognition in quiet when working memory capacity was partialled out. Working memory capacity was correlated with disyllable recognition and school rank when efficiency was partialled out. There was no correlation between working memory and lexical tone recognition in the present CI participants. Mandarin-speaking CI users experience significant deficits in auditory working memory and speech performance compared with NH listeners. The present data suggest that auditory working memory may contribute to CI users' difficulties in speech understanding. The present pattern of results with Mandarin-speaking CI users is consistent with previous auditory working memory studies with English-speaking CI users, suggesting that the lexical importance of voice pitch cues (albeit poorly coded by the CI) did not influence the relationship between working memory and speech perception.
The Effects of Word Length on Memory for Pictures: Evidence for Speech Coding in Young Children.
ERIC Educational Resources Information Center
Hulme, Charles; And Others
1986-01-01
Three experiments demonstrate that children four to ten years old, when presented with a serial recall task with pictures of common objects having short or long names, showed consistently better recall of pictures with short names. (HOD)
Texting while driving: is speech-based text entry less risky than handheld text entry?
He, J; Chaparro, A; Nguyen, B; Burge, R J; Crandall, J; Chaparro, B; Ni, R; Cao, S
2014-11-01
Research indicates that using a cell phone to talk or text while maneuvering a vehicle impairs driving performance. However, few published studies directly compare the distracting effects of texting using a hands-free (i.e., speech-based interface) versus handheld cell phone, which is an important issue for legislation, automotive interface design and driving safety training. This study compared the effect of speech-based versus handheld text entries on simulated driving performance by asking participants to perform a car following task while controlling the duration of a secondary text-entry task. Results showed that both speech-based and handheld text entries impaired driving performance relative to the drive-only condition by causing more variation in speed and lane position. Handheld text entry also increased the brake response time and increased variation in headway distance. Text entry using a speech-based cell phone was less detrimental to driving performance than handheld text entry. Nevertheless, the speech-based text entry task still significantly impaired driving compared to the drive-only condition. These results suggest that speech-based text entry disrupts driving, but reduces the level of performance interference compared to text entry with a handheld device. In addition, the difference in the distraction effect caused by speech-based and handheld text entry is not simply due to the difference in task duration.
Sherratt, Sue; Worrall, Linda; Pearson, Charlene; Howe, Tami; Hersh, Deborah; Davidson, Bronwyn
2011-08-01
Goal-setting is considered an essential part of rehabilitation practice and integral to person-centredness. However, people with aphasia are not always satisfied with goal-setting, and speech-language pathologists are concerned about the appropriateness of therapy. Furthermore, family members are often excluded from goal-setting, despite the impact aphasia has on them. The actual goals set by clinicians for clients with aphasia and their family members have not yet been investigated. This study aimed to examine the goals that clinicians set for their clients with aphasia and their family members. Data from in-depth interviews with 34 speech-language pathologists describing 84 goal-setting experiences with people with aphasia were coded into superordinate goals for both groups. Clinicians expressed a wide range of goals for people with aphasia and their family members, relating to communication, coping and participation factors, and education. In addition, evaluation was considered a goal for the clients. There were clients for whom no goals were set, particularly among family members, due to a lack of or limited contact. The goals described broadly addressed all aspects of the International Classification of Functioning, Disability and Health (ICF) and reflected the use of both functional and impairment-based therapeutic approaches; they also emphasized the importance of providing goal-setting options for the family members of these clients.
Automated Discovery of Speech Act Categories in Educational Games
ERIC Educational Resources Information Center
Rus, Vasile; Moldovan, Cristian; Niraula, Nobal; Graesser, Arthur C.
2012-01-01
In this paper we address the important task of automated discovery of speech act categories in dialogue-based, multi-party educational games. Speech acts are important in dialogue-based educational systems because they help infer the student speaker's intentions (the task of speech act classification) which in turn is crucial to providing adequate…
The Suitability of Cloud-Based Speech Recognition Engines for Language Learning
ERIC Educational Resources Information Center
Daniels, Paul; Iwago, Koji
2017-01-01
As online automatic speech recognition (ASR) engines become more accurate and more widely implemented with call software, it becomes important to evaluate the effectiveness and the accuracy of these recognition engines using authentic speech samples. This study investigates two of the most prominent cloud-based speech recognition engines--Apple's…
Speech Correction for Children with Cleft Lip and Palate by Networking of Community-Based Care.
Hanchanlert, Yotsak; Pramakhatay, Worawat; Pradubwong, Suteera; Prathanee, Benjamas
2015-08-01
Prevalence of cleft lip and palate (CLP) is high in Northeast Thailand. Most children with CLP face many problems, particularly compensatory articulation disorders (CAD) that persist beyond surgery, while speech services and the number of speech and language pathologists (SLPs) are limited. To determine the effectiveness of networking of the Khon Kaen University (KKU) Community-Based Speech Therapy Model: Kosumphisai Hospital, Kosumphisai District, and Maha Sarakham Hospital, Mueang District, Maha Sarakham Province, for reducing the number of articulation errors in children with CLP. Eleven children with CLP were recruited in three one-year projects of the KKU Community-Based Speech Therapy Model. Articulation tests were formally administered by qualified speech-language pathologists (SLPs) to obtain baseline and post-treatment outcomes. Training for speech assistants (SAs) was conducted by SLPs. Assigned speech correction (SC) was performed by SAs at home and at local hospitals. Caregivers also gave SC at home 3-4 days a week. Networking of the Community-Based Speech Therapy Model significantly reduced the number of articulation errors in children with CLP at both word and sentence levels (mean difference = 6.91, 95% confidence interval = 4.15-9.67; mean difference = 5.36, 95% confidence interval = 2.99-7.73, respectively). Networking by Kosumphisai and Maha Sarakham under the KKU Community-Based Speech Therapy Model was a valid and efficient method for providing speech services to children with cleft palate and could be extended to any area in Thailand and to other developing countries with similar contexts.
NASA Astrophysics Data System (ADS)
Dat, Tran Huy; Takeda, Kazuya; Itakura, Fumitada
We present a multichannel speech enhancement method based on MAP speech spectral magnitude estimation using a generalized gamma model of the speech prior distribution, where the model parameters are adapted from actual noisy speech in a frame-by-frame manner. The utilization of a more general prior distribution with its online adaptive estimation is shown to be effective for speech spectral estimation in noisy environments. Furthermore, the multichannel information in terms of cross-channel statistics is shown to be useful for better adapting the prior distribution parameters to the actual observation, resulting in better performance of the speech enhancement algorithm. We tested the proposed algorithm on an in-car speech database and obtained significant improvements in speech recognition performance, particularly under non-stationary noise conditions such as music, air-conditioner noise, and open windows.
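As a point of reference for the single-channel core of such methods, here is a minimal spectral-gain sketch using a fixed Wiener-style gain with noise estimated from leading frames; the paper's frame-adaptive MAP estimator under a generalized gamma prior, and its use of cross-channel statistics, are not reproduced.

```python
# Simplified single-channel enhancement; a fixed Wiener-style gain stands in
# for the paper's adaptive MAP estimator with a generalized gamma prior.
import numpy as np
from scipy.signal import stft, istft

def enhance(noisy, fs=16000, noise_frames=10):
    f, t, Y = stft(noisy, fs=fs, nperseg=512)
    # assume the first frames are speech-free and estimate the noise PSD there
    noise_psd = np.mean(np.abs(Y[:, :noise_frames]) ** 2, axis=1, keepdims=True)
    xi = np.maximum(np.abs(Y) ** 2 / noise_psd - 1.0, 1e-3)  # crude a priori SNR
    gain = xi / (1.0 + xi)                                   # Wiener gain
    _, x_hat = istft(gain * Y, fs=fs, nperseg=512)
    return x_hat
```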
Lee, Shao-Hsuan; Hsiao, Tzu-Yu; Lee, Guo-She
2015-06-01
Sustained vocalizations of the vowels [a] and [i] and the syllable [mə] were collected from twenty normal-hearing individuals. During vocalization, five conditions of different audio-vocal feedback were introduced separately to the speakers: no masking, wearing supra-aural headphones only, speech-noise masking, high-pass noise masking, and broad-band-noise masking. Power spectral analysis of the vocal fundamental frequency (F0) was used to evaluate the modulations of F0, and linear predictive coding was used to acquire the first two formants. The results showed that while the formant frequencies were not significantly shifted, low-frequency modulations (<3 Hz) of F0 significantly increased with reduced audio-vocal feedback across speech sounds and were significantly correlated with the speakers' auditory awareness of their own voices. For sustained speech production, motor speech control of F0 may depend on a feedback mechanism, whereas articulation should rely more on a feedforward mechanism. Power spectral analysis of F0 might be applied to evaluate audio-vocal control in various hearing and neurological disorders in the future.
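A minimal sketch of the two measurements described above, assuming librosa is available: LPC-based estimation of the first two formants, and the low-frequency (<3 Hz) power of the F0 contour. The LPC order, pitch range, and file name are illustrative assumptions.

```python
# Illustrative formant and F0-modulation analysis of one sustained vocalization.
import numpy as np
import librosa

y, sr = librosa.load("vowel_a.wav", sr=16000)    # hypothetical recording

# Formants: roots of the LPC polynomial with positive imaginary part.
a = librosa.lpc(y, order=12)
roots = [r for r in np.roots(a) if np.imag(r) > 0]
formants = sorted(np.angle(roots) * sr / (2 * np.pi))
f1, f2 = formants[0], formants[1]                # first two formants (Hz)

# F0 modulation: track F0, then inspect spectral power below 3 Hz.
f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr)    # one F0 value per frame
f0 = f0 - np.mean(f0)
frame_rate = sr / 512                            # yin's default hop of 512 samples
spec = np.abs(np.fft.rfft(f0)) ** 2
freqs = np.fft.rfftfreq(len(f0), d=1.0 / frame_rate)
low_freq_power = spec[freqs < 3.0].sum()         # the <3 Hz modulation measure
```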
Tavano, Alessandro; Pesarin, Anna; Murino, Vittorio; Cristani, Marco
2014-01-01
Individuals with Asperger syndrome/High Functioning Autism fail to spontaneously attribute mental states to the self and others, a life-long phenotypic characteristic known as mindblindness. We hypothesized that mindblindness would affect the dynamics of conversational interaction. Using generative models, in particular Gaussian mixture models and observed influence models, conversations were coded as interacting Markov processes, operating on novel speech/silence patterns, termed Steady Conversational Periods (SCPs). SCPs assume that whenever an agent's process changes state (e.g., from silence to speech), it causes a general transition of the entire conversational process, forcing inter-actant synchronization. SCPs fed into observed influence models, which captured the conversational dynamics of children and adolescents with Asperger syndrome/High Functioning Autism, and age-matched typically developing participants. Analyzing the parameters of the models by means of discriminative classifiers, the dialogs of patients were successfully distinguished from those of control participants. We conclude that meaning-free speech/silence sequences, reflecting inter-actant synchronization, at least partially encode typical and atypical conversational dynamics. This suggests a direct influence of theory of mind abilities onto basic speech initiative behavior.
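To make the modeling idea concrete, the toy sketch below codes one speaker's speech/silence sequence as a two-state Markov chain and estimates its transition matrix; the actual observed influence models couple several such chains across interactants.

```python
# Toy single-chain version; real observed influence models couple the chains
# of all interactants so that one agent's state change influences the others.
import numpy as np

def transition_matrix(states):
    """states: 1D array of 0 (silence) / 1 (speech), one value per time slice."""
    counts = np.zeros((2, 2))
    for prev, nxt in zip(states[:-1], states[1:]):
        counts[prev, nxt] += 1
    return counts / counts.sum(axis=1, keepdims=True)  # row-normalized

seq = np.array([0, 0, 1, 1, 1, 0, 1, 1, 0, 0])
print(transition_matrix(seq))  # rows: from-state; columns: to-state
```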
Rowa, Karen; Paulitzki, Jeffrey R; Ierullo, Maria D; Chiang, Brenda; Antony, Martin M; McCabe, Randi E; Moscovitch, David A
2015-05-01
In the current study, 55 participants with a diagnosis of generalized social anxiety disorder (SAD), 23 participants with a diagnosis of an anxiety disorder other than SAD with no comorbid SAD, and 50 healthy controls completed a speech task as well as self-reported measures of safety behavior use. Speeches were videotaped and coded for global and specific indicators of performance by two raters who were blind to participants' diagnostic status. Results suggested that the objective performance of people with SAD was poorer than that of both control groups, who did not differ from each other. Moreover, self-reported use of safety behaviors during the speech strongly mediated the relationship between diagnostic group and observers' performance ratings. These results are consistent with contemporary cognitive-behavioral and interpersonal models of SAD and suggest that socially anxious individuals' performance skills may be undermined by the use of safety behaviors. These data provide further support for recommendations from previous studies that the elimination of safety behaviors ought to be a priority in cognitive behavioral therapy for SAD.
Spatiotemporal dynamics of auditory attention synchronize with speech
Wöstmann, Malte; Herrmann, Björn; Maess, Burkhard
2016-01-01
Attention plays a fundamental role in selectively processing stimuli in our environment despite distraction. Spatial attention induces increasing and decreasing power of neural alpha oscillations (8–12 Hz) in brain regions ipsilateral and contralateral to the locus of attention, respectively. This study tested whether the hemispheric lateralization of alpha power codes not just the spatial location but also the temporal structure of the stimulus. Participants attended to spoken digits presented to one ear and ignored tightly synchronized distracting digits presented to the other ear. In the magnetoencephalogram, spatial attention induced lateralization of alpha power in parietal, but notably also in auditory cortical regions. This alpha power lateralization was not maintained steadily but fluctuated in synchrony with the speech rate and lagged the time course of low-frequency (1–5 Hz) sensory synchronization. Higher amplitude of alpha power modulation at the speech rate was predictive of a listener’s enhanced performance of stream-specific speech comprehension. Our findings demonstrate that alpha power lateralization is modulated in tune with the sensory input and acts as a spatiotemporal filter controlling the read-out of sensory content. PMID:27001861
Evaluation of the importance of time-frequency contributions to speech intelligibility in noise
Yu, Chengzhu; Wójcicki, Kamil K.; Loizou, Philipos C.; Hansen, John H. L.; Johnson, Michael T.
2014-01-01
Recent studies on binary masking techniques make the assumption that each time-frequency (T-F) unit contributes an equal amount to the overall intelligibility of speech. The present study demonstrated that the importance of each T-F unit to speech intelligibility varies in accordance with speech content. Specifically, T-F units are categorized into two classes, speech-present T-F units and speech-absent T-F units. Results indicate that the importance of each speech-present T-F unit to speech intelligibility is highly related to the loudness of its target component, while the importance of each speech-absent T-F unit varies according to the loudness of its masker component. Two types of mask errors are also considered, which include miss and false alarm errors. Consistent with previous work, false alarm errors are shown to be more harmful to speech intelligibility than miss errors when the mixture signal-to-noise ratio (SNR) is below 0 dB. However, the relative importance between the two types of error is conditioned on the SNR level of the input speech signal. Based on these observations, a mask-based objective measure, the loudness weighted hit-false, is proposed for predicting speech intelligibility. The proposed objective measure shows significantly higher correlation with intelligibility compared to two existing mask-based objective measures. PMID:24815280
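A sketch of a measure in the spirit of the proposed loudness-weighted hit-false is given below, with target and masker energies standing in for loudness weights; the paper's exact weighting may differ.

```python
# Hit rate minus false-alarm rate between an ideal and an estimated binary
# mask, with energies as stand-in loudness weights (all arrays share shape).
import numpy as np

def weighted_hit_false(ideal_mask, est_mask, target_energy, masker_energy):
    speech_present = ideal_mask == 1
    hits = (est_mask == 1) & speech_present
    false_alarms = (est_mask == 1) & ~speech_present
    hit_rate = target_energy[hits].sum() / max(target_energy[speech_present].sum(), 1e-12)
    fa_rate = masker_energy[false_alarms].sum() / max(masker_energy[~speech_present].sum(), 1e-12)
    return hit_rate - fa_rate  # higher values predict better intelligibility
```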
Measuring Syntactic Complexity in Spontaneous Spoken Swedish
ERIC Educational Resources Information Center
Roll, Mikael; Frid, Johan; Horne, Merle
2007-01-01
Hesitation disfluencies after phonetically prominent stranded function words are thought to reflect the cognitive coding of complex structures. Speech fragments following the Swedish function word "att" "that" were analyzed syntactically, and divided into two groups: one with "att" in disfluent contexts, and the other with "att" in fluent…
NASA Astrophysics Data System (ADS)
Lightstone, P. C.; Davidson, W. M.
1982-04-01
The military detection assessment laboratory houses an experimental field system which assesses different alarm indicators such as fence disturbance sensors, MILES cables, and microwave Racons. A speech synthesis board was purchased which could be interfaced, by means of a computer, to an alarm logger, making verbal acknowledgement of alarms possible. Different products and types of voice synthesis were analyzed before a linear predictive coding device produced by Telesensory Speech Systems of Palo Alto, California, was chosen. This device, called the Speech 1000 Board, has a dedicated 8085 processor. A multiplexer card was designed and the SP 1000 interfaced through the card to a TMS 990/100M Texas Instruments microcomputer. It was also necessary to design software capable of recognizing and flagging an alarm on any one of 32 possible lines. The experimental field system was then packaged with a DC power supply, LED indicators, speakers, and switches, and deployed in the field, where it performed reliably.
Perceptual learning of degraded speech by minimizing prediction error.
Sohoglu, Ediz; Davis, Matthew H
2016-03-22
Human perception is shaped by past experience on multiple timescales. Sudden and dramatic changes in perception occur when prior knowledge or expectations match stimulus content. These immediate effects contrast with the longer-term, more gradual improvements that are characteristic of perceptual learning. Despite extensive investigation of these two experience-dependent phenomena, there is considerable debate about whether they result from common or dissociable neural mechanisms. Here we test single- and dual-mechanism accounts of experience-dependent changes in perception using concurrent magnetoencephalographic and EEG recordings of neural responses evoked by degraded speech. When speech clarity was enhanced by prior knowledge obtained from matching text, we observed reduced neural activity in a peri-auditory region of the superior temporal gyrus (STG). Critically, longer-term improvements in the accuracy of speech recognition following perceptual learning resulted in reduced activity in a nearly identical STG region. Moreover, short-term neural changes caused by prior knowledge and longer-term neural changes arising from perceptual learning were correlated across subjects with the magnitude of learning-induced changes in recognition accuracy. These experience-dependent effects on neural processing could be dissociated from the neural effect of hearing physically clearer speech, which similarly enhanced perception but increased rather than decreased STG responses. Hence, the observed neural effects of prior knowledge and perceptual learning cannot be attributed to epiphenomenal changes in listening effort that accompany enhanced perception. Instead, our results support a predictive coding account of speech perception; computational simulations show how a single mechanism, minimization of prediction error, can drive immediate perceptual effects of prior knowledge and longer-term perceptual learning of degraded speech.
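A toy illustration of the proposed mechanism, not the authors' simulation code: a latent estimate is updated by gradient descent so that its prediction matches a degraded input, i.e., perception as prediction-error minimization under an assumed linear generative model.

```python
# Perception as prediction-error minimization under a known linear model.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(32, 8))                  # generative model: latent -> sensory
z_true = rng.normal(size=8)
x = W @ z_true + 0.5 * rng.normal(size=32)    # degraded sensory input

z = np.zeros(8)                               # initial (uninformative) expectation
for _ in range(200):
    err = x - W @ z                           # prediction error
    z += 0.01 * W.T @ err                     # adjust estimate to reduce the error

print(np.round(z - z_true, 2))                # small residuals after convergence
```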
How reading differs from object naming at the neuronal level.
Price, C J; McCrory, E; Noppeney, U; Mechelli, A; Moore, C J; Biggio, N; Devlin, J T
2006-01-15
This paper uses whole brain functional neuroimaging in neurologically normal participants to explore how reading aloud differs from object naming in terms of neuronal implementation. In the first experiment, we directly compared brain activation during reading aloud and object naming. This revealed greater activation for reading in bilateral premotor, left posterior superior temporal and precuneus regions. In a second experiment, we segregated the object-naming system into object recognition and speech production areas by factorially manipulating the presence or absence of objects (pictures of objects or their meaningless scrambled counterparts) with the presence or absence of speech production (vocal vs. finger press responses). This demonstrated that the areas associated with speech production (object naming and repetitively saying "OK" to meaningless scrambled pictures) corresponded exactly to the areas where responses were higher for reading aloud than object naming in Experiment 1. Collectively the results suggest that, relative to object naming, reading increases the demands on shared speech production processes. At a cognitive level, enhanced activation for reading in speech production areas may reflect the multiple and competing phonological codes that are generated from the sublexical parts of written words. At a neuronal level, it may reflect differences in the speed with which different areas are activated and integrate with one another.
NASA Astrophysics Data System (ADS)
Liberman, A. M.
1983-09-01
This report is one of a regular series on the status and progress of studies on the nature of speech, instrumentation for its investigation, and practical applications. Manuscripts cover the following topics: The association between comprehension of spoken sentences and early reading ability: The role of phonetic representation; Phonetic coding and order memory in relation to reading proficiency: A comparison of short-term memory for temporal and spatial order information; Exploring the oral and written language errors made by language disabled children; Perceiving phonetic events; Converging evidence in support of common dynamical principles for speech and movement coordination; Phase transitions and critical behavior in human bimanual coordination; Timing and coarticulation for alveolo-palatals and sequences of alveolar +J in Catalan; V-to-C coarticulation in Catalan VCV sequences: An articulatory and acoustical study; Prosody and the /S/-/c/ distinction; Intersections of tone and intonation in Thai; Simultaneous measurements of vowels produced by a hearing-impaired speaker; Extending formant transitions may not improve aphasics' perception of stop consonant place of articulation; Against a role of chirp identification in duplex perception; Further evidence for the role of relative timing in speech: A reply to Barry; Review (Phonological intervention: Concepts and procedures); and Review (Temporal variables in speech).
Temporal order processing of syllables in the left parietal lobe.
Moser, Dana; Baker, Julie M; Sanchez, Carmen E; Rorden, Chris; Fridriksson, Julius
2009-10-07
Speech processing requires the temporal parsing of syllable order. Individuals suffering from posterior left hemisphere brain injury often exhibit temporal processing deficits as well as language deficits. Although the right posterior inferior parietal lobe has been implicated in temporal order judgments (TOJs) of visual information, there is limited evidence to support the role of the left inferior parietal lobe (IPL) in processing syllable order. The purpose of this study was to examine whether the left inferior parietal lobe is recruited during temporal order judgments of speech stimuli. Functional magnetic resonance imaging data were collected on 14 normal participants while they completed the following forced-choice tasks: (1) syllable order of multisyllabic pseudowords, (2) syllable identification of single syllables, and (3) gender identification of both multisyllabic and monosyllabic speech stimuli. Results revealed increased neural recruitment in the left inferior parietal lobe when participants made judgments about syllable order compared with both syllable identification and gender identification. These findings suggest that the left inferior parietal lobe plays an important role in processing syllable order and support the hypothesized role of this region as an interface between auditory speech and the articulatory code. Furthermore, a breakdown in this interface may explain some components of the speech deficits observed after posterior damage to the left hemisphere.
Women's Speech/Men's Speech: Does Forensic Training Make a Difference?
ERIC Educational Resources Information Center
Larson, Suzanne; Vreeland, Amy L.
A study of cross-examination speeches of males and females was conducted to determine gender differences in intercollegiate debate. The theory base for gender differences in speech is closely tied to the analysis of dyadic conversation. It is based on the belief that women are less forceful and dominant in cross-examination, and will exhibit…
Governing sexual behaviour through humanitarian codes of conduct.
Matti, Stephanie
2015-10-01
Since 2001, there has been a growing consensus that sexual exploitation and abuse of intended beneficiaries by humanitarian workers is a real and widespread problem that requires governance. Codes of conduct have been promoted as a key mechanism for governing the sexual behaviour of humanitarian workers and, ultimately, preventing sexual exploitation and abuse (PSEA). This article presents a systematic study of PSEA codes of conduct adopted by humanitarian non-governmental organisations (NGOs) and how they govern the sexual behaviour of humanitarian workers. It draws on Foucault's analytics of governance and speech act theory to examine the findings of a survey of references to codes of conduct made on the websites of 100 humanitarian NGOs, and to analyse some features of the organisation-specific PSEA codes identified. © 2015 The Author(s). Disasters © Overseas Development Institute, 2015.
Use of suprathreshold stochastic resonance in cochlear implant coding
NASA Astrophysics Data System (ADS)
Allingham, David; Stocks, Nigel G.; Morse, Robert P.
2003-05-01
In this article we discuss the possible use of a novel form of stochastic resonance, termed suprathreshold stochastic resonance (SSR), to improve signal encoding/transmission in cochlear implants. A model, based on the leaky integrate-and-fire (LIF) neuron, has been developed from physiological data and used to model information flow in a population of cochlear nerve fibers. It is demonstrated that information flow can, in principle, be enhanced by the SSR effect. Furthermore, SSR was found to enhance information transmission for signal parameters that are commonly encountered in cochlear implants. This therefore gives hope that SSR may be implemented in cochlear implants to improve speech comprehension.
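A minimal sketch of the SSR setup, under illustrative parameters: a population of LIF neurons with independent noise encodes a common suprathreshold drive, and the population spike count is compared against the input at different noise levels.

```python
# Population of LIF neurons with independent noise encoding a common drive.
import numpy as np

rng = np.random.default_rng(1)
dt, tau, v_th = 1e-4, 5e-3, 1.0               # time step (s), membrane tau, threshold
t = np.arange(0, 0.5, dt)
signal = 0.8 + 0.3 * np.sin(2 * np.pi * 20 * t)

def population_spike_count(noise_std, n_neurons=50):
    v = np.zeros(n_neurons)
    counts = np.zeros(len(t))
    for i, s in enumerate(signal):
        v += dt / tau * (s - v) + np.sqrt(dt) * noise_std * rng.normal(size=n_neurons)
        fired = v >= v_th
        counts[i] = fired.sum()
        v[fired] = 0.0                        # reset fired neurons
    return counts

for sigma in (0.0, 2.0):                      # compare noise-free vs noisy populations
    r = np.corrcoef(population_spike_count(sigma), signal)[0, 1]
    print(f"noise std {sigma}: corr(count, signal) = {r:.2f}")
```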
Application of artificial intelligence principles to the analysis of "crazy" speech.
Garfield, D A; Rapp, C
1994-04-01
Artificial intelligence computer simulation methods can be used to investigate psychotic or "crazy" speech. Here, symbolic reasoning algorithms establish semantic networks that schematize speech. These semantic networks consist of two main structures: case frames and object taxonomies. Node-based reasoning rules apply to object taxonomies and pathway-based reasoning rules apply to case frames. Normal listeners may recognize speech as "crazy talk" based on violations of node- and pathway-based reasoning rules. In this article, three separate segments of schizophrenic speech illustrate violations of these rules. This artificial intelligence approach is compared and contrasted with other neurolinguistic approaches and is discussed as a conceptual link between neurobiological and psychodynamic understandings of psychopathology.
Cheung, Gladys; Trembath, David; Arciuli, Joanne; Togher, Leanne
2013-08-01
Although researchers have examined barriers to implementing evidence-based practice (EBP) at the level of the individual, little is known about the effects workplaces have on speech-language pathologists' implementation of EBP. The aim of this study was to examine the impact of workplace factors on the use of EBP amongst speech-language pathologists who work with children with Autism Spectrum Disorder (ASD). This study sought to (a) explore views about EBP amongst speech-language pathologists who work with children with ASD, (b) identify workplace factors which, in the participants' opinions, acted as barriers or enablers to their provision of evidence-based speech-language pathology services, and (c) examine whether or not speech-language pathologists' responses to workplace factors differed based on the type of workplace or their years of experience. A total of 105 speech-language pathologists from across Australia completed an anonymous online questionnaire. The results indicate that, although the majority of speech-language pathologists agreed that EBP is necessary, they experienced barriers to their implementation of EBP including workplace culture and support, lack of time, cost of EBP, and the availability and accessibility of EBP resources. The barriers reported by speech-language pathologists were similar, regardless of their workplace (private practice vs organization) and years of experience.
Masking of errors in transmission of VAPC-coded speech
NASA Technical Reports Server (NTRS)
Cox, Neil B.; Froese, Edwin L.
1990-01-01
A subjective evaluation is provided of the bit error sensitivity of the message elements of a Vector Adaptive Predictive (VAPC) speech coder, along with an indication of the amenability of these elements to a popular error masking strategy (cross frame hold over). As expected, a wide range of bit error sensitivity was observed. The most sensitive message components were the short term spectral information and the most significant bits of the pitch and gain indices. The cross frame hold over strategy was found to be useful for pitch and gain information, but it was not beneficial for the spectral information unless severe corruption had occurred.
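The hold-over strategy itself is simple to state in code. The sketch below uses a hypothetical frame structure (the actual VAPC bitstream layout is not reproduced): when error detection flags a frame, the pitch and gain indices are reused from the last good frame, while spectral information is left alone, matching the finding that hold-over did not benefit it.

```python
# Hypothetical frame structure; only pitch and gain are held over on error.
from dataclasses import dataclass, replace

@dataclass
class Frame:
    pitch: int
    gain: int
    spectrum: tuple      # short-term spectral indices (left untouched on error)
    crc_ok: bool         # outcome of error detection for this frame

def mask_errors(frames):
    last_good, out = None, []
    for f in frames:
        if f.crc_ok:
            last_good = f
        elif last_good is not None:
            # hold over the sensitive parameters from the last good frame
            f = replace(f, pitch=last_good.pitch, gain=last_good.gain)
        out.append(f)
    return out
```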
Long short-term memory for speaker generalization in supervised speech separation
Chen, Jitong; Wang, DeLiang
2017-01-01
Speech separation can be formulated as learning to estimate a time-frequency mask from acoustic features extracted from noisy speech. For supervised speech separation, generalization to unseen noises and unseen speakers is a critical issue. Although deep neural networks (DNNs) have been successful in noise-independent speech separation, DNNs are limited in modeling a large number of speakers. To improve speaker generalization, a separation model based on long short-term memory (LSTM) is proposed, which naturally accounts for the temporal dynamics of speech. Systematic evaluation shows that the proposed model substantially outperforms a DNN-based model on unseen speakers and unseen noises in terms of objective speech intelligibility. Analyzing LSTM internal representations reveals that LSTM captures long-term speech contexts. The LSTM model is also more advantageous for low-latency speech separation: even without future frames, it performs better than the DNN model with future frames. The proposed model represents an effective approach for speaker- and noise-independent speech separation. PMID:28679261
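A minimal mask-estimating LSTM in the spirit of the described model is sketched below; the feature dimension, layer sizes, and training objective are illustrative assumptions rather than the paper's configuration.

```python
# Illustrative LSTM mask estimator (PyTorch); sizes are assumptions.
import torch
import torch.nn as nn

class LSTMSeparator(nn.Module):
    def __init__(self, n_features=64, n_freq=161, hidden=256, layers=2):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, num_layers=layers, batch_first=True)
        self.mask = nn.Linear(hidden, n_freq)

    def forward(self, feats):                  # feats: (batch, time, n_features)
        h, _ = self.lstm(feats)
        return torch.sigmoid(self.mask(h))     # one [0, 1] mask value per T-F unit

model = LSTMSeparator()
mask = model(torch.randn(4, 100, 64))          # -> (4, 100, 161)
# Training would minimize, e.g., MSE between mask * |noisy STFT| and |clean STFT|.
```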
Speech enhancement based on modified phase-opponency detectors
NASA Astrophysics Data System (ADS)
Deshmukh, Om D.; Espy-Wilson, Carol Y.
2005-09-01
A speech enhancement algorithm based on a neural model was presented by Deshmukh et al. [149th Meeting of the Acoustical Society of America, 2005]. The algorithm consists of a bank of Modified Phase Opponency (MPO) filter pairs tuned to different center frequencies. This algorithm is able to enhance salient spectral features in speech signals even at low signal-to-noise ratios. However, the algorithm introduces musical noise and sometimes misses a spectral peak that is close in frequency to a stronger spectral peak. A refinement in the design of the MPO filters was recently made that takes advantage of the falling spectrum of the speech signal in sonorant regions. The modified set of filters leads to better separation of the noise and speech signals, and more accurate enhancement of spectral peaks. The improvements also lead to a significant reduction in musical noise. Continuity algorithms based on the properties of speech signals are used to further reduce the musical-noise effect. The efficiency of the proposed method in enhancing the speech signal when the level of the background noise is fluctuating will be demonstrated. The performance of the improved speech enhancement method will be compared with various spectral subtraction-based methods. [Work supported by NSF BCS0236707.]
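For context, a minimal version of the spectral-subtraction baseline that such methods are compared against: subtract an estimate of the noise magnitude with an over-subtraction factor and a spectral floor. The parameter values are illustrative.

```python
# Classic magnitude spectral subtraction with over-subtraction and flooring.
import numpy as np
from scipy.signal import stft, istft

def spectral_subtraction(noisy, fs=16000, alpha=2.0, beta=0.02, noise_frames=10):
    f, t, Y = stft(noisy, fs=fs, nperseg=512)
    mag, phase = np.abs(Y), np.angle(Y)
    noise_mag = mag[:, :noise_frames].mean(axis=1, keepdims=True)  # leading-frame noise estimate
    clean_mag = np.maximum(mag - alpha * noise_mag, beta * mag)    # floor limits musical noise
    _, x_hat = istft(clean_mag * np.exp(1j * phase), fs=fs, nperseg=512)
    return x_hat
```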
ERIC Educational Resources Information Center
Lewy, Guenter
2018-01-01
Freedom of expression is imperiled on today's college campuses. Citizens and educators alike are concerned about the number of shout-downs and disinvitations and their silencing effect on intellectual diversity. The use of speech codes, "safe spaces," new rules demanding "trigger warnings," and condemning…
ERIC Educational Resources Information Center
Gould, Jon B.
2007-01-01
Last December saw another predictable report from the Foundation for Individual Rights in Education (FIRE), a self-described watchdog group, highlighting how higher education is supposedly under siege from a politically correct plague of so-called hate-speech codes. In that report, FIRE declared that as many as 96 percent of top-ranked colleges…
ERIC Educational Resources Information Center
Stiles, William B.; And Others
1983-01-01
Coded campaign speeches recorded during the 1980 American presidential primaries and college lectures using a taxonomy of verbal response modes. Both candidates and lecturers used mostly informative modes, but candidates used relatively more disclosures (subjective information) and fewer edifications (objective information). Candidates…
ERIC Educational Resources Information Center
Gray, Mary W.
1994-01-01
Sexual harassment is abuse of power. It should be prohibited in colleges and universities, not through constraints on academic freedom such as speech codes, but through enforcement of standards of ethical professional conduct. Faculty have an ethical obligation not to engage in harassment and to hold colleagues accountable if they do so. (MSE)
English in Political Discourse of Post-Suharto Indonesia.
ERIC Educational Resources Information Center
Bernsten, Suzanne
This paper illustrates increases in the use of English in political speeches in post-Suharto Indonesia by analyzing the phonological, morphological, and syntactic assimilation of loanwords (linguistic borrowing), as well as hybridization and code switching, and phenomena such as doubling and loan translations. The paper also examines the mixed…
The Courts as Educational Policy Makers.
ERIC Educational Resources Information Center
Maready, William F.
This report discusses the expanding role of Federal judges as educational policymakers. The report discusses court decisions related to interpretations by the Federal Courts of the U.S. Constitution. The report notes that court decisions have covered the following topics: dress codes, flying of the flag, freedom of speech, unwed mothers,…
Spanish-English Speech Perception in Children and Adults: Developmental Trends
ERIC Educational Resources Information Center
Brice, Alejandro E.; Gorman, Brenda K.; Leung, Cynthia B.
2013-01-01
This study explored the developmental trends and phonetic category formation in bilingual children and adults. Participants included 30 fluent Spanish-English bilingual children, aged 8-11, and bilingual adults, aged 18-40. All completed gating tasks that incorporated code-mixed Spanish-English stimuli. There were significant differences in…
Student Disciplinary Codes -- What Makes Them Tick.
ERIC Educational Resources Information Center
Johnson, Donald V.
In this speech, the author describes how one school developed discipline guidelines with the cooperation of staff, parents, and students. Due process procedures, types of discipline, and an alternative out-of-school program for adjustment students (those who have experienced chronic or serious disciplinary problems in the school) are described.…
Development of a speech autocuer
NASA Astrophysics Data System (ADS)
Bedles, R. L.; Kizakvich, P. N.; Lawson, D. T.; McCartney, M. L.
1980-12-01
A wearable, visually based prosthesis for the deaf based upon the proven method for removing lipreading ambiguity known as cued speech was fabricated and tested. Both software and hardware developments are described, including a microcomputer, display, and speech preprocessor.
Job Stress of School-Based Speech-Language Pathologists
ERIC Educational Resources Information Center
Harris, Stephanie Ferney; Prater, Mary Anne; Dyches, Tina Taylor; Heath, Melissa Allen
2009-01-01
Stress and burnout contribute significantly to the shortages of school-based speech-language pathologists (SLPs). At the request of the Utah State Office of Education, the researchers measured the stress levels of 97 school-based SLPs using the "Speech-Language Pathologist Stress Inventory." Results indicated that participants' emotional-fatigue…
Predictive top-down integration of prior knowledge during speech perception.
Sohoglu, Ediz; Peelle, Jonathan E; Carlyon, Robert P; Davis, Matthew H
2012-06-20
A striking feature of human perception is that our subjective experience depends not only on sensory information from the environment but also on our prior knowledge or expectations. The precise mechanisms by which sensory information and prior knowledge are integrated remain unclear, with longstanding disagreement concerning whether integration is strictly feedforward or whether higher-level knowledge influences sensory processing through feedback connections. Here we used concurrent EEG and MEG recordings to determine how sensory information and prior knowledge are integrated in the brain during speech perception. We manipulated listeners' prior knowledge of speech content by presenting matching, mismatching, or neutral written text before a degraded (noise-vocoded) spoken word. When speech conformed to prior knowledge, subjective perceptual clarity was enhanced. This enhancement in clarity was associated with a spatiotemporal profile of brain activity uniquely consistent with a feedback process: activity in the inferior frontal gyrus was modulated by prior knowledge before activity in lower-level sensory regions of the superior temporal gyrus. In parallel, we parametrically varied the level of speech degradation, and therefore the amount of sensory detail, so that changes in neural responses attributable to sensory information and prior knowledge could be directly compared. Although sensory detail and prior knowledge both enhanced speech clarity, they had an opposite influence on the evoked response in the superior temporal gyrus. We argue that these data are best explained within the framework of predictive coding in which sensory activity is compared with top-down predictions and only unexplained activity propagated through the cortical hierarchy.
Hogrefe, Katharina; Rein, Robert; Skomroch, Harald; Lausberg, Hedda
2016-12-01
Persons with brain damage show deviant patterns of co-speech hand movement behaviour in comparison to healthy speakers. Several authors have claimed that gesture and speech rely on a single production mechanism that depends on the same neurological substrate, while others claim that the two modalities constitute closely related but separate production channels. Findings so far are thus contradictory, and there is a lack of studies that systematically analyse the full range of hand movements that accompany speech in the condition of brain damage. In the present study, we aimed to fill this gap by comparing hand movement behaviour in persons with unilateral brain damage to the left and the right hemisphere and a matched control group of healthy persons. For hand movement coding, we applied Module I of NEUROGES, an objective and reliable analysis system that enables analysis of the full repertoire of hand movements independent of speech, which makes it specifically suited to the examination of persons with aphasia. The main results of our study show a decreased use of communicative conceptual gestures in persons with damage to the right hemisphere and an increased use of these gestures in persons with left brain damage and aphasia. These results not only suggest that the production of gesture and speech do not rely on the same neurological substrate but also underline the important role of right-hemisphere functioning in gesture production.
Shin, Young Hoon; Seo, Jiwon
2016-01-01
People with hearing or speaking disabilities are deprived of the benefits of conventional speech recognition technology because it is based on acoustic signals. Recent research has focused on silent speech recognition systems that are based on the motions of a speaker’s vocal tract and articulators. Because most silent speech recognition systems use contact sensors that are very inconvenient to users or optical systems that are susceptible to environmental interference, a contactless and robust solution is hence required. Toward this objective, this paper presents a series of signal processing algorithms for a contactless silent speech recognition system using an impulse radio ultra-wide band (IR-UWB) radar. The IR-UWB radar is used to remotely and wirelessly detect motions of the lips and jaw. In order to extract the necessary features of lip and jaw motions from the received radar signals, we propose a feature extraction algorithm. The proposed algorithm noticeably improved speech recognition performance compared to the existing algorithm during our word recognition test with five speakers. We also propose a speech activity detection algorithm to automatically select speech segments from continuous input signals. Thus, speech recognition processing is performed only when speech segments are detected. Our testbed consists of commercial off-the-shelf radar products, and the proposed algorithms are readily applicable without designing specialized radar hardware for silent speech processing. PMID:27801867
Real-Time Control of an Articulatory-Based Speech Synthesizer for Brain Computer Interfaces
Bocquelet, Florent; Hueber, Thomas; Girin, Laurent; Savariaux, Christophe; Yvert, Blaise
2016-01-01
Restoring natural speech in paralyzed and aphasic people could be achieved using a Brain-Computer Interface (BCI) controlling a speech synthesizer in real-time. To reach this goal, a prerequisite is to develop a speech synthesizer producing intelligible speech in real-time with a reasonable number of control parameters. We present here an articulatory-based speech synthesizer that can be controlled in real-time for future BCI applications. This synthesizer converts movements of the main speech articulators (tongue, jaw, velum, and lips) into intelligible speech. The articulatory-to-acoustic mapping is performed using a deep neural network (DNN) trained on electromagnetic articulography (EMA) data recorded on a reference speaker synchronously with the produced speech signal. This DNN is then used in both offline and online modes to map the position of sensors glued on different speech articulators into acoustic parameters that are further converted into an audio signal using a vocoder. In offline mode, highly intelligible speech could be obtained as assessed by perceptual evaluation performed by 12 listeners. Then, to anticipate future BCI applications, we further assessed the real-time control of the synthesizer by both the reference speaker and new speakers, in a closed-loop paradigm using EMA data recorded in real time. A short calibration period was used to compensate for differences in sensor positions and articulatory differences between new speakers and the reference speaker. We found that real-time synthesis of vowels and consonants was possible with good intelligibility. In conclusion, these results open the way to future speech BCI applications using such an articulatory-based speech synthesizer. PMID:27880768
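A sketch of the articulatory-to-acoustic mapping stage, assuming PyTorch: a small feed-forward network maps EMA-like articulator coordinates to vocoder parameters. Dimensions, depth, and training details are illustrative, not the authors' configuration.

```python
# Illustrative articulatory-to-acoustic regression network (PyTorch).
import torch
import torch.nn as nn

art_dim, ac_dim = 18, 25        # e.g., 6 EMA sensors x 3 coords -> vocoder parameters

model = nn.Sequential(
    nn.Linear(art_dim, 256), nn.Tanh(),
    nn.Linear(256, 256), nn.Tanh(),
    nn.Linear(256, ac_dim),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

ema = torch.randn(1000, art_dim)       # stand-in for recorded articulator frames
acoustic = torch.randn(1000, ac_dim)   # stand-in for synchronized vocoder parameters
for _ in range(10):                    # abbreviated training loop
    opt.zero_grad()
    loss = loss_fn(model(ema), acoustic)
    loss.backward()
    opt.step()
# At run time, model(frame) is evaluated per incoming articulatory frame and its
# output drives a vocoder, which is what permits closed-loop, real-time use.
```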
Significance of parametric spectral ratio methods in detection and recognition of whispered speech
NASA Astrophysics Data System (ADS)
Mathur, Arpit; Reddy, Shankar M.; Hegde, Rajesh M.
2012-12-01
In this article the significance of a new parametric spectral ratio method that can be used to detect whispered speech segments within normally phonated speech is described. Adaptation methods based on maximum likelihood linear regression (MLLR) are then used to realize a mismatched train-test style speech recognition system. The proposed parametric spectral ratio method computes a ratio spectrum of the linear prediction (LP) and minimum variance distortionless response (MVDR) spectra. The smoothed ratio spectrum is then used to detect whispered segments of speech within neutral speech segments effectively. The proposed LP-MVDR ratio method exhibits robustness at different SNRs, as indicated by whisper diarization experiments conducted on the CHAINS and cell phone whispered speech corpora. The proposed method also performs reasonably better than conventional methods for whisper detection. In order to integrate the proposed whisper detection method into a conventional speech recognition engine with minimal changes, adaptation methods based on MLLR are used herein. The hidden Markov models corresponding to neutral-mode speech are adapted to the whispered-mode speech data in the whispered regions detected by the proposed ratio method. The performance of this method is first evaluated on whispered speech data from the CHAINS corpus. The second set of experiments is conducted on the cell phone corpus of whispered speech. This corpus is collected using a setup that is used commercially for handling public transactions. The proposed whispered speech recognition system performs reasonably better than several conventional methods. The results indicate the possibility of a whispered-speech recognition system for cell phone-based transactions.
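A single-frame sketch of the ratio computation is given below, with the MVDR envelope obtained from the LP coefficients via the parametric form of Murthi and Rao; the exact normalization and the smoothing and thresholding across frames are assumptions here, and the order and grid size are illustrative.

```python
# Single-frame LP/MVDR ratio; parametric MVDR form assumed per Murthi-Rao.
import numpy as np
import librosa

def lp_mvdr_ratio(frame, order=12, n_bins=257):
    a = librosa.lpc(frame, order=order)                      # [1, a_1, ..., a_M]
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    err = float(np.dot(a, r[:order + 1]))                    # prediction error power
    w = np.linspace(0, np.pi, n_bins)

    # LP (all-pole) envelope: err / |A(e^{jw})|^2
    A = np.exp(-1j * np.outer(w, np.arange(order + 1))) @ a
    lp_env = err / np.abs(A) ** 2

    # Parametric MVDR envelope computed from the LP coefficients
    mu = np.array([
        np.sum((order + 1 - k - 2 * np.arange(order - k + 1)) * a[:order - k + 1] * a[k:])
        for k in range(order + 1)
    ]) / err
    denom = mu[0] + 2 * np.cos(np.outer(w, np.arange(1, order + 1))) @ mu[1:]
    mvdr_env = 1.0 / denom

    return lp_env / mvdr_env   # smoothed across frames before thresholding
```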
Delphi, Maryam; Lotfi, M-Yones; Moossavi, Abdollah; Bakhshi, Enayatollah; Banimostafa, Maryam
2017-09-01
Previous studies have shown that interaural-time-difference (ITD) training can improve localization ability. Surprisingly little is known, however, about localization training vis-à-vis speech perception in noise based on the interaural time difference in the envelope (ITD ENV). We sought to investigate the reliability of an ITD ENV-based training program for speech-in-noise perception among elderly individuals with normal hearing and a speech-in-noise perception disorder. The present interventional study was performed during 2016. Sixteen elderly men between 55 and 65 years of age with a clinical diagnosis of normal hearing up to 2000 Hz and a speech-in-noise perception disorder participated in this study. The localization training program was based on changes in ITD ENV. In order to evaluate the reliability of the training program, we performed speech-in-noise tests before the training program, immediately afterward, and at 2 months' follow-up. The reliability of the training program was analyzed using the Friedman test and SPSS software. Statistically significant differences were found in the mean scores of speech-in-noise perception between the 3 time points (P=0.001). The results also indicated no difference in the mean scores of speech-in-noise perception between immediately after the training program and the 2 months' follow-up (P=0.212). The present study showed the reliability of an ITD ENV-based localization training program in elderly individuals with a speech-in-noise perception disorder.
Galbraith, G C; Jhaveri, S P; Kuo, J
1997-01-01
Speech-evoked brainstem frequency-following responses (FFRs) were recorded to repeated presentations of the same stimulus word. Word repetition results in illusory verbal transformations (VTs) in which word perceptions can differ markedly from the actual stimulus. Previous behavioral studies support an explanation of VTs based on changes in arousal or attention. Horizontal and vertical dipole FFRs were recorded to assess responses with putative origins in the auditory nerve and central brainstem, respectively. FFRs were recorded from 18 subjects when they correctly heard the stimulus and when they reported VTs. Although horizontal and vertical dipole FFRs showed different frequency response patterns, dipoles did not differentiate between perceptual conditions. However, when subjects were divided into low- and high-VT groups (based on percentage of VT trials), a significant Condition x Group interaction resulted. This interaction showed the largest difference in FFR amplitudes during VT trials, with the low-VT group showing increased amplitudes, and the high-VT group showing decreased amplitudes, relative to trials in which the stimulus was correctly perceived. These results demonstrate measurable subject differences in the early processing of complex signals, due to possible effects of attention on the brainstem FFR. The present research shows that the FFR is useful in understanding human language as it is coded and processed in the brainstem auditory pathway.
Emmorey, Karen; Petrich, Jennifer; Gollan, Tamar H.
2012-01-01
Bilinguals who are fluent in American Sign Language (ASL) and English often produce code-blends - simultaneously articulating a sign and a word while conversing with other ASL-English bilinguals. To investigate the cognitive mechanisms underlying code-blend processing, we compared picture-naming times (Experiment 1) and semantic categorization times (Experiment 2) for code-blends versus ASL signs and English words produced alone. In production, code-blending did not slow lexical retrieval for ASL and actually facilitated access to low-frequency signs. However, code-blending delayed speech production because bimodal bilinguals synchronized English and ASL lexical onsets. In comprehension, code-blending speeded access to both languages. Bimodal bilinguals’ ability to produce code-blends without any cost to ASL implies that the language system either has (or can develop) a mechanism for switching off competition to allow simultaneous production of close competitors. Code-blend facilitation effects during comprehension likely reflect cross-linguistic (and cross-modal) integration at the phonological and/or semantic levels. The absence of any consistent processing costs for code-blending illustrates a surprising limitation on dual-task costs and may explain why bimodal bilinguals code-blend more often than they code-switch. PMID:22773886
Speech recognition systems on the Cell Broadband Engine
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liu, Y; Jones, H; Vaidya, S
In this paper we describe our design, implementation, and first results of a prototype connected-phoneme-based speech recognition system on the Cell Broadband Engine™ (Cell/B.E.). Automatic speech recognition decodes speech samples into plain text (other representations are possible) and must process samples at real-time rates. Fortunately, the computational tasks involved in this pipeline are highly data-parallel and can receive significant hardware acceleration from vector-streaming architectures such as the Cell/B.E. Identifying and exploiting these parallelism opportunities is challenging, but also critical to improving system performance. We observed, from our initial performance timings, that a single Cell/B.E. processor can recognize speech from thousands of simultaneous voice channels in real time--a channel density that is orders-of-magnitude greater than the capacity of existing software speech recognizers based on CPUs (central processing units). This result emphasizes the potential for Cell/B.E.-based speech recognition and will likely lead to the future development of production speech systems using Cell/B.E. clusters.
ERIC Educational Resources Information Center
Whitmire, Kathleen A.; Rivers, Kenyatta O.; Mele-McCarthy, Joan A.; Staskowski, Maureen
2014-01-01
Speech-language pathologists are faced with demands for evidence to support practice. Federal legislation requires high-quality evidence for decisions regarding school-based services as part of evidence-based practice. The purpose of this article is to discuss the limited scientific evidence for making appropriate decisions about speech-language…
Federal Register 2010, 2011, 2012, 2013, 2014
2013-10-23
...] Telecommunications Relay Services and Speech-to-Speech Services for Individuals With Hearing and Speech Disabilities... for telecommunications relay services (TRS) by eliminating standards for Internet-based relay services... comments, identified by CG Docket No. 03-123, by any of the following methods: Electronic Filers: Comments...
Contemporary Reflections on Speech-Based Language Learning
ERIC Educational Resources Information Center
Gustafson, Marianne
2009-01-01
In "The Relation of Language to Mental Development and of Speech to Language Teaching," S.G. Davidson displayed several timeless insights into the role of speech in developing language and reasons for using speech as the basis for instruction for children who are deaf and hard of hearing. His understanding that speech includes more than merely…
Developing a Weighted Measure of Speech Sound Accuracy
Preston, Jonathan L.; Ramsdell, Heather L.; Oller, D. Kimbrough; Edwards, Mary Louise; Tobin, Stephen J.
2010-01-01
Purpose The purpose is to develop a system for numerically quantifying a speaker's phonetic accuracy through transcription-based measures. With a focus on normal and disordered speech in children, we describe a system for differentially weighting speech sound errors based on various levels of phonetic accuracy with a Weighted Speech Sound Accuracy (WSSA) score. We then evaluate the reliability and validity of this measure. Method Phonetic transcriptions are analyzed from several samples of child speech, including preschoolers and young adolescents with and without speech sound disorders and typically developing toddlers. The new measure of phonetic accuracy is compared to existing measures, is used to discriminate typical and disordered speech production, and is evaluated to determine whether it is sensitive to changes in phonetic accuracy over time. Results Initial psychometric data indicate that WSSA scores correlate with other measures of phonetic accuracy as well as listeners' judgments of severity of a child's speech disorder. The measure separates children with and without speech sound disorders. WSSA scores also capture growth in phonetic accuracy in toddlers' speech over time. Conclusion Results provide preliminary support for the WSSA as a valid and reliable measure of phonetic accuracy in children's speech. PMID:20699344
Deep neural network and noise classification-based speech enhancement
NASA Astrophysics Data System (ADS)
Shi, Wenhua; Zhang, Xiongwei; Zou, Xia; Han, Wei
2017-07-01
In this paper, a speech enhancement method using noise classification and a deep neural network (DNN) was proposed. A Gaussian mixture model (GMM) was employed to determine the noise type in speech-absent frames. The DNN was used to model the relationship between the noisy observation and clean speech. Once the noise type was determined, the corresponding DNN model was applied to enhance the noisy speech. The GMM was trained with mel-frequency cepstrum coefficients (MFCC) and the parameters were estimated with an iterative expectation-maximization (EM) algorithm. The noise type was updated by spectrum entropy-based voice activity detection (VAD). Experimental results demonstrate that the proposed method could achieve better objective speech quality and smaller distortion under stationary and non-stationary conditions.
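A minimal sketch of the classification-and-dispatch front end described above, assuming librosa for MFCC extraction and scikit-learn's GaussianMixture (which fits by EM, matching the abstract): one GMM is trained per noise type, speech-absent audio is scored against each, and the winning type selects the corresponding enhancement DNN. The names (train_noise_gmms, dnn_models) are illustrative, and the VAD and DNN training themselves are out of scope here.

import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def train_noise_gmms(noise_examples, sr=16000, n_mfcc=13, n_components=8):
    # noise_examples: {'babble': [waveform, ...], 'factory': [...], ...}
    gmms = {}
    for noise_type, clips in noise_examples.items():
        feats = np.hstack([librosa.feature.mfcc(y=c, sr=sr, n_mfcc=n_mfcc)
                           for c in clips]).T            # (frames, n_mfcc)
        gmms[noise_type] = GaussianMixture(n_components=n_components).fit(feats)
    return gmms

def classify_noise(speech_absent_audio, gmms, sr=16000, n_mfcc=13):
    # pick the noise type whose GMM gives the highest mean log-likelihood
    f = librosa.feature.mfcc(y=speech_absent_audio, sr=sr, n_mfcc=n_mfcc).T
    return max(gmms, key=lambda k: gmms[k].score(f))

# dispatch (dnn_models maps noise type -> trained enhancement network):
# enhanced = dnn_models[classify_noise(noise_frames, gmms)](noisy_speech)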
Bentsen, Thomas; May, Tobias; Kressner, Abigail A; Dau, Torsten
2018-01-01
Computational speech segregation attempts to automatically separate speech from noise. This is challenging in conditions with interfering talkers and low signal-to-noise ratios. Recent approaches have adopted deep neural networks and successfully demonstrated speech intelligibility improvements. A selection of components may be responsible for the success with these state-of-the-art approaches: the system architecture, a time frame concatenation technique and the learning objective. The aim of this study was to explore the roles and the relative contributions of these components by measuring speech intelligibility in normal-hearing listeners. A substantial improvement of 25.4 percentage points in speech intelligibility scores was found going from a subband-based architecture, in which a Gaussian Mixture Model-based classifier predicts the distributions of speech and noise for each frequency channel, to a state-of-the-art deep neural network-based architecture. Another improvement of 13.9 percentage points was obtained by changing the learning objective from the ideal binary mask, in which individual time-frequency units are labeled as either speech- or noise-dominated, to the ideal ratio mask, where the units are assigned a continuous value between zero and one. Therefore, both components play significant roles and by combining them, speech intelligibility improvements were obtained in a six-talker condition at a low signal-to-noise ratio.
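The two learning objectives contrasted above are easy to state concretely. Given oracle short-time spectra of the clean speech S and the noise N (available only at training time), a sketch of both masks:

import numpy as np

def ideal_binary_mask(S, N, lc_db=0.0):
    # IBM: a time-frequency unit is labeled 1 (speech-dominated) when the
    # local SNR exceeds the criterion lc_db, otherwise 0
    snr_db = 10 * np.log10(np.abs(S) ** 2 / (np.abs(N) ** 2 + 1e-12))
    return (snr_db > lc_db).astype(float)

def ideal_ratio_mask(S, N, beta=0.5):
    # IRM: each unit gets a continuous value between zero and one
    ps, pn = np.abs(S) ** 2, np.abs(N) ** 2
    return (ps / (ps + pn + 1e-12)) ** beta

The exponent beta = 0.5 is a common convention in the mask-estimation literature; the study's exact parameterization is not specified in the abstract.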
Kaipa, Ramesh; Jones, Richard D; Robb, Michael P
2016-07-01
The benefits of different practice conditions in limb-based rehabilitation of motor disorders are well documented. Conversely, the role of practice structure in the treatment of motor-based speech disorders has only been minimally investigated. Considering this limitation, the current study aimed to investigate the effectiveness of selected practice conditions in spatial and temporal learning of novel speech utterances in individuals with Parkinson's disease (PD). Participants included 16 individuals with PD who were randomly and equally assigned to constant, variable, random, and blocked practice conditions. Participants in all four groups practiced a speech phrase for two consecutive days, and reproduced the speech phrase on the third day without further practice or feedback. There were no significant differences (p > 0.05) between participants across the four practice conditions with respect to either spatial or temporal learning of the speech phrase. Overall, PD participants demonstrated diminished spatial and temporal learning in comparison to healthy controls. Tests of strength of association between participants' demographic/clinical characteristics and speech-motor learning outcomes did not reveal any significant correlations. The findings from the current study suggest that repeated practice facilitates speech-motor learning in individuals with PD irrespective of the type of practice. Clinicians need to be cautious in applying practice conditions to treat speech deficits associated with PD based on the findings of non-speech-motor learning tasks.
School-Based Speech-Language Pathologists' Use of iPads
ERIC Educational Resources Information Center
Romane, Garvin Philippe
2017-01-01
This study explored school-based speech-language pathologists' (SLPs') use of iPads and apps for speech and language instruction, specifically for articulation, language, and vocabulary goals. A mostly quantitative-based survey was administered to approximately 2,800 SLPs in a K-12 setting; the final sample consisted of 189 licensed SLPs. Overall,…
Severity-Based Adaptation with Limited Data for ASR to Aid Dysarthric Speakers
Mustafa, Mumtaz Begum; Salim, Siti Salwah; Mohamed, Noraini; Al-Qatab, Bassam; Siong, Chng Eng
2014-01-01
Automatic speech recognition (ASR) is currently used in many assistive technologies, such as helping individuals with speech impairment in their communication ability. One challenge in ASR for speech-impaired individuals is the difficulty in obtaining a good speech database of impaired speakers for building an effective speech acoustic model. Because there are very few existing databases of impaired speech, which are also limited in size, the obvious solution to build a speech acoustic model of impaired speech is by employing adaptation techniques. However, issues that have not been addressed in existing studies in the area of adaptation for speech impairment are as follows: (1) identifying the most effective adaptation technique for impaired speech; and (2) the use of suitable source models to build an effective impaired-speech acoustic model. This research investigates the above-mentioned two issues on dysarthria, a type of speech impairment affecting millions of people. We applied both unimpaired and impaired speech as the source model with well-known adaptation techniques like maximum likelihood linear regression (MLLR) and constrained MLLR (C-MLLR). The recognition accuracy of each impaired speech acoustic model is measured in terms of word error rate (WER), with further assessments, including phoneme insertion, substitution and deletion rates. Unimpaired speech, when combined with limited high-quality speech-impaired data, improves the performance of ASR systems in recognising severely impaired dysarthric speech. The C-MLLR adaptation technique was also found to be better than MLLR in recognising mildly and moderately impaired speech based on the statistical analysis of the WER. It was found that phoneme substitution was the biggest contributing factor in WER in dysarthric speech for all levels of severity. The results show that the speech acoustic models derived from suitable adaptation techniques improve the performance of ASR systems in recognising impaired speech with limited adaptation data. PMID:24466004
Speech parts as Poisson processes.
Badalamenti, A F
2001-09-01
This paper presents evidence that six of the seven parts of speech occur in written text as Poisson processes, simple or recurring. The six major parts are nouns, verbs, adjectives, adverbs, prepositions, and conjunctions, with the interjection occurring too infrequently to support a model. The data consist of more than the first 5000 words of works by four major authors coded to label the parts of speech, as well as periods (sentence terminators). Sentence length is measured via the period and found to be normally distributed with no stochastic model identified for its occurrence. The models for all six speech parts but the noun significantly distinguish some pairs of authors, and likewise for the joint use of all word types. Any one author is significantly distinguished from any other by at least one word type, and sentence length very significantly distinguishes each from all others. The variety of word type use, measured by Shannon entropy, builds to about 90% of its maximum possible value. The rate constants for nouns are close to the fractions of maximum entropy achieved. This finding, together with the stochastic models and the relations among them, suggests that the noun may be a primitive organizer of written text.
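A crude, illustrative check in the spirit of this result (not the paper's model-fitting procedure): for a Poisson process, counts in fixed-size windows should have variance close to their mean, so the dispersion ratio of a given part of speech in a tagged text should sit near 1. The tag names here are hypothetical.

import numpy as np

def poisson_dispersion(tags, target, window=100):
    # tags: sequence of part-of-speech labels from any tagger
    hits = np.array([t == target for t in tags], dtype=float)
    n = len(hits) // window
    counts = hits[:n * window].reshape(n, window).sum(axis=1)
    return counts.mean(), counts.var() / counts.mean()   # dispersion ~ 1

# mean, dispersion = poisson_dispersion(tagged_text, 'NOUN')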
Role of maternal gesture use in speech use by children with fragile X syndrome.
Hahn, Laura J; Zimmer, B Jean; Brady, Nancy C; Swinburne Romine, Rebecca E; Fleming, Kandace K
2014-05-01
The purpose of this study was to investigate how maternal gesture relates to speech production by children with fragile X syndrome (FXS). Participants were 27 young children with FXS (23 boys, 4 girls) and their mothers. Videotaped home observations were conducted between the ages of 25 and 37 months (toddler period) and again between the ages of 60 and 71 months (child period). The videos were later coded for types of maternal utterances and maternal gestures that preceded child speech productions. Children were also assessed with the Mullen Scales of Early Learning at both ages. Maternal gesture use in the toddler period was positively related to expressive language scores at both age periods and was related to receptive language scores in the child period. Maternal proximal pointing, in comparison to other gestures, evoked more speech responses from children during the mother-child interactions, particularly when combined with wh-questions. This study adds to the growing body of research on the importance of contextual variables, such as maternal gestures, in child language development. Parental gesture use may be an easily added ingredient to parent-focused early language intervention programs.
Tavano, Alessandro; Pesarin, Anna; Murino, Vittorio; Cristani, Marco
2014-01-01
Individuals with Asperger syndrome/High Functioning Autism fail to spontaneously attribute mental states to the self and others, a life-long phenotypic characteristic known as mindblindness. We hypothesized that mindblindness would affect the dynamics of conversational interaction. Using generative models, in particular Gaussian mixture models and observed influence models, conversations were coded as interacting Markov processes, operating on novel speech/silence patterns, termed Steady Conversational Periods (SCPs). SCPs assume that whenever an agent's process changes state (e.g., from silence to speech), it causes a general transition of the entire conversational process, forcing inter-actant synchronization. SCPs fed into observed influence models, which captured the conversational dynamics of children and adolescents with Asperger syndrome/High Functioning Autism, and age-matched typically developing participants. Analyzing the parameters of the models by means of discriminative classifiers, the dialogs of patients were successfully distinguished from those of control participants. We conclude that meaning-free speech/silence sequences, reflecting inter-actant synchronization, at least partially encode typical and atypical conversational dynamics. This suggests a direct influence of theory of mind abilities onto basic speech initiative behavior. PMID:24489674
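The single-chain building block of such a model is simple to state: from one participant's per-frame speech/silence sequence, a maximum-likelihood first-order Markov transition matrix can be estimated as below. This is a sketch; the observed influence model additionally couples the chains of the two inter-actants, which is not shown.

import numpy as np

def transition_matrix(states, n_states=2):
    # states: per-frame sequence, e.g. 0 = silence, 1 = speech
    T = np.zeros((n_states, n_states))
    for a, b in zip(states[:-1], states[1:]):
        T[a, b] += 1                         # count observed transitions
    return T / np.maximum(T.sum(axis=1, keepdims=True), 1)

# transition_matrix([0, 0, 1, 1, 1, 0, 1]) estimates e.g. the probability
# of continuing to speak versus falling silent from one frame to the next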
Argument Structure, Speech Acts, and Roles in Child-Adult Dispute Episodes.
ERIC Educational Resources Information Center
Prescott, Barbara L.
A study identified discourse patterns in potential, deflected, incomplete, and completed disputes from a one-hour conversation involving two 3-year-old female children and one female adult. These varied dispute episodes were identified, coded, and analyzed using a pragmatic model of adult argumentation focusing on the structures,…
ERIC Educational Resources Information Center
Cox, David J.
2012-01-01
To address the developmental deficits of children with autism, several disciplines have come to the forefront within intervention programs. These are speech-pathologists, psychologists/counselors, occupational-therapists/physical-therapists, special-education consultants, behavior analysts, and physicians/medical personnel. As the field of autism…
Speech Perception Deficits in Poor Readers: Auditory Processing or Phonological Coding?
ERIC Educational Resources Information Center
Mody, Maria; And Others
1997-01-01
Forty second-graders, 20 good and 20 poor readers, completed a /ba/-/da/ temporal order judgment (TOJ) task. The groups did not differ in TOJ when /ba/ and /da/ were paired with more easily discriminated syllables. Poor readers' difficulties with /ba/-/da/ reflected perceptual confusion between phonetically similar syllables rather than difficulty…
Predicting Phonetic Transcription Agreement: Insights from Research in Infant Vocalizations
ERIC Educational Resources Information Center
Ramsdell, Heather L.; Oller, D. Kimbrough; Ethington, Corinna A.
2007-01-01
The purpose of this study is to provide new perspectives on correlates of phonetic transcription agreement. Our research focuses on phonetic transcription and coding of infant vocalizations. The findings are presumed to be broadly applicable to other difficult cases of transcription, such as found in severe disorders of speech, which similarly…
Searching for Syllabic Coding Units in Speech Perception
ERIC Educational Resources Information Center
Dumay, Nicolas; Content, Alain
2012-01-01
Two auditory priming experiments tested whether the effect of final phonological overlap relies on syllabic representations. Amount of shared phonemic information and syllabic status of the overlap between nonword primes and targets were varied orthogonally. In the related conditions, CV.CCVC items shared the last syllable (e.g., vi.klyd-p[image…
The Effects of Prohibiting Gestures on Children's Lexical Retrieval Ability
ERIC Educational Resources Information Center
Pine, Karen J.; Bird, Hannah; Kirk, Elizabeth
2007-01-01
Two alternative accounts have been proposed to explain the role of gestures in thinking and speaking. The Information Packaging Hypothesis (Kita, 2000) claims that gestures are important for the conceptual packaging of information before it is coded into a linguistic form for speech. The Lexical Retrieval Hypothesis (Rauscher, Krauss & Chen, 1996)…
Audio-visual speech cue combination.
Arnold, Derek H; Tear, Morgan; Schindel, Ryan; Roseboom, Warrick
2010-04-16
Different sources of sensory information can interact, often shaping what we think we have seen or heard. This can enhance the precision of perceptual decisions relative to those made on the basis of a single source of information. From a computational perspective, there are multiple reasons why this might happen, and each predicts a different degree of enhanced precision. Relatively slight improvements can arise when perceptual decisions are made on the basis of multiple independent sensory estimates, as opposed to just one. These improvements can arise as a consequence of probability summation. Greater improvements can occur if two initially independent estimates are summated to form a single integrated code, especially if the summation is weighted in accordance with the variance associated with each independent estimate. This form of combination is often described as a Bayesian maximum likelihood estimate. Still greater improvements are possible if the two sources of information are encoded via a common physiological process. Here we show that the provision of simultaneous audio and visual speech cues can result in substantial sensitivity improvements, relative to single sensory modality based decisions. The magnitude of the improvements is greater than can be predicted on the basis of either a Bayesian maximum likelihood estimate or a probability summation. Our data suggest that primary estimates of speech content are determined by a physiological process that takes input from both visual and auditory processing, resulting in greater sensitivity than would be possible if initially independent audio and visual estimates were formed and then subsequently combined.
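The Bayesian maximum likelihood benchmark mentioned here has a closed form: each cue is weighted by its inverse variance, and the fused variance is smaller than either input. A minimal sketch:

import numpy as np

def combine_mle(est_a, var_a, est_v, var_v):
    # inverse-variance (maximum-likelihood) fusion of independent estimates
    w_a, w_v = 1.0 / var_a, 1.0 / var_v
    fused = (w_a * est_a + w_v * est_v) / (w_a + w_v)
    return fused, 1.0 / (w_a + w_v)          # fused estimate and variance

# two equally reliable cues halve the variance (a sqrt(2) sensitivity gain):
print(combine_mle(1.0, 0.5, 0.8, 0.5))      # -> (0.9, 0.25)

The point of the study is that the observed audiovisual improvement exceeded this ceiling (and the smaller probability-summation prediction), implicating a common early encoding stage.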
Fifty years of progress in speech and speaker recognition
NASA Astrophysics Data System (ADS)
Furui, Sadaoki
2004-10-01
Speech and speaker recognition technology has made very significant progress in the past 50 years. The progress can be summarized by the following changes: (1) from template matching to corpus-based statistical modeling, e.g., HMM and n-grams, (2) from filter bank/spectral resonance to cepstral features (cepstrum + Δcepstrum + ΔΔcepstrum), (3) from heuristic time-normalization to DTW/DP matching, (4) from "distance"-based to likelihood-based methods, (5) from maximum likelihood to discriminative approaches, e.g., MCE/GPD and MMI, (6) from isolated word to continuous speech recognition, (7) from small vocabulary to large vocabulary recognition, (8) from context-independent units to context-dependent units for recognition, (9) from clean speech to noisy/telephone speech recognition, (10) from single speaker to speaker-independent/adaptive recognition, (11) from monologue to dialogue/conversation recognition, (12) from read speech to spontaneous speech recognition, (13) from recognition to understanding, (14) from single-modality (audio signal only) to multimodal (audio/visual) speech recognition, (15) from hardware recognizer to software recognizer, and (16) from no commercial application to many practical commercial applications. Most of these advances have taken place in both the fields of speech recognition and speaker recognition. The majority of technological changes have been directed toward the purpose of increasing the robustness of recognition, including many other important techniques not noted above.
ERIC Educational Resources Information Center
Young, Victoria; Mihailidis, Alex
2010-01-01
Despite their growing presence in home computer applications and various telephony services, commercial automatic speech recognition technologies are still not easily employed by everyone; especially individuals with speech disorders. In addition, relatively little research has been conducted on automatic speech recognition performance with older…
ERIC Educational Resources Information Center
Tierney, Joseph; Mack, Molly
1987-01-01
Stimuli used in research on the perception of the speech signal have often been obtained from simple filtering and distortion of the speech waveform, sometimes accompanied by noise. However, for more complex stimulus generation, the parameters of speech can be manipulated, after analysis and before synthesis, using various types of algorithms to…
Are written and spoken recall of text equivalent?
Kellogg, Ronald T
2007-01-01
Writing is less practiced than speaking, graphemic codes are activated only in writing, and the retrieved representations of the text must be maintained in working memory longer because handwritten output is slower than speech. These extra demands on working memory could result in less effort being given to retrieval during written compared with spoken text recall. To test this hypothesis, college students read or heard Bartlett's "War of the Ghosts" and then recalled the text in writing or speech. Spoken recall produced more accurately recalled propositions and more major distortions (e.g., inferences) than written recall. The results suggest that writing reduces the retrieval effort given to reconstructing the propositions of a text.
An acoustic feature-based similarity scoring system for speech rehabilitation assistance.
Syauqy, Dahnial; Wu, Chao-Min; Setyawati, Onny
2016-08-01
The purpose of this study is to develop a tool to assist speech therapy and rehabilitation, focused on automatic scoring based on the comparison of the patient's speech with normal speech on several aspects, including pitch, vowels, voiced-unvoiced segments, strident fricatives and sound intensity. The pitch estimation employed a cepstrum-based algorithm for its robustness; the vowel classification used a multilayer perceptron (MLP) to classify vowels from pitch and formants; and the strident fricative detection was based on the major peak spectral intensity, its location and the presence of pitch in the segment. In order to evaluate the performance of the system, this study analyzed eight patients' speech recordings (four males, four females; 4-58 years old), which had been recorded in a previous study in cooperation with Taipei Veterans General Hospital and Taoyuan General Hospital. The experimental results on the pitch algorithm showed that the cepstrum method had a 5.3% gross pitch error over a total of 2086 frames. On the vowel classification algorithm, the MLP method provided 93% accuracy (men), 87% (women) and 84% (children). In total, 156 of the tool's 192 grading results (81%) were consistent with audio and visual observations made by four experienced respondents. Implications for Rehabilitation: Difficulties in communication may limit the ability of a person to transfer and exchange information. The fact that speech is one of the primary means of communication has encouraged the need for speech diagnosis and rehabilitation. Advances in computer-assisted speech therapy (CAST) technology improve the quality and time efficiency of the diagnosis and treatment of these disorders. The present study attempted to develop a tool to assist speech therapy and rehabilitation that provides a simple interface, letting the assessment be done even by the patient himself without particular knowledge of speech processing, while also providing deeper analysis of the speech, which can be useful for the speech therapist.
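As one concrete piece of the pipeline, a cepstrum-based pitch estimator of the kind the abstract credits for robustness can be sketched in a few lines; the frame length, search band, and the system's voiced/unvoiced handling are assumptions here.

import numpy as np

def cepstral_pitch(frame, sr, fmin=60.0, fmax=400.0):
    # real cepstrum = inverse FFT of the log magnitude spectrum
    spec = np.abs(np.fft.rfft(frame * np.hamming(len(frame))))
    ceps = np.fft.irfft(np.log(spec + 1e-12))
    # search the quefrency band corresponding to plausible pitch periods
    qmin, qmax = int(sr / fmax), int(sr / fmin)
    q = qmin + np.argmax(ceps[qmin:qmax])
    return sr / q                            # F0 in Hz

# f0 = cepstral_pitch(signal[i:i + 512], sr=16000)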
CACTI: free, open-source software for the sequential coding of behavioral interactions.
Glynn, Lisa H; Hallgren, Kevin A; Houck, Jon M; Moyers, Theresa B
2012-01-01
The sequential analysis of client and clinician speech in psychotherapy sessions can help to identify and characterize potential mechanisms of treatment and behavior change. Previous studies required coding systems that were time-consuming, expensive, and error-prone. Existing software can be expensive and inflexible, and furthermore, no single package allows for pre-parsing, sequential coding, and assignment of global ratings. We developed a free, open-source, and adaptable program to meet these needs: The CASAA Application for Coding Treatment Interactions (CACTI). Without transcripts, CACTI facilitates the real-time sequential coding of behavioral interactions using WAV-format audio files. Most elements of the interface are user-modifiable through a simple XML file, and can be further adapted using Java through the terms of the GNU Public License. Coding with this software yields interrater reliabilities comparable to previous methods, but at greatly reduced time and expense. CACTI is a flexible research tool that can simplify psychotherapy process research, and has the potential to contribute to the improvement of treatment content and delivery.
Experience with code-switching modulates the use of grammatical gender during sentence processing
Valdés Kroff, Jorge R.; Dussias, Paola E.; Gerfen, Chip; Perrotti, Lauren; Bajo, M. Teresa
2016-01-01
Using code-switching as a tool to illustrate how language experience modulates comprehension, the visual world paradigm was employed to examine the extent to which gender-marked Spanish determiners facilitate upcoming target nouns in a group of Spanish-English bilingual code-switchers. The first experiment tested target Spanish nouns embedded in a carrier phrase (Experiment 1b) and included a control Spanish monolingual group (Experiment 1a). The second set of experiments included critical trials in which participants heard code-switches from Spanish determiners into English nouns (e.g., la house) either in a fixed carrier phrase (Experiment 2a) or in variable and complex sentences (Experiment 2b). Across the experiments, bilinguals revealed an asymmetric gender effect in processing, showing facilitation only for feminine target items. These results reflect the asymmetric use of gender in the production of code-switched speech. The extension of the asymmetric effect into Spanish (Experiment 1b) underscores the permeability between language modes in bilingual code-switchers. PMID:28663771
Automatic Speech Recognition from Neural Signals: A Focused Review.
Herff, Christian; Schultz, Tanja
2016-01-01
Speech interfaces have become widely accepted and are nowadays integrated in various real-life applications and devices. They have become a part of our daily life. However, speech interfaces presume the ability to produce intelligible speech, which might be impossible due to loud environments, the risk of bothering bystanders, or an inability to produce speech (e.g., patients suffering from locked-in syndrome). For these reasons it would be highly desirable to not speak but to simply envision oneself saying words or sentences. Interfaces based on imagined speech would enable fast and natural communication without the need for audible speech and would give a voice to otherwise mute people. This focused review analyzes the potential of different brain imaging techniques to recognize speech from neural signals by applying Automatic Speech Recognition technology. We argue that modalities based on metabolic processes, such as functional Near Infrared Spectroscopy and functional Magnetic Resonance Imaging, are less suited for Automatic Speech Recognition from neural signals due to low temporal resolution but are very useful for the investigation of the underlying neural mechanisms involved in speech processes. In contrast, electrophysiologic activity is fast enough to capture speech processes and is therefore better suited for ASR. Our experimental results indicate the potential of these signals for speech recognition from neural data with a focus on invasively measured brain activity (electrocorticography). As a first example of Automatic Speech Recognition techniques applied to neural signals, we discuss the Brain-to-text system.
Address entry while driving: speech recognition versus a touch-screen keyboard.
Tsimhoni, Omer; Smith, Daniel; Green, Paul
2004-01-01
A driving simulator experiment was conducted to determine the effects of entering addresses into a navigation system during driving. Participants drove on roads of varying visual demand while entering addresses. Three address entry methods were explored: word-based speech recognition, character-based speech recognition, and typing on a touch-screen keyboard. For each method, vehicle control and task measures, glance timing, and subjective ratings were examined. During driving, word-based speech recognition yielded the shortest total task time (15.3 s), followed by character-based speech recognition (41.0 s) and touch-screen keyboard (86.0 s). The standard deviation of lateral position when performing keyboard entry (0.21 m) was 60% higher than that for all other address entry methods (0.13 m). Degradation of vehicle control associated with address entry using a touch screen suggests that the use of speech recognition is favorable. Speech recognition systems with visual feedback, however, even with excellent accuracy, are not without performance consequences. Applications of this research include the design of in-vehicle navigation systems as well as other systems requiring significant driver input, such as E-mail, the Internet, and text messaging.
Monkey vocal tracts are speech-ready.
Fitch, W Tecumseh; de Boer, Bart; Mathur, Neil; Ghazanfar, Asif A
2016-12-01
For four decades, the inability of nonhuman primates to produce human speech sounds has been claimed to stem from limitations in their vocal tract anatomy, a conclusion based on plaster casts made from the vocal tract of a monkey cadaver. We used x-ray videos to quantify vocal tract dynamics in living macaques during vocalization, facial displays, and feeding. We demonstrate that the macaque vocal tract could easily produce an adequate range of speech sounds to support spoken language, showing that previous techniques based on postmortem samples drastically underestimated primate vocal capabilities. Our findings imply that the evolution of human speech capabilities required neural changes rather than modifications of vocal anatomy. Macaques have a speech-ready vocal tract but lack a speech-ready brain to control it.
Nakai, Yasushi; Takiguchi, Tetsuya; Matsui, Gakuyo; Yamaoka, Noriko; Takada, Satoshi
2017-10-01
Abnormal prosody is often evident in the voice intonations of individuals with autism spectrum disorders. We compared a machine-learning-based voice analysis with human hearing judgments made by 10 speech therapists for classifying children with autism spectrum disorders (n = 30) and typical development (n = 51). Using stimuli limited to single-word utterances, machine-learning-based voice analysis was superior to speech therapist judgments. There was a significantly higher true-positive than false-negative rate for machine-learning-based voice analysis but not for speech therapists. Results are discussed in terms of some artificiality of clinician judgments based on single-word utterances, and the objectivity machine-learning-based voice analysis adds to judging abnormal prosody.
A Case Study of a Collaborative Speech-Language Pathologist
ERIC Educational Resources Information Center
Ritzman, Mitzi J.; Sanger, Dixie; Coufal, Kathy L.
2006-01-01
This study explored how a school-based speech-language pathologist implemented a classroom-based service delivery model that focused on collaborative practices in classroom settings. The study used ethnographic observations and interviews with 1 speech-language pathologist to provide insights into how she implemented collaborative consultation and…
Choosing and Using Text-to-Speech Software
ERIC Educational Resources Information Center
Peters, Tom; Bell, Lori
2007-01-01
This article describes a computer-based technology for generating speech called text-to-speech (TTS). This software is ready for widespread use by libraries, other organizations, and individual users. It offers the affordable ability to turn just about any electronic text that is not image-based into an artificially spoken communication. The…
ERIC Educational Resources Information Center
Hill, Anne J.; Theodoros, Deborah G.; Russell, Trevor G.; Cahill, Louise M.; Ward, Elizabeth C.; Clark, Kathy M.
2006-01-01
Purpose: This pilot study explored the feasibility and effectiveness of an Internet-based telerehabilitation application for the assessment of motor speech disorders in adults with acquired neurological impairment. Method: Using a counterbalanced, repeated measures research design, 2 speech-language pathologists assessed 19 speakers with…
Elements of a Plan-Based Theory of Speech Acts. Technical Report No. 141.
ERIC Educational Resources Information Center
Cohen, Philip R.; Perrault, C. Raymond
This report proposes that people often plan their speech acts to affect their listeners' beliefs, goals, and emotional states and that such language use can be modeled by viewing speech acts as operators in a planning system, allowing both physical and speech acts to be integrated into plans. Methodological issues of how speech acts should be…
Tilsen, Sam; Arvaniti, Amalia
2013-07-01
This study presents a method for analyzing speech rhythm using empirical mode decomposition of the speech amplitude envelope, which allows for extraction and quantification of syllabic- and supra-syllabic time-scale components of the envelope. The method of empirical mode decomposition of a vocalic energy amplitude envelope is illustrated in detail, and several types of rhythm metrics derived from this method are presented. Spontaneous speech extracted from the Buckeye Corpus is used to assess the effect of utterance length on metrics, and it is shown how metrics representing variability in the supra-syllabic time-scale components of the envelope can be used to identify stretches of speech with targeted rhythmic characteristics. Furthermore, the envelope-based metrics are used to characterize cross-linguistic differences in speech rhythm in the UC San Diego Speech Lab corpus of English, German, Greek, Italian, Korean, and Spanish speech elicited in read sentences, read passages, and spontaneous speech. The envelope-based metrics exhibit significant effects of language and elicitation method that argue for a nuanced view of cross-linguistic rhythm patterns.
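A minimal sketch of the decomposition step, assuming the PyEMD package (pip install EMD-signal) and a Hilbert-transform amplitude envelope as a stand-in for the paper's vocalic energy envelope; the low-pass cutoff and envelope rate are assumptions.

import numpy as np
from scipy.signal import hilbert, butter, filtfilt
from PyEMD import EMD

def envelope_imfs(x, sr, env_sr=200, cutoff=20.0):
    env = np.abs(hilbert(x))                     # amplitude envelope
    b, a = butter(4, cutoff / (sr / 2))          # keep slow fluctuations
    env = filtfilt(b, a, env)[::sr // env_sr]    # low-pass and downsample
    return EMD()(env)                            # IMFs, fastest mode first

Syllabic-rate structure concentrates in the faster intrinsic mode functions and supra-syllabic structure in the slower ones, so variability metrics of the kind described above can be computed per IMF group.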
[Restoration of speech function in oncological patients with maxillary defects].
Matiakin, E G; Chuchkov, V M; Akhundov, A A; Azizian, R I; Romanov, I S; Chuchkov, M V; Agapov, V V
2009-01-01
Speech quality was evaluated in 188 patients with acquired maxillary defects. Prosthetic treatment of 29 patients was preceded by pharmacopsychotherapy. Sixty-three patients had lessons with a logopedist and 66 practiced self-tuition based on a specially developed test. Thirty patients were examined for the quality of speech without preliminary preparation. Speech quality was assessed by auditory and spectral analysis. The main forms of impaired speech quality in the patients with maxillary defects were marked rhinophonia and impaired articulation. The proposed analytical tests were based on a combination of "difficult" vowels and consonants. The use of a removable prosthesis with an obturator failed to correct the affected speech function but created prerequisites for the formation of the correct speech stereotype. Results of the study suggest a relationship between the quality of speech in subjects with maxillary defects and their intellectual faculties as well as their desire to overcome this drawback. The proposed tests are designed to activate the neuromuscular apparatus responsible for the generation of speech. Lessons with a speech therapist give a powerful emotional incentive to the patients and promote their efforts toward restoration of speaking ability. Pharmacopsychotherapy and self-control are other efficacious tools for the improvement of speech quality in patients with maxillary defects.
Childhood apraxia of speech: A survey of praxis and typical speech characteristics.
Malmenholt, Ann; Lohmander, Anette; McAllister, Anita
2017-07-01
The purpose of this study was to investigate current knowledge of the diagnosis of childhood apraxia of speech (CAS) in Sweden and to compare speech characteristics and symptoms with earlier survey findings from mainly English-speaking countries. In a web-based questionnaire, 178 Swedish speech-language pathologists (SLPs) anonymously answered questions about their perception of typical speech characteristics for CAS. They graded their own assessment skills and estimated clinical occurrence. The seven top speech characteristics reported as typical for children with CAS were: inconsistent speech production (85%), sequencing difficulties (71%), oro-motor deficits (63%), vowel errors (62%), voicing errors (61%), consonant cluster deletions (54%), and prosodic disturbance (53%). Motor-programming deficits, described as a lack of automatization of speech movements, were perceived by 82%. All listed characteristics were consistent with the American Speech-Language-Hearing Association (ASHA) consensus-based features, Strand's 10-point checklist, and the diagnostic model proposed by Ozanne. The mode for clinical occurrence was 5%. The number of suspected cases of CAS in the clinical caseload was approximately one new patient per year and SLP. The results support and add to findings from studies of CAS in English-speaking children, with similar speech characteristics regarded as typical. Possibly, these findings could contribute to cross-linguistic consensus on CAS characteristics.
Jørgensen, Søren; Dau, Torsten
2011-09-01
A model for predicting the intelligibility of processed noisy speech is proposed. The speech-based envelope power spectrum model has a structure similar to that of the model of Ewert and Dau [(2000). J. Acoust. Soc. Am. 108, 1181-1196], developed to account for modulation detection and masking data. The model estimates the speech-to-noise envelope power ratio, SNR(env), at the output of a modulation filterbank and relates this metric to speech intelligibility using the concept of an ideal observer. Predictions were compared to data on the intelligibility of speech presented in stationary speech-shaped noise. The model was further tested in conditions with noisy speech subjected to reverberation and spectral subtraction. Good agreement between predictions and data was found in all cases. For spectral subtraction, an analysis of the model's internal representation of the stimuli revealed that the predicted decrease of intelligibility was caused by the estimated noise envelope power exceeding that of the speech. The classical concept of the speech transmission index fails in this condition. The results strongly suggest that the signal-to-noise ratio at the output of a modulation frequency selective process provides a key measure of speech intelligibility.
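A heavily simplified sketch of the central metric, omitting the model's peripheral (gammatone) filterbank and ideal-observer back end: envelope power is measured in octave-wide modulation bands for the noisy speech and for the noise alone, the per-band SNRenv is formed as (P_noisy - P_noise)/P_noise, and bands are combined quadratically. Band centers, filter orders, and the envelope rate are assumptions.

import numpy as np
from scipy.signal import hilbert, butter, filtfilt

def envelope(x, sr, env_sr=1000):
    return np.abs(hilbert(x))[::sr // env_sr], env_sr

def band_power(env, env_sr, fc):
    env = env / env.mean() - 1.0                 # DC-normalized envelope
    lo, hi = fc / np.sqrt(2), fc * np.sqrt(2)    # one-octave band
    b, a = butter(2, [lo / (env_sr / 2), hi / (env_sr / 2)], btype='band')
    return np.mean(filtfilt(b, a, env) ** 2)

def snr_env(noisy, noise, sr, fcs=(1, 2, 4, 8, 16, 32, 64)):
    e_s, esr = envelope(noisy, sr)
    e_n, _ = envelope(noise, sr)
    snrs = [max((band_power(e_s, esr, fc) - band_power(e_n, esr, fc))
                / max(band_power(e_n, esr, fc), 1e-12), 1e-3) for fc in fcs]
    return np.sqrt(np.sum(np.square(snrs)))      # overall SNRenv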
Speech-Language Dissociations, Distractibility, and Childhood Stuttering
Conture, Edward G.; Walden, Tedra A.; Lambert, Warren E.
2015-01-01
Purpose This study investigated the relation among speech-language dissociations, attentional distractibility, and childhood stuttering. Method Participants were 82 preschool-age children who stutter (CWS) and 120 who do not stutter (CWNS). Correlation-based statistics (Bates, Appelbaum, Salcedo, Saygin, & Pizzamiglio, 2003) identified dissociations across 5 norm-based speech-language subtests. The Behavioral Style Questionnaire Distractibility subscale measured attentional distractibility. Analyses addressed (a) between-groups differences in the number of children exhibiting speech-language dissociations; (b) between-groups distractibility differences; (c) the relation between distractibility and speech-language dissociations; and (d) whether interactions between distractibility and dissociations predicted the frequency of total, stuttered, and nonstuttered disfluencies. Results More preschool-age CWS exhibited speech-language dissociations compared with CWNS, and more boys exhibited dissociations compared with girls. In addition, male CWS were less distractible than female CWS and female CWNS. For CWS, but not CWNS, less distractibility (i.e., greater attention) was associated with more speech-language dissociations. Last, interactions between distractibility and dissociations did not predict speech disfluencies in CWS or CWNS. Conclusions The present findings suggest that for preschool-age CWS, attentional processes are associated with speech-language dissociations. Future investigations are warranted to better understand the directionality of effect of this association (e.g., inefficient attentional processes → speech-language dissociations vs. inefficient attentional processes ← speech-language dissociations). PMID:26126203
Ethics in the practice of speech-language pathology in health care settings.
Kummer, Ann W; Turner, Jan
2011-11-01
Ethics refers to a moral philosophy or a set of moral principles that determine appropriate behavior in a society. Medical ethics includes a set of specific values that are considered in determining appropriate conduct in the practice of medicine or health care. Because the practice of medicine and medical speech-language pathology affects the health, well-being, and quality of life of individuals served, adherence to a code of ethical conduct is critically important in the health care environment. When ethical dilemmas arise, consultation with a bioethics committee can be helpful in determining the best course of action. This article will help to define medical ethics and to discuss the six basic values that are commonly considered in discussions of medical ethics. Common ethical mistakes in the practice of speech-language pathology will be described. Finally, the value of a bioethics consultation for help in resolving complex ethical issues will be discussed.
Focal versus distributed temporal cortex activity for speech sound category assignment
Bouton, Sophie; Chambon, Valérian; Tyrand, Rémi; Seeck, Margitta; Karkar, Sami; van de Ville, Dimitri; Giraud, Anne-Lise
2018-01-01
Percepts and words can be decoded from distributed neural activity measures. However, the existence of widespread representations might conflict with the more classical notions of hierarchical processing and efficient coding, which are especially relevant in speech processing. Using fMRI and magnetoencephalography during syllable identification, we show that sensory and decisional activity colocalize to a restricted part of the posterior superior temporal gyrus (pSTG). Next, using intracortical recordings, we demonstrate that early and focal neural activity in this region distinguishes correct from incorrect decisions and can be machine-decoded to classify syllables. Crucially, significant machine decoding was possible from neuronal activity sampled across different regions of the temporal and frontal lobes, despite weak or absent sensory or decision-related responses. These findings show that speech-sound categorization relies on an efficient readout of focal pSTG neural activity, while more distributed activity patterns, although classifiable by machine learning, instead reflect collateral processes of sensory perception and decision. PMID:29363598
Tóth, László; Hoffmann, Ildikó; Gosztolya, Gábor; Vincze, Veronika; Szatlóczki, Gréta; Bánréti, Zoltán; Pákáski, Magdolna; Kálmán, János
2018-01-01
Background: Even today the reliable diagnosis of the prodromal stages of Alzheimer's disease (AD) remains a great challenge. Our research focuses on the earliest detectable indicators of cognitive decline in mild cognitive impairment (MCI). Since the presence of language impairment has been reported even in the mild stage of AD, the aim of this study is to develop a sensitive neuropsychological screening method based on the analysis of spontaneous speech production during the performance of a memory task. In the future, this can form the basis of an Internet-based interactive screening software for the recognition of MCI. Methods: Participants were 38 healthy controls and 48 clinically diagnosed MCI patients. Spontaneous speech was provoked by asking the patients to recall the content of 2 short black and white films (one direct, one delayed), and by answering one question. Acoustic parameters (hesitation ratio, speech tempo, length and number of silent and filled pauses, length of utterance) were extracted from the recorded speech signals, first manually (using the Praat software), and then automatically, with an automatic speech recognition (ASR) based tool. First, the extracted parameters were statistically analyzed. Then we applied machine learning algorithms to see whether the MCI and the control group can be discriminated automatically based on the acoustic features. Results: The statistical analysis showed significant differences for most of the acoustic parameters (speech tempo, articulation rate, silent pause, hesitation ratio, length of utterance, pause-per-utterance ratio). The most significant differences between the two groups were found in the speech tempo in the delayed recall task, and in the number of pauses for the question-answering task. The fully automated version of the analysis process - that is, using the ASR-based features in combination with machine learning - was able to separate the two classes with an F1-score of 78.8%. Conclusion: The temporal analysis of spontaneous speech can be exploited in implementing a new, automatic detection-based tool for screening MCI for the community. PMID:29165085
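Several of the listed acoustic parameters can be derived from a speech/silence segmentation alone. The sketch below uses a simple energy threshold as a stand-in for the Praat/ASR segmentation used in the study; the thresholds and the hesitation-ratio definition are assumptions, and speech tempo would additionally require phone or syllable counts from a transcript.

import numpy as np

def temporal_params(x, sr, frame=0.02, floor_db=-35.0, min_pause=0.25):
    n = int(frame * sr)
    frames = x[:len(x) // n * n].reshape(-1, n)
    db = 10 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
    silent = db < db.max() + floor_db            # relative energy floor
    pauses, run = [], 0
    for s in np.append(silent, False):           # sentinel flushes last run
        if s:
            run += 1
        else:
            if run * frame >= min_pause:
                pauses.append(run * frame)
            run = 0
    total = len(x) / sr
    return {'n_silent_pauses': len(pauses),
            'mean_pause_len': float(np.mean(pauses)) if pauses else 0.0,
            'hesitation_ratio': sum(pauses) / total}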
NASA Astrophysics Data System (ADS)
Gao, Pei-pei; Liu, Feng
2016-10-01
With the development of information technology and artificial intelligence, speech synthesis plays a significant role in the field of human-computer interaction. However, the main problem of current speech synthesis techniques is a lack of naturalness and expressiveness, so that synthesized speech is not yet close to the standard of natural language. Another problem is that human-computer interaction based on speech synthesis is too monotonous to support a mechanism of user-driven control. This paper reviews the historical development of speech synthesis and summarizes the general processing pipeline, pointing out that the prosody generation module is an important part of the synthesis process. On this basis, a new human-computer interaction method is introduced to enrich the synthesis: eye activity patterns during reading are used to control and drive prosody generation. Assuming that eye-gaze data can be extracted, a speech synthesis method driven in real time by the eye-movement signal is proposed that can express the speaker's real speech rhythm. That is, while the reader silently reads a text, reading information such as gaze duration per prosodic unit is captured, and a hierarchical prosodic duration model is established to determine the duration parameters of the synthesized speech. Finally, analysis verifies the feasibility of this method.
Communication acoustics in Bell Labs
NASA Astrophysics Data System (ADS)
Flanagan, J. L.
2004-05-01
Communication acoustics has been a central theme in Bell Labs research since its inception. Telecommunication serves human information exchange, and humans favor spoken language as a principal mode. The atmospheric medium typically provides the link between articulation and hearing. Creation, control, and detection of sound, and the human's facility for generation and perception, are basic ingredients of telecommunication. Electronics technology of the 1920s ushered in great advances in communication at a distance, a strong economic impetus being to overcome the bandwidth limitations of wireline and cable. Early research established criteria for speech transmission with high quality and intelligibility. These insights supported exploration of means for efficient transmission--obtaining the greatest amount of speech information over a given bandwidth. Transoceanic communication was initiated by undersea cables for telegraphy. But these long cables exhibited very limited bandwidth (on the order of a few hundred Hz). The challenge of sending voice across the oceans spawned perhaps the best-known speech compression technique in history--the Vocoder, which parametrized the signal for transmission in about 300 Hz bandwidth, one-tenth that required for the typical waveform channel. Quality and intelligibility were grave issues (and they still are). At the same time, parametric representation offered possibilities for encryption and privacy inside a traditional voice bandwidth. Confidential conversations between Roosevelt and Churchill during World War II were carried over high-frequency radio by an encrypted vocoder system known as Sigsaly. Major engineering advances in the late 1940s and early 1950s moved telecommunications into a new regime--digital technology. These key advances were at least three: (i) new understanding of time-discrete (sampled) representation of signals, (ii) digital computation (especially binary based), and (iii) evolving capabilities in microelectronics that ultimately provided circuits of enormous complexity with low cost and power. Digital transmission (as exemplified in pulse code modulation (PCM) and its many derivatives) became a telecommunication mainstay, along with switches to control and route information in digital form. Concomitantly, storage means for digital information advanced, providing another impetus for speech compression. More and more, humans saw the need to exchange speech information with machines, as well as with other humans. Human-machine speech communication came to full stride in the early 1990s, and now has expanded to multimodal domains that begin to support enhanced naturalness, using contemporaneous sight, sound, and touch signaling. Packet transmission is supplanting circuit switching, and voice and video are commonly being carried by Internet protocol.
NASA Astrophysics Data System (ADS)
Thoonsaengngam, Rattapol; Tangsangiumvisai, Nisachon
This paper proposes an enhanced method for estimating the a priori Signal-to-Disturbance Ratio (SDR) to be employed in an Acoustic Echo and Noise Suppression (AENS) system for full-duplex hands-free communications. The proposed a priori SDR estimation technique is modified from the Two-Step Noise Reduction (TSNR) algorithm to suppress the background noise while preserving speech spectral components. In addition, a practical approach to accurately determining the Echo Spectrum Variance (ESV) is presented, based upon an assumed linear relationship between the power spectra of the far-end speech and the acoustic echo signal. The ESV estimate is then employed to alleviate the acoustic echo problem. The performance of the AENS system that employs these two proposed estimation techniques is evaluated through Echo Attenuation (EA), Noise Attenuation (NA), and two speech distortion measures. Simulation results based upon real speech signals confirm that the improved AENS system efficiently mitigates acoustic echo and background noise while preserving speech quality and intelligibility.
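A rough sketch of the two estimators the abstract combines; the decision-directed first step and gain-refined second step follow the general TSNR recipe, while `h_gain` and the linear ESV model are assumptions standing in for the paper's calibrated relationship:

```python
import numpy as np

def echo_spectrum_variance(farend_psd, h_gain):
    """ESV assumed proportional to the far-end speech power spectrum, per the
    stated linear-relationship assumption (h_gain is a hypothetical factor)."""
    return h_gain * farend_psd

def a_priori_sdr(noisy_psd, disturbance_psd, prev_clean_psd, alpha=0.98):
    """Two-step a priori SDR estimate in the spirit of TSNR."""
    # Step 1: decision-directed estimate.
    sdr1 = alpha * prev_clean_psd / disturbance_psd \
        + (1 - alpha) * np.maximum(noisy_psd / disturbance_psd - 1.0, 0.0)
    gain1 = sdr1 / (1.0 + sdr1)                 # Wiener gain from the first step
    # Step 2: refine with the first-step gain to reduce the estimator's lag.
    return (gain1 ** 2) * noisy_psd / disturbance_psd

f = np.ones(4)                                   # toy per-bin power spectra
print(a_priori_sdr(noisy_psd=4 * f, disturbance_psd=f, prev_clean_psd=2 * f))
```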
Winn, Matthew B.; Won, Jong Ho; Moon, Il Joon
2016-01-01
Objectives This study was conducted to measure auditory perception by cochlear implant users in the spectral and temporal domains, using tests of either categorization (using speech-based cues) or discrimination (using conventional psychoacoustic tests). We hypothesized that traditional nonlinguistic tests assessing spectral and temporal auditory resolution would correspond to speech-based measures assessing specific aspects of phonetic categorization assumed to depend on spectral and temporal auditory resolution. We further hypothesized that speech-based categorization performance would ultimately be a superior predictor of speech recognition performance, because of the fundamental nature of speech recognition as categorization. Design Nineteen CI listeners and 10 listeners with normal hearing (NH) participated in a suite of tasks that included spectral ripple discrimination (SRD), temporal modulation detection (TMD), and syllable categorization, which was split into a spectral-cue-based task (targeting the /ba/-/da/ contrast) and a timing-cue-based task (targeting the /b/-/p/ and /d/-/t/ contrasts). Speech sounds were manipulated in order to contain specific spectral or temporal modulations (formant transitions or voice onset time, respectively) that could be categorized. Categorization responses were quantified using logistic regression in order to assess perceptual sensitivity to acoustic phonetic cues. Word recognition testing was also conducted for CI listeners. Results CI users were generally less successful at utilizing both spectral and temporal cues for categorization compared to listeners with normal hearing. For the CI listener group, SRD was significantly correlated with the categorization of formant transitions; both were correlated with better word recognition. TMD using 100 Hz and 10 Hz modulated noise was not correlated with the CI subjects’ categorization of VOT, nor with word recognition. Word recognition was correlated more closely with categorization of the controlled speech cues than with performance on the psychophysical discrimination tasks. Conclusions When evaluating people with cochlear implants, controlled speech-based stimuli are feasible to use in tests of auditory cue categorization, to complement traditional measures of auditory discrimination. Stimuli based on specific speech cues correspond to counterpart non-linguistic measures of discrimination, but potentially show better correspondence with speech perception more generally. The ubiquity of the spectral (formant transition) and temporal (VOT) stimulus dimensions across languages highlights the potential to use this testing approach even in cases where English is not the native language. PMID:27438871
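A minimal illustration of the logistic-regression analysis described in the Design, using toy responses along a cue continuum; the data and variable names are fabricated for the example:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy sketch: perceptual sensitivity to an acoustic-phonetic cue is quantified
# as the slope of a logistic fit of categorization responses against the cue
# continuum; a steeper slope means sharper categorization.

cue = np.linspace(-1.0, 1.0, 9).reshape(-1, 1)   # formant-transition continuum steps
resp = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1])     # 0 = /ba/ response, 1 = /da/

model = LogisticRegression().fit(cue, resp)
slope = model.coef_[0][0]
boundary = -model.intercept_[0] / slope           # cue value at 50% responding
print(f"slope={slope:.2f}, category boundary={boundary:.2f}")
```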
Expertise with artificial non-speech sounds recruits speech-sensitive cortical regions
Leech, Robert; Holt, Lori L.; Devlin, Joseph T.; Dick, Frederic
2009-01-01
Regions of the human temporal lobe show greater activation for speech than for other sounds. These differences may reflect intrinsically specialized domain-specific adaptations for processing speech, or they may be driven by the significant expertise we have in listening to the speech signal. To test the expertise hypothesis, we used a video-game-based paradigm that tacitly trained listeners to categorize acoustically complex, artificial non-linguistic sounds. Before and after training, we used functional MRI to measure how expertise with these sounds modulated temporal lobe activation. Participants’ ability to explicitly categorize the non-speech sounds predicted the change in pre- to post-training activation in speech-sensitive regions of the left posterior superior temporal sulcus, suggesting that emergent auditory expertise may help drive this functional regionalization. Thus, seemingly domain-specific patterns of neural activation in higher cortical regions may be driven in part by experience-based restructuring of high-dimensional perceptual space. PMID:19386919
Towards Artificial Speech Therapy: A Neural System for Impaired Speech Segmentation.
Iliya, Sunday; Neri, Ferrante
2016-09-01
This paper presents a neural system-based technique for segmenting short impaired speech utterances into silent, unvoiced, and voiced sections. Moreover, the proposed technique identifies those points of the (voiced) speech where the spectrum becomes steady. The resulting technique thus aims at detecting the limited section of the speech which contains the information about the potential impairment. This section is of interest to the speech therapist as it corresponds to the possibly incorrect movements of speech organs (lower lip and tongue with respect to the vocal tract). Two segmentation models to detect and identify the various sections of disordered (impaired) speech signals have been developed and compared. The first makes use of a combination of four artificial neural networks. The second is based on a support vector machine (SVM). The SVM has been trained by means of an ad hoc nested algorithm whose outer layer is a metaheuristic while the inner layer is a convex optimization algorithm. Several metaheuristics have been tested and compared, leading to the conclusion that some variants of the compact differential evolution (CDE) algorithm appear to be well suited to this problem. Numerical results show that the SVM model with a radial basis function is capable of effective detection of the portion of speech that is of interest to a therapist. The best performance has been achieved when the system is trained by the nested algorithm whose outer layer is a hybrid population-based/CDE metaheuristic. A population-based approach displays the best performance for the isolation of silence/noise sections and the detection of unvoiced sections. On the other hand, a compact approach appears to be clearly well suited to detect the beginning of the steady state of the voiced signal. Both proposed segmentation models outperformed two modern segmentation techniques based on Gaussian mixture models and deep learning.
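A sketch of the SVM classification stage only, under the assumption of simple log-energy and zero-crossing features; the paper's nested metaheuristic/convex training layer and its exact feature set are not reproduced:

```python
import numpy as np
from sklearn.svm import SVC

def frame_features(frame):
    """Two toy features per frame: log energy (separates silence) and
    zero-crossing rate (high for unvoiced frication)."""
    energy = np.log(np.sum(frame ** 2) + 1e-10)
    zcr = np.mean(np.abs(np.diff(np.sign(frame))) > 0)
    return [energy, zcr]

# Toy training data in feature space; labels: 0 = silent, 1 = unvoiced, 2 = voiced.
X = [[-9.0, 0.05], [-8.5, 0.10], [-2.0, 0.60], [-1.5, 0.55], [1.0, 0.08], [1.2, 0.10]]
y = [0, 0, 1, 1, 2, 2]
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)

t = np.linspace(0, 0.02, 320)                       # a 20 ms "voiced" frame
voiced_frame = 0.1 * np.sin(2 * np.pi * 120 * t)
print(clf.predict([frame_features(voiced_frame)]))  # -> [2]
```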
Na, Sung Dae; Wei, Qun; Seong, Ki Woong; Cho, Jin Ho; Kim, Myoung Nam
2018-01-01
Conventional methods of speech enhancement, noise reduction, and voice activity detection are based on suppressing the noise or non-speech components of the target air-conduction signals. However, air-conducted speech is hard to differentiate from babble or white noise. To overcome this problem, the proposed algorithm uses bone-conduction speech signals and soft thresholding based on the Shannon entropy principle and on the cross-correlation of air- and bone-conduction signals. A new algorithm for speech detection and noise reduction is proposed that uses Shannon entropy and cross-correlation with the bone-conduction speech signal to threshold the wavelet packet coefficients of the noisy speech. Each threshold is generated by the entropy and cross-correlation approaches in the bands of a wavelet packet decomposition. The method's effectiveness was assessed with objective quality measures (PESQ, RMSE, correlation, and SNR), and noise reduction was demonstrated in MATLAB simulations. To verify feasibility, the air- and bone-conduction speech signals and their spectra were compared under the proposed method. The results confirm the high performance of the method, which makes it promising for future applications in communication devices and in noisy environments such as construction sites and military operations.
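A condensed sketch of the band-wise soft-thresholding idea using PyWavelets; the threshold rule combining Shannon entropy with the bone-conduction correlation is an illustrative guess at the paper's formulation, not its exact rule:

```python
import numpy as np
import pywt

def shannon_entropy(c):
    """Shannon entropy of the normalized coefficient energies in one band."""
    p = c ** 2 / (np.sum(c ** 2) + 1e-12)
    return -np.sum(p * np.log(p + 1e-12))

def denoise(noisy, bone, wavelet="db4", level=3):
    wp_n = pywt.WaveletPacket(noisy, wavelet, maxlevel=level)
    wp_b = pywt.WaveletPacket(bone, wavelet, maxlevel=level)
    for node in wp_n.get_level(level, "natural"):
        c = node.data
        b = wp_b[node.path].data
        rho = abs(np.corrcoef(c, b)[0, 1])        # band correlation with bone channel
        # Illustrative rule: bands weakly correlated with the bone-conduction
        # reference (likely noise) get a larger soft threshold.
        t = shannon_entropy(c) * (1.0 - rho) * np.std(c)
        node.data = pywt.threshold(c, t, mode="soft")
    return wp_n.reconstruct(update=False)

rng = np.random.default_rng(4)
clean = np.sin(2 * np.pi * 200 * np.linspace(0, 0.1, 1600))
out = denoise(clean + rng.normal(0, 0.5, 1600), bone=clean)
print(out.shape)
```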
Gilbert, Kathryn E
2013-02-01
Recent attempts to regulate Crisis Pregnancy Centers, pseudoclinics that surreptitiously aim to dissuade pregnant women from choosing abortion, have confronted the thorny problem of how to define commercial speech. The Supreme Court has offered three potential answers to this definitional quandary. This Note uses the Crisis Pregnancy Center cases to demonstrate that courts should use one of these solutions, the factor-based approach of Bolger v. Youngs Drugs Products Corp., to define commercial speech in the Crisis Pregnancy Center cases and elsewhere. In principle and in application, the Bolger factor-based approach succeeds in structuring commercial speech analysis at the margins of the doctrine.
Speech graphs provide a quantitative measure of thought disorder in psychosis.
Mota, Natalia B; Vasconcelos, Nivaldo A P; Lemos, Nathalia; Pieretti, Ana C; Kinouchi, Osame; Cecchi, Guillermo A; Copelli, Mauro; Ribeiro, Sidarta
2012-01-01
Psychosis has various causes, including mania and schizophrenia. Since the differential diagnosis of psychosis is exclusively based on subjective assessments of oral interviews with patients, an objective quantification of the speech disturbances that characterize mania and schizophrenia is in order. In principle, such quantification could be achieved by the analysis of speech graphs. A graph represents a network with nodes connected by edges; in speech graphs, nodes correspond to words and edges correspond to semantic and grammatical relationships. To quantify speech differences related to psychosis, interviews with schizophrenics, manics and normal subjects were recorded and represented as graphs. Manics scored significantly higher than schizophrenics in ten graph measures. Psychopathological symptoms such as logorrhea, poor speech, and flight of thoughts were grasped by the analysis even when verbosity differences were discounted. Binary classifiers based on speech graph measures sorted schizophrenics from manics with up to 93.8% sensitivity and 93.7% specificity. In contrast, sorting based on the scores of two standard psychiatric scales (BPRS and PANSS) reached only 62.5% sensitivity and specificity. The results demonstrate that alterations of the thought process manifested in the speech of psychotic patients can be objectively measured using graph-theoretical tools, developed to capture specific features of the normal and dysfunctional flow of thought, such as divergence and recurrence. The quantitative analysis of speech graphs is not redundant with standard psychometric scales but rather complementary, as it yields a very accurate sorting of schizophrenics and manics. Overall, the results point to automated psychiatric diagnosis based not on what is said, but on how it is said.
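To make the graph representation concrete, a minimal sketch using networkx; it links successive words, which is a simplification of the semantic and grammatical edges the study codes:

```python
import networkx as nx

# Sketch: build a word graph from a transcript (nodes = words, directed edges =
# succession) and compute a few measures of the kind the study uses, e.g.
# node count, edge count, and the largest strongly connected component.

def speech_graph(transcript):
    words = transcript.lower().split()
    g = nx.DiGraph()
    g.add_edges_from(zip(words, words[1:]))
    return g

g = speech_graph("i think that i think that i saw the dog and the dog saw me")
lsc = max(nx.strongly_connected_components(g), key=len)
print(g.number_of_nodes(), g.number_of_edges(), len(lsc))   # -> 8 10 4
```

Recurrent loops in the transcript (such as the repeated "i think that") show up directly as cycles, which is why measures like the largest strongly connected component can track recurrence in the flow of thought.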
Micro-Based Speech Recognition: Instructional Innovation for Handicapped Learners.
ERIC Educational Resources Information Center
Horn, Carin E.; Scott, Brian L.
A new voice-based learning system (VBLS), which allows the handicapped user to interact with a microcomputer by voice commands, is described. Speech or voice recognition is the computerized process of identifying a spoken word or phrase, including those resulting from speech impediments. This new technology is helpful to the severely physically…
ERIC Educational Resources Information Center
Nozari, Nazbanou; Dell, Gary S.; Schwartz, Myrna F.
2011-01-01
Despite the existence of speech errors, verbal communication is successful because speakers can detect (and correct) their errors. The standard theory of speech-error detection, the perceptual-loop account, posits that the comprehension system monitors production output for errors. Such a comprehension-based monitor, however, cannot explain the…
Using Web Speech Technology with Language Learning Applications
ERIC Educational Resources Information Center
Daniels, Paul
2015-01-01
In this article, the author presents the history of human-to-computer interaction based upon the design of sophisticated computerized speech recognition algorithms. Advancements such as the arrival of cloud-based computing and software like Google's Web Speech API allows anyone with an Internet connection and Chrome browser to take advantage of…
Evidence-Based Practice for Children with Speech Sound Disorders: Part 1 Narrative Review
ERIC Educational Resources Information Center
Baker, Elise; McLeod, Sharynne
2011-01-01
Purpose: This article provides a comprehensive narrative review of intervention studies for children with speech sound disorders (SSD). Its companion paper (Baker & McLeod, 2011) provides a tutorial and clinical example of how speech-language pathologists (SLPs) can engage in evidence-based practice (EBP) for this clinical population. Method:…
Visemic Processing in Audiovisual Discrimination of Natural Speech: A Simultaneous fMRI-EEG Study
ERIC Educational Resources Information Center
Dubois, Cyril; Otzenberger, Helene; Gounot, Daniel; Sock, Rudolph; Metz-Lutz, Marie-Noelle
2012-01-01
In a noisy environment, visual perception of articulatory movements improves natural speech intelligibility. Parallel to phonemic processing based on auditory signal, visemic processing constitutes a counterpart based on "visemes", the distinctive visual units of speech. Aiming at investigating the neural substrates of visemic processing in a…
New Ideas for Speech Recognition and Related Technologies
DOE Office of Scientific and Technical Information (OSTI.GOV)
Holzrichter, J F
The ideas relating to the use of organ motion sensors for the purposes of speech recognition were first described by the author in spring 1994. During the past year, a series of productive collaborations between the author, Tom McEwan and Larry Ng ensued and have led to demonstrations, new sensor ideas, and algorithmic descriptions of a large number of speech recognition concepts. This document summarizes the basic concepts of recognizing speech once organ motions have been obtained. Micro power radars and their uses for the measurement of body organ motions, such as those of the heart and lungs, have been demonstrated by Tom McEwan over the past two years. McEwan and I conducted a series of experiments, using these instruments, on vocal organ motions beginning in late spring, during which we observed motions of vocal folds (i.e., cords), tongue, jaw, and related organs that are very useful for speech recognition and other purposes. These will be reviewed in a separate paper. Since late summer 1994, Lawrence Ng and I have worked to make many of the initial recognition ideas more rigorous and to investigate the applications of these new ideas to new speech recognition algorithms, to speech coding, and to speech synthesis. I introduce some of those ideas in section IV of this document, and we describe them more completely in the document following this one, UCRL-UR-120311. For the design and operation of micro-power radars and their application to body organ motions, the reader may contact Tom McEwan directly. The capability for using EM sensors (i.e., radar units) to measure body organ motions and positions has been available for decades. Impediments to their use appear to have been size, excessive power, lack of resolution, and lack of understanding of the value of organ motion measurements, especially as applied to speech related technologies. However, with the invention of very low power, portable systems as demonstrated by McEwan at LLNL, researchers have begun to think differently about practical applications of such radars. In particular, his demonstrations of heart and lung motions have opened up many new areas of application for human and animal measurements.
Voxel-based morphometry of auditory and speech-related cortex in stutterers.
Beal, Deryk S; Gracco, Vincent L; Lafaille, Sophie J; De Nil, Luc F
2007-08-06
Stutterers demonstrate unique functional neural activation patterns during speech production, including reduced auditory activation, relative to nonstutterers. The extent to which these functional differences are accompanied by abnormal morphology of the brain in stutterers is unclear. This study examined the neuroanatomical differences in speech-related cortex between stutterers and nonstutterers using voxel-based morphometry. Results revealed significant differences in localized grey matter and white matter densities of left and right hemisphere regions involved in auditory processing and speech production.
Muthusamy, Hariharan; Polat, Kemal; Yaacob, Sazali
2015-01-01
In recent years, many research works have been published using speech-related features for speech emotion recognition; however, recent studies show that there is a strong correlation between emotional states and glottal features. In this work, Mel-frequency cepstral coefficients (MFCCs), linear predictive cepstral coefficients (LPCCs), perceptual linear predictive (PLP) features, gammatone filter outputs, timbral texture features, stationary wavelet transform based timbral texture features and relative wavelet packet energy and entropy features were extracted from the emotional speech (ES) signals and their glottal waveforms (GW). Particle swarm optimization based clustering (PSOC) and wrapper based particle swarm optimization (WPSO) were proposed to enhance the discerning ability of the features and to select the discriminating features, respectively. Three different emotional speech databases were utilized to gauge the proposed method. Extreme learning machine (ELM) was employed to classify the different types of emotions. Different experiments were conducted and the results show that the proposed method significantly improves the speech emotion recognition performance compared to previous works published in the literature. PMID:25799141
Proceedings of the Second International Mobile Satellite Conference (IMSC 1990)
NASA Technical Reports Server (NTRS)
Huck, R. W. (Compiler); Rafferty, William (Compiler); Reekie, D. Hugh M. (Editor)
1990-01-01
Presented here are the proceedings of the Second International Mobile Satellite Conference (IMSC), held June 17-20, 1990 in Ottawa, Canada. Topics covered include future mobile satellite communications concepts, aeronautical applications, modulation and coding, propagation and experimental systems, mobile terminal equipment, network architecture and control, regulatory and policy considerations, vehicle antennas, and speech compression.
ERIC Educational Resources Information Center
Mermelshtine, Roni; Barnes, Jacqueline
2016-01-01
Maternal responsive-didactic caregiving (RDC) and infant advanced object play were investigated in a sample of 400 mothers and their 10-month-old infants during video-recorded semi-structured play interactions. Three maternal behaviours: contingent response, cognitively stimulating language and autonomy-promoting speech were coded and infant…
ERIC Educational Resources Information Center
Ylinen, Sari; Bosseler, Alexis; Junttila, Katja; Huotilainen, Minna
2017-01-01
The ability to predict future events in the environment and learn from them is a fundamental component of adaptive behavior across species. Here we propose that inferring predictions facilitates speech processing and word learning in the early stages of language development. Twelve- and 24-month olds' electrophysiological brain responses to heard…
ERIC Educational Resources Information Center
Gustafsson, Lennart; Paplinski, Andrew
2004-01-01
Autism is a developmental disorder with possibly multiple pathophysiologies. It has been theorized that cortical feature maps in individuals with autism are inadequate for forming abstract codes and representations. Cortical feature maps make it possible to classify stimuli, such as phonemes of speech, disregarding incidental detail. Hierarchies…
Civility and Academic Freedom: Who Defines the Former (and How) May Imperil Rights to the Latter
ERIC Educational Resources Information Center
McDonald, Theodore W.; Stockton, James D.; Landrum, R. Eric
2018-01-01
An alarming occurrence in academia involves the discipline of faculty, under the guise of violating civility or collegiality codes, for engaging in what should be protected academic free speech. This often occurs when unprincipled and/or corporate-minded administrators seek to punish or dissuade faculty from challenging or questioning their…
ERIC Educational Resources Information Center
Hsu, Chien-Ju; Thompson, Cynthia K.
2018-01-01
Purpose: The purpose of this study is to compare the outcomes of the manually coded Northwestern Narrative Language Analysis (NNLA) system, which was developed for characterizing agrammatic production patterns, and the automated Computerized Language Analysis (CLAN) system, which has recently been adopted to analyze speech samples of individuals…
Speech and Hearing Science, Anatomy and Physiology.
ERIC Educational Resources Information Center
Zemlin, Willard R.
Written for those interested in speech pathology and audiology, the text presents the anatomical, physiological, and neurological bases for speech and hearing. Anatomical nomenclature used in the speech and hearing sciences is introduced and the breathing mechanism is defined and discussed in terms of the respiratory passage, the framework and…
Interventions for Speech Sound Disorders in Children
ERIC Educational Resources Information Center
Williams, A. Lynn, Ed.; McLeod, Sharynne, Ed.; McCauley, Rebecca J., Ed.
2010-01-01
With detailed discussion and invaluable video footage of 23 treatment interventions for speech sound disorders (SSDs) in children, this textbook and DVD set should be part of every speech-language pathologist's professional preparation. Focusing on children with functional or motor-based speech disorders from early childhood through the early…
Developing a Weighted Measure of Speech Sound Accuracy
ERIC Educational Resources Information Center
Preston, Jonathan L.; Ramsdell, Heather L.; Oller, D. Kimbrough; Edwards, Mary Louise; Tobin, Stephen J.
2011-01-01
Purpose: To develop a system for numerically quantifying a speaker's phonetic accuracy through transcription-based measures. With a focus on normal and disordered speech in children, the authors describe a system for differentially weighting speech sound errors on the basis of various levels of phonetic accuracy using a Weighted Speech Sound…
Automated Assessment of Speech Fluency for L2 English Learners
ERIC Educational Resources Information Center
Yoon, Su-Youn
2009-01-01
This dissertation provides an automated scoring method of speech fluency for second language learners of English (L2 learners) that uses speech recognition technology. Non-standard pronunciation, frequent disfluencies, faulty grammar, and inappropriate lexical choices are crucial characteristics of L2 learners' speech. Due to the ease of…
Speech Synthesis Applied to Language Teaching.
ERIC Educational Resources Information Center
Sherwood, Bruce
1981-01-01
The experimental addition of speech output to computer-based Esperanto lessons using speech synthesized from text is described. Because of Esperanto's phonetic spelling and simple rhythm, it is particularly easy to describe the mechanisms of Esperanto synthesis. Attention is directed to how the text-to-speech conversion is performed and the ways…
A novel probabilistic framework for event-based speech recognition
NASA Astrophysics Data System (ADS)
Juneja, Amit; Espy-Wilson, Carol
2003-10-01
One of the reasons for unsatisfactory performance of the state-of-the-art automatic speech recognition (ASR) systems is the inferior acoustic modeling of low-level acoustic-phonetic information in the speech signal. An acoustic-phonetic approach to ASR, on the other hand, explicitly targets linguistic information in the speech signal, but such a system for continuous speech recognition (CSR) is not known to exist. A probabilistic and statistical framework for CSR based on the idea of the representation of speech sounds by bundles of binary valued articulatory phonetic features is proposed. Multiple probabilistic sequences of linguistically motivated landmarks are obtained using binary classifiers of manner phonetic features-syllabic, sonorant and continuant-and the knowledge-based acoustic parameters (APs) that are acoustic correlates of those features. The landmarks are then used for the extraction of knowledge-based APs for source and place phonetic features and their binary classification. Probabilistic landmark sequences are constrained using manner class language models for isolated or connected word recognition. The proposed method could overcome the disadvantages encountered by the early acoustic-phonetic knowledge-based systems that led the ASR community to switch to systems highly dependent on statistical pattern analysis methods and probabilistic language or grammar models.
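A toy sketch of the landmark idea: per-frame posteriors from the three binary manner-feature classifiers are thresholded into feature bundles, and landmarks are hypothesized where the bundle changes; the posterior values below are fabricated placeholders, not output from real classifiers:

```python
import numpy as np

# Columns: P(syllabic), P(sonorant), P(continuant) per frame (fabricated).
frames = np.array([
    [0.1, 0.2, 0.9],
    [0.1, 0.3, 0.9],
    [0.8, 0.9, 0.2],
    [0.9, 0.9, 0.1],
    [0.2, 0.8, 0.8],
])

bundles = (frames > 0.5).astype(int)           # binary feature bundle per frame
change = np.any(np.diff(bundles, axis=0) != 0, axis=1)
landmarks = np.where(change)[0] + 1            # frame indices where the bundle changes
print(landmarks)                               # -> [2 4]
```

In the full system these landmark hypotheses would carry probabilities and be rescored against manner-class language models rather than hard-thresholded as here.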
Syllable-Related Breathing in Infants in the Second Year of Life
Parham, Douglas F.; Buder, Eugene H.; Oller, D. Kimbrough; Boliek, Carol A.
2010-01-01
Purpose This study explored whether breathing behaviors of infants within the second year of life differ between tidal breathing and breathing supporting single unarticulated syllables and canonical/articulated syllables. Method Vocalizations and breathing kinematics of nine infants between 53 and 90 weeks of age were recorded. A strict selection protocol was used to identify analyzable breath cycles. Syllables were categorized based on consensus coding. Inspiratory and expiratory durations, excursions, and slopes were calculated for the three breath cycle types and normalized using mean tidal breath measures. Results Tidal breathing cycles were significantly different from syllable-related cycles on all breathing measures. There were no significant differences between unarticulated syllable cycles and canonical syllable cycles, even after controlling for utterance duration and sound pressure level. Conclusions Infants in the second year of life exhibit clear differences between tidal breathing and speech-related breathing, but categorically distinct breath support for syllable types with varying articulatory demands was not evident in the current findings. Speech development introduces increasingly complex utterances, so older infants may produce detectable articulation-related adaptations of breathing kinematics. For younger infants, breath support may vary systematically among utterance types, due more to phonatory variations than to articulatory demands. PMID:21173390
Central Presbycusis: A Review and Evaluation of the Evidence
Humes, Larry E.; Dubno, Judy R.; Gordon-Salant, Sandra; Lister, Jennifer J.; Cacace, Anthony T.; Cruickshanks, Karen J.; Gates, George A.; Wilson, Richard H.; Wingfield, Arthur
2018-01-01
Background The authors reviewed the evidence regarding the existence of age-related declines in central auditory processes and the consequences of any such declines for everyday communication. Purpose This report summarizes the review process and presents its findings. Data Collection and Analysis The authors reviewed 165 articles germane to central presbycusis. Of the 165 articles, 132 articles with a focus on human behavioral measures for either speech or nonspeech stimuli were selected for further analysis. Results For 76 smaller-scale studies of speech understanding in older adults reviewed, the following findings emerged: (1) the three most commonly studied behavioral measures were speech in competition, temporally distorted speech, and binaural speech perception (especially dichotic listening); (2) for speech in competition and temporally degraded speech, hearing loss proved to have a significant negative effect on performance in most of the laboratory studies; (3) significant negative effects of age, unconfounded by hearing loss, were observed in most of the studies of speech in competing speech, time-compressed speech, and binaural speech perception; and (4) the influence of cognitive processing on speech understanding has been examined much less frequently, but when included, significant positive associations with speech understanding were observed. For 36 smaller-scale studies of the perception of nonspeech stimuli by older adults reviewed, the following findings emerged: (1) the three most frequently studied behavioral measures were gap detection, temporal discrimination, and temporal-order discrimination or identification; (2) hearing loss was seldom a significant factor; and (3) negative effects of age were almost always observed. For 18 studies reviewed that made use of test batteries and medium-to-large sample sizes, the following findings emerged: (1) all studies included speech-based measures of auditory processing; (2) 4 of the 18 studies included nonspeech stimuli; (3) for the speech-based measures, monaural speech in a competing-speech background, dichotic speech, and monaural time-compressed speech were investigated most frequently; (4) the most frequently used tests were the Synthetic Sentence Identification (SSI) test with Ipsilateral Competing Message (ICM), the Dichotic Sentence Identification (DSI) test, and time-compressed speech; (5) many of these studies using speech-based measures reported significant effects of age, but most of these studies were confounded by declines in hearing, cognition, or both; (6) for nonspeech auditory-processing measures, the focus was on measures of temporal processing in all four studies; (7) effects of cognition on nonspeech measures of auditory processing have been studied less frequently, with mixed results, whereas the effects of hearing loss on performance were minimal due to judicious selection of stimuli; and (8) there is a paucity of observational studies using test batteries and longitudinal designs. Conclusions Based on this review of the scientific literature, there is insufficient evidence to confirm the existence of central presbycusis as an isolated entity. On the other hand, recent evidence has been accumulating in support of the existence of central presbycusis as a multifactorial condition that involves age- and/or disease-related changes in the auditory system and in the brain. Moreover, there is a clear need for additional research in this area. PMID:22967738
Foster, Abby; Worrall, Linda; Rose, Miranda; O'Halloran, Robyn
2015-07-01
An evidence-practice gap has been identified in current acute aphasia management practice, with the provision of services to people with aphasia in the acute hospital widely considered in the literature to be inconsistent with best-practice recommendations. The reasons for this evidence-practice gap are unclear; however, speech pathologists practising in this setting have articulated a sense of dissonance regarding their limited service provision to this population. A clearer understanding of why this evidence-practice gap exists is essential in order to support and promote evidence-based approaches to the care of people with aphasia in acute care settings. To provide an understanding of speech pathologists' conceptualization of evidence-based practice for acute post-stroke aphasia, and its implementation. This study adopted a phenomenological approach, underpinned by a social constructivist paradigm. In-depth interviews were conducted with 14 Australian speech pathologists, recruited using a purposive sampling technique. An inductive thematic analysis of the data was undertaken. A single, overarching theme emerged from the data. Speech pathologists demonstrated a sense of disempowerment as a result of their relationship with evidence-based practice for acute aphasia management. Three subthemes contributed to this theme. The first described a restricted conceptualization of evidence-based practice. The second revealed speech pathologists' strained relationships with the research literature. The third elucidated a sense of professional unease over their perceived inability to enact evidence-based clinical recommendations, despite their desire to do so. Speech pathologists identified a current knowledge-practice gap in their management of aphasia in acute hospital settings. Speech pathologists place significant emphasis on the research evidence; however, their engagement with the research is limited, in part because it is perceived to lack clinical utility. A sense of professional dissonance arises from the conflict between a desire to provide best practice and the perceived barriers to implementing evidence-based recommendations clinically, resulting in evidence-based practice becoming a disempowering concept for some.
Neural correlates of behavioral amplitude modulation sensitivity in the budgerigar midbrain
Neilans, Erikson G.; Abrams, Kristina S.; Idrobo, Fabio; Carney, Laurel H.
2016-01-01
Amplitude modulation (AM) is a crucial feature of many communication signals, including speech. Whereas average discharge rates in the auditory midbrain correlate with behavioral AM sensitivity in rabbits, the neural bases of AM sensitivity in species with human-like behavioral acuity are unexplored. Here, we used parallel behavioral and neurophysiological experiments to explore the neural (midbrain) bases of AM perception in an avian speech mimic, the budgerigar (Melopsittacus undulatus). Behavioral AM sensitivity was quantified using operant conditioning procedures. Neural AM sensitivity was studied using chronically implanted microelectrodes in awake, unrestrained birds. Average discharge rates of multiunit recording sites in the budgerigar midbrain were insufficient to explain behavioral sensitivity to modulation frequencies <100 Hz for both tone- and noise-carrier stimuli, even with optimal pooling of information across recording sites. Neural envelope synchrony, in contrast, could explain behavioral performance for both carrier types across the full range of modulation frequencies studied (16–512 Hz). The results suggest that envelope synchrony in the budgerigar midbrain may underlie behavioral sensitivity to AM. Behavioral AM sensitivity based on synchrony in the budgerigar, which contrasts with rate-correlated behavioral performance in rabbits, raises the possibility that envelope synchrony, rather than average discharge rate, might also underlie AM perception in other species with sensitive AM detection abilities, including humans. These results highlight the importance of synchrony coding of envelope structure in the inferior colliculus. Furthermore, they underscore potential benefits of devices (e.g., midbrain implants) that evoke robust neural synchrony. PMID:26843608
Buss, Emily; Leibold, Lori J.; Porter, Heather L.; Grose, John H.
2017-01-01
Children perform more poorly than adults on a wide range of masked speech perception paradigms, but this effect is particularly pronounced when the masker itself is also composed of speech. The present study evaluated two factors that might contribute to this effect: the ability to perceptually isolate the target from masker speech, and the ability to recognize target speech based on sparse cues (glimpsing). Speech reception thresholds (SRTs) were estimated for closed-set, disyllabic word recognition in children (5–16 years) and adults in a one- or two-talker masker. Speech maskers were 60 dB sound pressure level (SPL), and they were either presented alone or in combination with a 50-dB-SPL speech-shaped noise masker. There was an age effect overall, but performance was adult-like at a younger age for the one-talker than the two-talker masker. Noise tended to elevate SRTs, particularly for older children and adults, and when summed with the one-talker masker. Removing time-frequency epochs associated with a poor target-to-masker ratio markedly improved SRTs, with larger effects for younger listeners; the age effect was not eliminated, however. Results were interpreted as indicating that development of speech-in-speech recognition is likely impacted by development of both perceptual masking and the ability to recognize speech based on sparse cues. PMID:28464682
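A brief sketch of the glimpsing manipulation, expressed as an ideal binary mask over time-frequency cells; the 0 dB criterion and the STFT inputs are assumptions, and the study's exact epoch definition may differ:

```python
import numpy as np

def glimpsed(target_stft, masker_stft, criterion_db=0.0):
    """Keep only time-frequency cells whose target-to-masker ratio exceeds
    the criterion, removing the poor-TMR epochs from the mixture."""
    tmr_db = 20 * np.log10(np.abs(target_stft) / (np.abs(masker_stft) + 1e-12) + 1e-12)
    mask = tmr_db > criterion_db
    return (target_stft + masker_stft) * mask

rng = np.random.default_rng(5)
T = rng.normal(size=(64, 100)) * 2.0      # stand-in target STFT magnitudes
M = rng.normal(size=(64, 100))            # stand-in masker STFT magnitudes
print(np.mean(np.abs(glimpsed(T, M)) > 0))   # fraction of cells kept
```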
Articulatory speech synthesis and speech production modelling
NASA Astrophysics Data System (ADS)
Huang, Jun
This dissertation addresses the problem of speech synthesis and speech production modelling based on the fundamental principles of human speech production. Unlike the conventional source-filter model, which assumes the independence of the excitation and the acoustic filter, we treat the entire vocal apparatus as one system consisting of a fluid dynamic aspect and a mechanical part. We model the vocal tract by a three-dimensional moving geometry. We also model the sound propagation inside the vocal apparatus as a three-dimensional nonplane-wave propagation inside a viscous fluid described by Navier-Stokes equations. In our work, we first propose a combined minimum energy and minimum jerk criterion to estimate the dynamic vocal tract movements during speech production. Both theoretical error bound analysis and experimental results show that this method can achieve very close match at the target points and avoid the abrupt change in articulatory trajectory at the same time. Second, a mechanical vocal fold model is used to compute the excitation signal of the vocal tract. The advantage of this model is that it is closely coupled with the vocal tract system based on fundamental aerodynamics. As a result, we can obtain an excitation signal with much more detail than the conventional parametric vocal fold excitation model. Furthermore, strong evidence of source-tract interaction is observed. Finally, we propose a computational model of the fricative and stop types of sounds based on the physical principles of speech production. The advantage of this model is that it uses an exogenous process to model the additional nonsteady and nonlinear effects due to the flow mode, which are ignored by the conventional source-filter speech production model. A recursive algorithm is used to estimate the model parameters. Experimental results show that this model is able to synthesize good quality fricative and stop types of sounds. Based on our dissertation work, we carefully argue that the articulatory speech production model has the potential to flexibly synthesize natural-quality speech sounds and to provide a compact computational model for speech production that can be beneficial to a wide range of areas in speech signal processing.
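As a worked example of the jerk-minimizing ingredient, the classic fifth-order minimum-jerk polynomial between two articulatory targets; the dissertation's combined minimum energy and minimum jerk criterion is richer, and this isolates only the familiar building block:

```python
import numpy as np

def min_jerk(x0, x1, n=100):
    """Minimum-jerk trajectory from x0 to x1 over n samples: the unique
    fifth-order polynomial with zero velocity and acceleration at both
    endpoints, so the articulator approaches each target smoothly."""
    tau = np.linspace(0.0, 1.0, n)
    s = 10 * tau**3 - 15 * tau**4 + 6 * tau**5   # minimum-jerk time profile
    return x0 + (x1 - x0) * s

traj = min_jerk(x0=0.0, x1=1.5, n=5)
print(traj)   # smooth approach to the target, no abrupt endpoint changes
```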
NASA Astrophysics Data System (ADS)
Nishiura, Takanobu; Nakamura, Satoshi
2003-10-01
Humans communicate with each other through speech by focusing on the target speech among environmental sounds in real acoustic environments. We can easily identify the target sound from other environmental sounds. For hands-free speech recognition, the identification of the target speech from environmental sounds is imperative. This mechanism may also be important for a self-moving robot to sense the acoustic environments and communicate with humans. Therefore, this paper first proposes hidden Markov model (HMM)-based environmental sound source identification. Environmental sounds are modeled by three states of HMMs and evaluated using 92 kinds of environmental sounds. The identification accuracy was 95.4%. This paper also proposes a new HMM composition method that composes speech HMMs and an HMM of categorized environmental sounds for robust environmental sound-added speech recognition. As a result of the evaluation experiments, we confirmed that the proposed HMM composition outperforms the conventional HMM composition with speech HMMs and a noise (environmental sound) HMM trained using noise periods prior to the target speech in a captured signal. [Work supported by Ministry of Public Management, Home Affairs, Posts and Telecommunications of Japan.]
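A compact sketch of the identification stage with hmmlearn: one three-state Gaussian HMM per environmental sound class, with classification by maximum log-likelihood; the Gaussian "features" are placeholders for real acoustic frames such as MFCCs:

```python
import numpy as np
from hmmlearn import hmm

rng = np.random.default_rng(0)
means = {"door_slam": 0.0, "phone_ring": 2.5, "spray": 5.0}   # hypothetical classes
models = {}
for name, mu in means.items():
    X = rng.normal(loc=mu, size=(200, 12))                    # fake training frames
    m = hmm.GaussianHMM(n_components=3, covariance_type="diag", n_iter=20)
    m.fit(X)                                                  # one 3-state HMM per class
    models[name] = m

test = rng.normal(loc=2.5, size=(50, 12))                     # unknown sound
# Classify by the model giving the highest log-likelihood for the sequence.
print(max(models, key=lambda k: models[k].score(test)))       # -> phone_ring
```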
Pathological speech signal analysis and classification using empirical mode decomposition.
Kaleem, Muhammad; Ghoraani, Behnaz; Guergachi, Aziz; Krishnan, Sridhar
2013-07-01
Automated classification of normal and pathological speech signals can provide an objective and accurate mechanism for pathological speech diagnosis, and is an active area of research. A large part of this research is based on analysis of acoustic measures extracted from sustained vowels. However, sustained vowels do not reflect real-world attributes of voice as effectively as continuous speech, which can take into account important attributes of speech such as rapid voice onset and termination, changes in voice frequency and amplitude, and sudden discontinuities in speech. This paper presents a methodology based on empirical mode decomposition (EMD) for classification of continuous normal and pathological speech signals obtained from a well-known database. EMD is used to decompose randomly chosen portions of speech signals into intrinsic mode functions, which are then analyzed to extract meaningful temporal and spectral features, including true instantaneous features which can capture discriminative information in signals hidden at local time-scales. A total of six features are extracted, and a linear classifier is used with the feature vector to classify continuous speech portions obtained from a database consisting of 51 normal and 161 pathological speakers. A classification accuracy of 95.7% is obtained, thus demonstrating the effectiveness of the methodology.
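A sketch of the pipeline's shape using the PyEMD package (an assumption about tooling; distributed on PyPI as "EMD-signal"): decompose a portion of speech into IMFs and summarize each with simple energy and frequency proxies; these generic features stand in for the paper's six features:

```python
import numpy as np
from PyEMD import EMD

def emd_features(signal, n_imfs=4):
    """Decompose into IMFs and extract one (log-energy, ZCR) pair per IMF;
    the resulting vector would feed a linear classifier."""
    imfs = EMD()(signal)[:n_imfs]
    feats = []
    for imf in imfs:
        energy = np.sum(imf ** 2)
        zcr = np.mean(np.abs(np.diff(np.sign(imf))) > 0)   # crude frequency proxy
        feats += [np.log(energy + 1e-12), zcr]
    return np.array(feats)

t = np.linspace(0, 0.5, 8000)
x = np.sin(2 * np.pi * 150 * t) + 0.3 * np.random.default_rng(1).normal(size=t.size)
print(emd_features(x).round(2))   # one (log-energy, ZCR) pair per extracted IMF
```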
Speech and Speech-Related Quality of Life After Late Palate Repair: A Patient's Perspective.
Schönmeyr, Björn; Wendby, Lisa; Sharma, Mitali; Jacobson, Lia; Restrepo, Carolina; Campbell, Alex
2015-07-01
Many patients with cleft palate deformities worldwide receive treatment at a later age than is recommended for normal speech to develop. The outcomes after late palate repairs in terms of speech and quality of life (QOL) still remain largely unstudied. In the current study, questionnaires were used to assess the patients' perception of speech and QOL before and after primary palate repair. All of the patients were operated at a cleft center in northeast India and had a cleft palate with a normal lip or with a cleft lip that had been previously repaired. A total of 134 patients (7-35 years) were interviewed preoperatively and 46 patients (7-32 years) were assessed in the postoperative survey. The survey showed that scores based on the speech handicap index, concerning speech and speech-related QOL, did not improve postoperatively. In fact, the questionnaires indicated that the speech became more unpredictable (P < 0.01) and that nasal regurgitation became worse (P < 0.01) for some patients after surgery. A total of 78% of the patients were still satisfied with the surgery and all of the patients reported that their self-confidence had improved after the operation. Thus, the majority of interviewed patients who underwent late primary palate repair were satisfied with the surgery. At the same time, speech and speech-related QOL did not improve according to the speech handicap index-based survey. Speech predictability may even become worse and nasal regurgitation may increase after late palate repair, according to these results.
[Psychosis, language and literature].
Maier, T
1999-05-01
There have always been debates about possible correlations between creative genius and mental illness, not only among psychiatrists but also among scientists of art and literature. Especially modern literary texts may show formal similarities to psychotic speech, which leads to the question, whether not only artists, but also people in psychotic states are able to create literature. This article points out the loosened semantic stability in psychotic speech, which equals a loss of common ground in the use of signs and symbols. In terms of Gadamer's hermeneutics, texts produced by psychotic people cannot be understood, they are mere form. Even in hermetic literary texts, the semantic code can be offended, but in deliberate artistic intention, which finds its communicative purpose in breaking the symbolic order.
What Makes a Caseload (Un) Manageable? School-Based Speech-Language Pathologists Speak
ERIC Educational Resources Information Center
Katz, Lauren A.; Maag, Abby; Fallon, Karen A.; Blenkarn, Katie; Smith, Megan K.
2010-01-01
Purpose: Large caseload sizes and a shortage of speech-language pathologists (SLPs) are ongoing concerns in the field of speech and language. This study was conducted to identify current mean caseload size for school-based SLPs, a threshold at which caseload size begins to be perceived as unmanageable, and variables contributing to school-based…
O-I-C: An Orality-Based Procedure for Teaching Interactive Communication in the Basic Course.
ERIC Educational Resources Information Center
Haynes, W. Lance
In order to improve instruction in basic speech courses, a program was developed adapting creative problem solving to speech preparation and to interactive speech communication. The program, called O-I-C--Orientation, Incubation, and Composition--and based on Howell's five levels of competence and their implications, begins with a thorough study…
ERIC Educational Resources Information Center
Spek, B.; Wieringa-de Waard, M.; Lucas, C.; van Dijk, N.
2013-01-01
Background: The importance and value of the principles of evidence-based practice (EBP) in the decision-making process is recognized by speech-language therapists (SLTs) worldwide and as a result curricula for speech-language therapy students incorporated EBP principles. However, the willingness actually to use EBP principles in their future…
ERIC Educational Resources Information Center
Payne, Elinor; Post, Brechtje; Astruc, Lluisa; Prieto, Pilar; Vanrell, Maria del Mar
2012-01-01
Interval-based rhythm metrics were applied to the speech of English, Catalan and Spanish 2, 4 and 6 year-olds, and compared with the (adult-directed) speech of their mothers. Results reveal that child speech does not fall into a well-defined rhythmic class: for all three languages, it is more "vocalic" (higher %V) than adult speech and…
Treatment of Children with Speech Oral Placement Disorders (OPDs): A Paradigm Emerges
ERIC Educational Resources Information Center
Bahr, Diane; Rosenfeld-Johnson, Sara
2010-01-01
Epidemiological research was used to develop the Speech Disorders Classification System (SDCS). The SDCS is an important speech diagnostic paradigm in the field of speech-language pathology. This paradigm could be expanded and refined to also address treatment while meeting the standards of evidence-based practice. The article assists that process…
Assessment of communication abilities in multilingual children: Language rights or human rights?
Cruz-Ferreira, Madalena
2018-02-01
Communication involves a sender, a receiver and a shared code operating through shared rules. Breach of communication results from disruption to any of these basic components of a communicative chain, although assessment of communication abilities typically focuses on senders/receivers, on two assumptions: first, that their command of features and rules of the language in question (the code), such as sounds, words or word order, as described in linguists' theorisations, represents the full scope of linguistic competence; and second, that languages are stable, homogeneous entities, unaffected by their users' communicative needs. Bypassing the role of the code in successful communication assigns decisive rights to abstract languages rather than to real-life language users, routinely leading to suspected or diagnosed speech-language disorder in academic and clinical assessment of multilingual children's communicative skills. This commentary reflects on whether code-driven assessment practices comply with the spirit of Article 19 of the Universal Declaration of Human Rights.
The Development of Bimodal Bilingualism: Implications for Linguistic Theory
Lillo-Martin, Diane; de Quadros, Ronice Müller; Pichler, Deborah Chen
2017-01-01
A wide range of linguistic phenomena contribute to our understanding of the architecture of the human linguistic system. In this paper we present a proposal dubbed Language Synthesis to capture bilingual phenomena including code-switching and ‘transfer’ as automatic consequences of the addition of a second language, using basic concepts of Minimalism and Distributed Morphology. Bimodal bilinguals, who use a sign language and a spoken language, provide a new type of evidence regarding possible bilingual phenomena, namely code-blending, the simultaneous production of (aspects of) a message in both speech and sign. We argue that code-blending also follows naturally once a second articulatory interface is added to the model. Several different types of code-blending are discussed in connection to the predictions of the Synthesis model. Our primary data come from children developing as bimodal bilinguals, but our proposal is intended to capture a wide range of bilingual effects across any language pair. PMID:28603576
Teki, Sundeep; Barnes, Gareth R; Penny, William D; Iverson, Paul; Woodhead, Zoe V J; Griffiths, Timothy D; Leff, Alexander P
2013-06-01
In this study, we used magnetoencephalography and a mismatch paradigm to investigate speech processing in stroke patients with auditory comprehension deficits and age-matched control subjects. We probed connectivity within and between the two temporal lobes in response to phonemic (different word) and acoustic (same word) oddballs using dynamic causal modelling. We found stronger modulation of self-connections as a function of phonemic differences for control subjects versus aphasics in left primary auditory cortex and bilateral superior temporal gyrus. The patients showed stronger modulation of connections from right primary auditory cortex to right superior temporal gyrus (feed-forward) and from left primary auditory cortex to right primary auditory cortex (interhemispheric). This differential connectivity can be explained on the basis of a predictive coding theory which suggests increased prediction error and decreased sensitivity to phonemic boundaries in the aphasics' speech network in both hemispheres. Within the aphasics, we also found behavioural correlates with connection strengths: a negative correlation between phonemic perception and an inter-hemispheric connection (left superior temporal gyrus to right superior temporal gyrus), and positive correlation between semantic performance and a feedback connection (right superior temporal gyrus to right primary auditory cortex). Our results suggest that aphasics with impaired speech comprehension have less veridical speech representations in both temporal lobes, and rely more on the right hemisphere auditory regions, particularly right superior temporal gyrus, for processing speech. Despite this presumed compensatory shift in network connectivity, the patients remain significantly impaired.
Mothers Consistently Alter Their Unique Vocal Fingerprints When Communicating with Infants.
Piazza, Elise A; Iordan, Marius Cătălin; Lew-Williams, Casey
2017-10-23
The voice is the most direct link we have to others' minds, allowing us to communicate using a rich variety of speech cues [1, 2]. This link is particularly critical early in life as parents draw infants into the structure of their environment using infant-directed speech (IDS), a communicative code with unique pitch and rhythmic characteristics relative to adult-directed speech (ADS) [3, 4]. To begin breaking into language, infants must discern subtle statistical differences about people and voices in order to direct their attention toward the most relevant signals. Here, we uncover a new defining feature of IDS: mothers significantly alter statistical properties of vocal timbre when speaking to their infants. Timbre, the tone color or unique quality of a sound, is a spectral fingerprint that helps us instantly identify and classify sound sources, such as individual people and musical instruments [5-7]. We recorded 24 mothers' naturalistic speech while they interacted with their infants and with adult experimenters in their native language. Half of the participants were English speakers, and half were not. Using a support vector machine classifier, we found that mothers consistently shifted their timbre between ADS and IDS. Importantly, this shift was similar across languages, suggesting that such alterations of timbre may be universal. These findings have theoretical implications for understanding how infants tune in to their local communicative environments. Moreover, our classification algorithm for identifying infant-directed timbre has direct translational implications for speech recognition technology. Copyright © 2017 Elsevier Ltd. All rights reserved.
Lu, Huanhuan; Wang, Fuzhong; Zhang, Huichun
2016-04-01
Traditional speech detection methods treat noise as a jamming signal to be filtered out, but against a strong noise background they lose part of the original speech signal while eliminating the noise. Stochastic resonance can instead use noise energy to amplify a weak signal and suppress the noise. Based on stochastic resonance theory, a new method for extracting weak speech signals using adaptive stochastic resonance is proposed. Combined with twice sampling, the method detects weak speech signals in strong noise. The system parameters a and b are adjusted adaptively by evaluating the signal-to-noise ratio of the output signal, so that the weak speech signal is optimally detected. Experimental simulations showed that, under strong noise, the output signal-to-noise ratio increased from an initial value of -7 dB to about 0.86 dB, a signal-to-noise ratio gain of 7.86 dB. The method clearly raises the signal-to-noise ratio of the output speech signal and offers a new approach to detecting weak speech signals in strong noise environments.
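The adaptive scheme can be sketched compactly. Below is a minimal Python illustration, assuming the standard double-well Langevin model dx/dt = a·x - b·x³ + s(t) usually behind stochastic resonance; the grid search over (a, b) stands in for the paper's SNR-driven adaptive tuning, the twice-sampling step is omitted, and all constants are illustrative rather than the paper's values.

```python
import numpy as np

def bistable_sr(s, h=0.01, a=1.0, b=1.0):
    """Euler integration of the double-well system dx/dt = a*x - b*x**3 + s(t)."""
    x = np.zeros(len(s))
    for n in range(1, len(s)):
        x[n] = x[n - 1] + h * (a * x[n - 1] - b * x[n - 1] ** 3 + s[n - 1])
    return x

def snr_db(x, k):
    """Crude SNR: power in DFT bin k (the known signal bin) vs. all other bins."""
    p = np.abs(np.fft.rfft(x - x.mean())) ** 2
    return 10 * np.log10(p[k] / (p.sum() - p[k]))

n, k = 20000, 10                              # 10 signal cycles over the record
t = np.arange(n)
s = 0.3 * np.sin(2 * np.pi * k * t / n)       # weak, slow periodic component
noisy = s + 1.2 * np.random.randn(n)          # buried in strong noise

# Adaptive step (stand-in for the paper's tuning): pick the (a, b) pair that
# maximizes the output SNR.
grid = [(a, b) for a in (0.5, 1.0, 2.0) for b in (0.5, 1.0, 2.0)]
a, b = max(grid, key=lambda p: snr_db(bistable_sr(noisy, a=p[0], b=p[1]), k))
print("chosen a=%.1f b=%.1f" % (a, b))
print("input SNR %.1f dB -> output SNR %.1f dB"
      % (snr_db(noisy, k), snr_db(bistable_sr(noisy, a=a, b=b), k)))
```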
A Secure Base in Adolescence: Markers of Attachment Security in the Mother–Adolescent Relationship
Allen, Joseph P.; McElhaney, Kathleen Boykin; Land, Deborah J.; Kuperminc, Gabriel P.; Moore, Cynthia W.; O’Beirne-Kelly, Heather; Kilmer, Sarah Liebman
2017-01-01
This study sought to identify ways in which adolescent attachment security, as assessed via the Adult Attachment Interview, is manifest in qualities of the secure base provided by the mother–adolescent relationship. Assessments included data coded from mother–adolescent interactions, test-based data, and adolescent self-reports obtained from an ethnically and socioeconomically diverse sample of moderately at-risk 9th and 10th graders. This study found several robust markers of adolescent attachment security in the mother–adolescent relationship. Each of these markers was found to contribute unique variance to explaining adolescent security, and in combination, they accounted for as much as 40% of the raw variance in adolescent security. These findings suggest that security is closely connected to the workings of the mother–adolescent relationship via a secure-base phenomenon, in which the teen can explore independence in thought and speech from the secure base of a maternal relationship characterized by maternal attunement to the adolescent and maternal supportiveness. PMID:12625451
Motor-based intervention protocols in treatment of childhood apraxia of speech (CAS)
Maas, Edwin; Gildersleeve-Neumann, Christina; Jakielski, Kathy J.; Stoeckel, Ruth
2014-01-01
This paper reviews current trends in treatment for childhood apraxia of speech (CAS), with a particular emphasis on motor-based intervention protocols. The paper first briefly discusses how CAS fits into the typology of speech sound disorders, followed by a discussion of the potential relevance of principles derived from the motor learning literature for CAS treatment. Next, different motor-based treatment protocols are reviewed, along with their evidence base. The paper concludes with a summary and discussion of future research needs. PMID:25313348
Effects of emotion on different phoneme classes
NASA Astrophysics Data System (ADS)
Lee, Chul Min; Yildirim, Serdar; Bulut, Murtaza; Busso, Carlos; Kazemzadeh, Abe; Lee, Sungbok; Narayanan, Shrikanth
2004-10-01
This study investigates the effects of emotion on different phoneme classes using short-term spectral features. In the research on emotion in speech, most studies have focused on prosodic features of speech. In this study, based on the hypothesis that different emotions have varying effects on the properties of the different speech sounds, we investigate the usefulness of phoneme-class level acoustic modeling for automatic emotion classification. Hidden Markov models (HMM) based on short-term spectral features for five broad phonetic classes are used for this purpose using data obtained from recordings of two actresses. Each speaker produces 211 sentences with four different emotions (neutral, sad, angry, happy). Using the speech material we trained and compared the performances of two sets of HMM classifiers: a generic set of "emotional speech" HMMs (one for each emotion) and a set of broad phonetic-class based HMMs (vowel, glide, nasal, stop, fricative) for each emotion type considered. Comparison of classification results indicates that different phoneme classes were affected differently by emotional change and that the vowel sounds are the most important indicator of emotions in speech. Detailed results and their implications on the underlying speech articulation will be discussed.
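The two-level classifier design (one HMM per emotion, or one per emotion and phonetic class) is easy to prototype. The sketch below assumes the third-party hmmlearn package and uses random matrices as stand-ins for real short-term spectral vectors such as MFCCs; it shows only the structure of the phoneme-class-conditioned variant, not the paper's trained system.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM  # third-party; pip install hmmlearn

EMOTIONS = ["neutral", "sad", "angry", "happy"]
CLASSES = ["vowel", "glide", "nasal", "stop", "fricative"]

def train_models(features):
    """features[(emotion, phon_class)] -> (n_frames, n_dims) training matrix."""
    models = {}
    for key, X in features.items():
        m = GaussianHMM(n_components=3, covariance_type="diag", n_iter=20)
        m.fit(X)
        models[key] = m
    return models

def classify(models, segments):
    """segments[phon_class] -> feature matrix for one utterance; returns the
    emotion whose class-conditioned HMMs give the highest summed log-likelihood."""
    def total_ll(emotion):
        return sum(models[(emotion, c)].score(X) for c, X in segments.items())
    return max(EMOTIONS, key=total_ll)

rng = np.random.default_rng(0)
train = {(e, c): rng.normal(size=(200, 13)) for e in EMOTIONS for c in CLASSES}
models = train_models(train)
test = {c: rng.normal(size=(40, 13)) for c in CLASSES}
print("predicted emotion:", classify(models, test))
```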
Speech enhancement using the modified phase-opponency model.
Deshmukh, Om D; Espy-Wilson, Carol Y; Carney, Laurel H
2007-06-01
In this paper we present a model called the Modified Phase-Opponency (MPO) model for single-channel speech enhancement when the speech is corrupted by additive noise. The MPO model is based on the auditory PO model, proposed for detection of tones in noise. The PO model includes a physiologically realistic mechanism for processing the information in neural discharge times and exploits the frequency-dependent phase properties of the tuned filters in the auditory periphery by using a cross-auditory-nerve-fiber coincidence detection for extracting temporal cues. The MPO model alters the components of the PO model such that the basic functionality of the PO model is maintained but the properties of the model can be analyzed and modified independently. The MPO-based speech enhancement scheme does not need to estimate the noise characteristics nor does it assume that the noise satisfies any statistical model. The MPO technique leads to the lowest value of the LPC-based objective measures and the highest value of the perceptual evaluation of speech quality measure compared to other methods when the speech signals are corrupted by fluctuating noise. Combining the MPO speech enhancement technique with our aperiodicity, periodicity, and pitch detector further improves its performance.
Integrating speech in time depends on temporal expectancies and attention.
Scharinger, Mathias; Steinberg, Johanna; Tavano, Alessandro
2017-08-01
Sensory information that unfolds in time, such as in speech perception, relies on efficient chunking mechanisms in order to yield optimally-sized units for further processing. Whether or not two successive acoustic events receive a one-unit or a two-unit interpretation seems to depend on the fit between their temporal extent and a stipulated temporal window of integration. However, there is ongoing debate on how flexible this temporal window of integration should be, especially for the processing of speech sounds. Furthermore, there is no direct evidence of whether attention may modulate the temporal constraints on the integration window. For this reason, we here examine how different word durations, which lead to different temporal separations of sound onsets, interact with attention. In an Electroencephalography (EEG) study, participants actively and passively listened to words where word-final consonants were occasionally omitted. Words had either a natural duration or were artificially prolonged in order to increase the separation of speech sound onsets. Omission responses to incomplete speech input, originating in left temporal cortex, decreased when the critical speech sound was separated from previous sounds by more than 250 msec, i.e., when the separation was larger than the stipulated temporal window of integration (125-150 msec). Attention, on the other hand, only increased omission responses for stimuli with natural durations. We complemented the event-related potential (ERP) analyses by a frequency-domain analysis on the stimulus presentation rate. Notably, the power of stimulation frequency showed the same duration and attention effects than the omission responses. We interpret these findings on the background of existing research on temporal integration windows and further suggest that our findings may be accounted for within the framework of predictive coding. Copyright © 2017 Elsevier Ltd. All rights reserved.
Communicative performance of adolescents with severe speech impairment: influence of context.
Dalton, B M; Bedrosian, J L
1989-08-01
The communicative performance of 4 preoperational-level adolescents, using limited speech, gestures, and communication board techniques, was examined in a two-part investigation. In Part 1, each subject participated in an academic interaction with a teacher in a therapy room. Data were transcribed and coded for communication mode, function, and role. Two subjects were found to predominantly use the speech mode, while the remaining 2 predominantly used board and one other mode. The majority of productions consisted of responses to requests, and the initiator role was infrequently occupied. These findings were similar to those reported in previous investigations conducted in classroom settings. In Part 2, another examination of the communicative performance of these subjects was conducted in spontaneous interactions involving speaking and nonspeaking peers in a therapy room. Using the same data analysis procedures, gesture and speech modes predominated for 3 of the subjects in the nonspeaking peer interactions. The remaining subject exhibited minimal interaction. No consistent pattern of mode usage was exhibited across the speaking peer interactions. In the nonspeaking peer interactions, request predominated. In contrast, a variety of communication functions was exhibited in the speaking peer interactions. Both the initiator and the maintainer roles were occupied in the majority of interactions. Pertinent variables and clinical implications are discussed.
Joint Spatial-Spectral Feature Space Clustering for Speech Activity Detection from ECoG Signals
Kanas, Vasileios G.; Mporas, Iosif; Benz, Heather L.; Sgarbas, Kyriakos N.; Bezerianos, Anastasios; Crone, Nathan E.
2014-01-01
Brain machine interfaces for speech restoration have been extensively studied for more than two decades. The success of such a system will depend in part on selecting the best brain recording sites and signal features corresponding to speech production. The purpose of this study was to detect speech activity automatically from electrocorticographic signals based on joint spatial-frequency clustering of the ECoG feature space. For this study, the ECoG signals were recorded while a subject performed two different syllable repetition tasks. We found that the optimal frequency resolution to detect speech activity from ECoG signals was 8 Hz, achieving 98.8% accuracy by employing support vector machines (SVM) as a classifier. We also defined the cortical areas that held the most information about the discrimination of speech and non-speech time intervals. Additionally, the results shed light on the distinct cortical areas associated with the two syllable repetition tasks and may contribute to the development of portable ECoG-based communication. PMID:24658248
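The detection pipeline (per-electrode spectral features at a fixed frequency resolution, fed to an SVM) can be illustrated on synthetic data. In the sketch below, the 8 Hz resolution and the SVM classifier follow the abstract, while the sampling rate, electrode count, and the injected "speech" activity are invented for the demonstration.

```python
import numpy as np
from scipy.signal import welch
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

fs, n_elec, n_trials = 1000, 16, 120
rng = np.random.default_rng(1)

def band_powers(trial, res_hz=8, fmax=200):
    """Welch PSD per electrode at ~res_hz frequency resolution, log-scaled."""
    f, p = welch(trial, fs=fs, nperseg=int(fs / res_hz), axis=-1)
    return np.log(p[:, f <= fmax]).ravel()

X, y = [], []
t = np.arange(fs) / fs
for i in range(n_trials):
    trial = rng.normal(size=(n_elec, fs))              # 1 s of 16-channel noise
    if i % 2:                                          # "speech" trials get extra
        trial[:4] += 0.7 * np.sin(2 * np.pi * 80 * t)  # 80 Hz power on 4 sites
    X.append(band_powers(trial))
    y.append(i % 2)

clf = SVC(kernel="linear")  # SVM, as in the study
print("cross-validated accuracy: %.2f"
      % cross_val_score(clf, np.array(X), np.array(y), cv=5).mean())
```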
ERIC Educational Resources Information Center
Baker, Elise; McLeod, Sharynne
2011-01-01
Purpose: This article provides both a tutorial and a clinical example of how speech-language pathologists (SLPs) can conduct evidence-based practice (EBP) when working with children with speech sound disorders (SSDs). It is a companion paper to the narrative review of 134 intervention studies for children who have an SSD (Baker & McLeod, 2011).…
ERIC Educational Resources Information Center
Huettig, Falk; Hartsuiker, Robert J.
2010-01-01
Theories of verbal self-monitoring generally assume an internal (pre-articulatory) monitoring channel, but there is debate about whether this channel relies on speech perception or on production-internal mechanisms. Perception-based theories predict that listening to one's own inner speech has similar behavioural consequences as listening to…
Public Speaking Anxiety: Comparing Face-to-Face and Web-Based Speeches
ERIC Educational Resources Information Center
Campbell, Scott; Larson, James
2013-01-01
This study determines whether or not students experience a different level of anxiety when giving a speech to a group of people in a traditional face-to-face classroom setting versus giving a speech into a camera to an audience visible on a projected screen, using distance or web-based technology. The study included approximately 70 students.…
The influence of speech rate and accent on access and use of semantic information.
Sajin, Stanislav M; Connine, Cynthia M
2017-04-01
Circumstances in which the speech input is presented in sub-optimal conditions generally lead to processing costs affecting spoken word recognition. The current study indicates that some processing demands imposed by listening to difficult speech can be mitigated by feedback from semantic knowledge. A set of lexical decision experiments examined how foreign accented speech and word duration impact access to semantic knowledge in spoken word recognition. Results indicate that when listeners process accented speech, the reliance on semantic information increases. Speech rate was not observed to influence semantic access, except in the setting in which unusually slow accented speech was presented. These findings support interactive activation models of spoken word recognition in which attention is modulated based on speech demands.
New Ways in Teaching Connected Speech. New Ways Series
ERIC Educational Resources Information Center
Brown, James Dean, Ed.
2012-01-01
Connected speech is based on a set of rules used to modify pronunciations so that words connect and flow more smoothly in natural speech (hafta versus have to). Native speakers of English tend to feel that connected speech is friendlier, more natural, more sympathetic, and more personal. Is there any reason why learners of English would prefer to…
ERIC Educational Resources Information Center
Crowe, Kathryn; Cumming, Tamara; McCormack, Jane; Baker, Elise; McLeod, Sharynne; Wren, Yvonne; Roulstone, Sue; Masso, Sarah
2017-01-01
Early childhood educators are frequently called on to support preschool-aged children with speech sound disorders and to engage these children in activities that target their speech production. This study explored factors that acted as facilitators and/or barriers to the provision of computer-based support for children with speech sound disorders…
Zourmand, Alireza; Mirhassani, Seyed Mostafa; Ting, Hua-Nong; Bux, Shaik Ismail; Ng, Kwan Hoong; Bilgen, Mehmet; Jalaludin, Mohd Amin
2014-07-25
The phonetic properties of six Malay vowels are investigated using magnetic resonance imaging (MRI) to visualize the vocal tract in order to obtain dynamic articulatory parameters during speech production. To resolve image blurring due to the tongue movement during the scanning process, a method based on active contour extraction is used to track tongue contours. The proposed method efficiently tracks tongue contours despite the partial blurring of MRI images. Consequently, the articulatory parameters that are effectively measured as tongue movement is observed, and the specific shape of the tongue and its position for all six uttered Malay vowels are determined. Speech rehabilitation procedure demands some kind of visual perceivable prototype of speech articulation. To investigate the validity of the measured articulatory parameters based on acoustic theory of speech production, an acoustic analysis based on the uttered vowels by subjects has been performed. As the acoustic speech and articulatory parameters of uttered speech were examined, a correlation between formant frequencies and articulatory parameters was observed. The experiments reported a positive correlation between the constriction location of the tongue body and the first formant frequency, as well as a negative correlation between the constriction location of the tongue tip and the second formant frequency. The results demonstrate that the proposed method is an effective tool for the dynamic study of speech production. PMID:25060583
Speech dynamics are coded in the left motor cortex in fluent speakers but not in adults who stutter
Hoang, T. N. Linh; Neef, Andreas; Paulus, Walter; Sommer, Martin
2015-01-01
The precise excitability regulation of neuronal circuits in the primary motor cortex is central to the successful and fluent production of speech. Our question was whether the involuntary execution of undesirable movements, e.g. stuttering, is linked to an insufficient excitability tuning of neural populations in the orofacial region of the primary motor cortex. We determined the speech-related time course of excitability modulation in the left and right primary motor tongue representation. Thirteen fluent speakers (four females, nine males; aged 23–44) and 13 adults who stutter (four females, nine males, aged 21–55) were asked to build verbs with the verbal prefix ‘auf’. Single-pulse transcranial magnetic stimulation was applied over the primary motor cortex during the transition phase between a fixed labiodental articulatory configuration and immediately following articulatory configurations, at different latencies after transition onset. Bilateral electromyography was recorded from self-adhesive electrodes placed on the surface of the tongue. Off-line, we extracted the motor evoked potential amplitudes and normalized these amplitudes to the individual baseline excitability during the fixed configuration. Fluent speakers demonstrated a prominent left hemisphere increase of motor cortex excitability in the transition phase (P = 0.009). In contrast, the excitability of the right primary motor tongue representation was unchanged. Interestingly, adults afflicted with stuttering revealed a lack of left-hemisphere facilitation. Moreover, the magnitude of facilitation was negatively correlated with stuttering frequency. Although orofacial midline muscles are bilaterally innervated from corticobulbar projections of both hemispheres, our results indicate that speech motor plans are controlled primarily in the left primary speech motor cortex. This speech motor planning-related asymmetry towards the left orofacial motor cortex is missing in stuttering. Moreover, a negative correlation between the amount of facilitation and stuttering severity suggests that we discovered a main physiological principle of fluent speech production and its role in stuttering. PMID:25595146
Fast discrete cosine transform structure suitable for implementation with integer computation
NASA Astrophysics Data System (ADS)
Jeong, Yeonsik; Lee, Imgeun
2000-10-01
The discrete cosine transform (DCT) has wide applications in speech and image coding. We propose a fast DCT scheme with the property of reduced multiplication stages and fewer additions and multiplications. The proposed algorithm is structured so that most multiplications are performed at the final stage, which reduces the propagation error that could occur in the integer computation.
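For context, the transform being accelerated is the DCT-II. The sketch below is the direct O(N²) definition checked against SciPy's library routine; it is not the paper's reduced-multiplication structure, which factors the computation so that most multiplications occur in the final stage.

```python
import numpy as np
from scipy.fft import dct

def dct2_naive(x):
    """Direct O(N^2) DCT-II: X[k] = sum_n x[n] * cos(pi*(2n+1)*k / (2N))."""
    N = len(x)
    n = np.arange(N)
    return np.array([np.sum(x * np.cos(np.pi * (2 * n + 1) * k / (2 * N)))
                     for k in range(N)])

x = np.random.randn(8)
# scipy's unnormalized type-2 DCT carries an extra factor of 2
assert np.allclose(dct2_naive(x), dct(x, type=2) / 2)
print("naive DCT-II matches scipy.fft.dct up to scaling")
```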
ERIC Educational Resources Information Center
Stenton, Anthony
2013-01-01
The CNRS-financed authoring system SWANS (Synchronised Web Authoring Notation System), now used in several CercleS centres, was developed by teams from four laboratories as a personalised learning tool for the purpose of making available knowledge about lexical stress patterns and mother-tongue interference in L2 speech production--helping…
Social Factors for Code-Switching in Tunisian Business Companies: A Case Study
ERIC Educational Resources Information Center
Baoueb, Lamia Bach
2009-01-01
Although the literature on CS between Arabic and French in different bilingual speech communities is wide, few studies have dealt with the Tunisian context and no previous work has ever been done on the Tunisian business sector as a specific group using more than one pair of languages to communicate. This case study investigates the variety of…
A Study of the Communicative Abilities of Disadvantaged Children. Final Report.
ERIC Educational Resources Information Center
Osser, Harry; And Others
The purpose of this series of four studies was to precisely describe the code and dialect features of the speech of both lower class Negro children and middle class white children. In the first study, 16 white middle class (WMC) children were compared to 16 Negro lower class (NLC) children on both an imitation and a comprehension task. The WMC…
ERIC Educational Resources Information Center
Gonzalez Lopez, Veronica
2012-01-01
The present study examines the production outcomes of late second language (L2) learners in order to determine if the mechanisms that allow the creation of phonetic categories remains available during the lifespan, as the Speech Language Model (SLM) claims. In addition, the study focuses on the type of interaction that exists between the first…
Free to Teach, Free to Learn: Understanding and Maintaining Academic Freedom in Higher Education
ERIC Educational Resources Information Center
Wildavsky, Rachel; O'Connor, Erin
2013-01-01
This guide for trustees reports on the dangerous decline of academic freedom and intellectual diversity on college campuses. The foreword, by Benno Schmidt, chairman of the CUNY Board of Trustees and former president of Yale, comes at a time when duly-invited graduation speakers are made unwelcome, campus speech codes threaten the free exchange of…
CACTI: Free, Open-Source Software for the Sequential Coding of Behavioral Interactions
Glynn, Lisa H.; Hallgren, Kevin A.; Houck, Jon M.; Moyers, Theresa B.
2012-01-01
The sequential analysis of client and clinician speech in psychotherapy sessions can help to identify and characterize potential mechanisms of treatment and behavior change. Previous studies required coding systems that were time-consuming, expensive, and error-prone. Existing software can be expensive and inflexible, and furthermore, no single package allows for pre-parsing, sequential coding, and assignment of global ratings. We developed a free, open-source, and adaptable program to meet these needs: The CASAA Application for Coding Treatment Interactions (CACTI). Without transcripts, CACTI facilitates the real-time sequential coding of behavioral interactions using WAV-format audio files. Most elements of the interface are user-modifiable through a simple XML file, and can be further adapted using Java through the terms of the GNU Public License. Coding with this software yields interrater reliabilities comparable to previous methods, but at greatly reduced time and expense. CACTI is a flexible research tool that can simplify psychotherapy process research, and has the potential to contribute to the improvement of treatment content and delivery. PMID:22815713
Kenny, Belinda; Lincoln, Michelle; Balandin, Susan
2010-05-01
To investigate the approaches of experienced speech-language pathologists (SLPs) to ethical reasoning and the processes they use to resolve ethical dilemmas. Ten experienced SLPs participated in in-depth interviews. A narrative approach was used to guide participants' descriptions of how they resolved ethical dilemmas. Individual narrative transcriptions were analyzed by using the participant's words to develop an ethical story that described and interpreted their responses to dilemmas. Key concepts from individual stories were then coded into group themes to reflect participants' reasoning processes. Five major themes reflected participants' approaches to ethical reasoning: (a) focusing on the well-being of the client, (b) fulfilling professional roles and responsibilities, (c) attending to professional relationships, (d) managing resources, and (e) integrating personal and professional values. SLPs demonstrated a range of ethical reasoning processes: applying bioethical principles, casuistry, and narrative reasoning when managing ethical dilemmas in the workplace. The results indicate that experienced SLPs adopted an integrated approach to ethical reasoning. They supported clients' rights to make health care choices. Bioethical principles, casuistry, and narrative reasoning provided useful frameworks for facilitating health professionals' application of codes of ethics to complex professional practice issues.
Harlander, Niklas; Rosenkranz, Tobias; Hohmann, Volker
2012-08-01
Single-channel noise reduction has been well investigated and seems to have reached its limits in terms of speech intelligibility improvement; however, the quality of such schemes can still be advanced. This study tests to what extent novel model-based processing schemes might improve performance, in particular for non-stationary noise conditions. Two prototype model-based algorithms, a speech-model-based and an auditory-model-based algorithm, were compared to a state-of-the-art non-parametric minimum statistics algorithm. A speech intelligibility test, preference rating, and listening effort scaling were performed. Additionally, three objective quality measures for the signal, background, and overall distortions were applied. For a better comparison of all algorithms, particular attention was given to the use of a similar Wiener-based gain rule. The perceptual investigation was performed with fourteen hearing-impaired subjects. The results revealed that the non-parametric algorithm and the auditory-model-based algorithm did not affect speech intelligibility, whereas the speech-model-based algorithm slightly decreased intelligibility. In terms of subjective quality, both model-based algorithms performed better than the unprocessed condition and the reference, in particular for highly non-stationary noise environments. The data support the hypothesis that model-based algorithms are promising for improving performance in non-stationary noise conditions.
Dwivedi, Raghav C; St Rose, Suzanne; Chisholm, Edward J; Bisase, Brian; Amen, Furrat; Nutting, Christopher M; Clarke, Peter M; Kerawala, Cyrus J; Rhys-Evans, Peter H; Harrington, Kevin J; Kazi, Rehan
2012-06-01
The aim of this study was to explore post-treatment speech impairments using the English version of the Speech Handicap Index (SHI) (the first speech-specific questionnaire) in a cohort of oral cavity (OC) and oropharyngeal (OP) cancer patients. Sixty-three consecutive OC and OP cancer patients in follow-up participated in this study. Descriptive analyses are presented as percentages, while the Mann-Whitney U-test and Kruskal-Wallis test were used for the quantitative variables. Statistical Package for Social Science-15 statistical software (SPSS Inc., Chicago, IL) was used for the statistical analyses. Over a third (36.1%) of patients reported their speech as either average or bad. Speech intelligibility and articulation were the main speech concerns for 58.8% and 52.9% of OC and 31.6% and 34.2% of OP cancer patients, respectively. Feelings of incompetence and being less outgoing were the speech-related psychosocial concerns for 64.7% and 23.5% of OC and 15.8% and 18.4% of OP cancer patients, respectively. Worse speech outcomes were noted for oral tongue and base of tongue cancers vs. tonsillar cancers, with mean (SD) values of 56.7 (31.3) and 52.0 (38.4) vs. 10.9 (14.8) (P<0.001), and for late vs. early T stage cancers, 65.0 (29.9) vs. 29.3 (32.7) (P<0.005). The English version of the SHI is a reliable, valid and useful tool for the evaluation of speech in HNC patients. Over one-third of OC and OP cancer patients reported speech problems in their day-to-day life. Advanced T-stage tumors affecting the oral tongue or base of tongue are particularly associated with poor speech outcomes. Copyright © 2012 Elsevier Ltd. All rights reserved.
Representations of Pitch and Timbre Variation in Human Auditory Cortex
2017-01-01
Pitch and timbre are two primary dimensions of auditory perception, but how they are represented in the human brain remains a matter of contention. Some animal studies of auditory cortical processing have suggested modular processing, with different brain regions preferentially coding for pitch or timbre, whereas other studies have suggested a distributed code for different attributes across the same population of neurons. This study tested whether variations in pitch and timbre elicit activity in distinct regions of the human temporal lobes. Listeners were presented with sequences of sounds that varied in either fundamental frequency (eliciting changes in pitch) or spectral centroid (eliciting changes in brightness, an important attribute of timbre), with the degree of pitch or timbre variation in each sequence parametrically manipulated. The BOLD responses from auditory cortex increased with increasing sequence variance along each perceptual dimension. The spatial extent, region, and laterality of the cortical regions most responsive to variations in pitch or timbre at the univariate level of analysis were largely overlapping. However, patterns of activation in response to pitch or timbre variations were discriminable in most subjects at an individual level using multivoxel pattern analysis, suggesting a distributed coding of the two dimensions bilaterally in human auditory cortex. SIGNIFICANCE STATEMENT Pitch and timbre are two crucial aspects of auditory perception. Pitch governs our perception of musical melodies and harmonies, and conveys both prosodic and (in tone languages) lexical information in speech. Brightness—an aspect of timbre or sound quality—allows us to distinguish different musical instruments and speech sounds. Frequency-mapping studies have revealed tonotopic organization in primary auditory cortex, but the use of pure tones or noise bands has precluded the possibility of dissociating pitch from brightness. Our results suggest a distributed code, with no clear anatomical distinctions between auditory cortical regions responsive to changes in either pitch or timbre, but also reveal a population code that can differentiate between changes in either dimension within the same cortical regions. PMID:28025255
Rhythmic patterning in Malaysian and Singapore English.
Tan, Rachel Siew Kuang; Low, Ee-Ling
2014-06-01
Previous work on the rhythm of Malaysian English has been based on impressionistic observations. This paper uses acoustic analysis to measure the rhythmic patterns of Malaysian English. Recordings of the read speech and spontaneous speech of 10 Malaysian English speakers were analyzed and compared with recordings of an equivalent sample of Singaporean English speakers. Analysis was done using two rhythmic indexes, the PVI and VarcoV. It was found that although the rhythm of the read speech of the Singaporean speakers was syllable-based, as described by previous studies, the rhythm of the Malaysian speakers was even more syllable-based. Analysis of the syllables in specific utterances showed that Malaysian speakers did not reduce vowels as much as Singaporean speakers did. Results for the spontaneous speech confirmed the findings for the read speech; that is, the same rhythmic patterning was found in the environments that normally trigger vowel reduction.
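Both rhythm indexes named here have simple closed forms and are easy to compute from successive vocalic interval durations. The sketch below uses invented durations (in ms); lower values indicate more syllable-based rhythm, which is the direction of the paper's finding for Malaysian English.

```python
import numpy as np

def npvi(durations):
    """Normalized Pairwise Variability Index:
    100 * mean(|d_k - d_{k+1}| / ((d_k + d_{k+1}) / 2)) over successive intervals."""
    d = np.asarray(durations, dtype=float)
    return 100 * np.mean(np.abs(np.diff(d)) / ((d[:-1] + d[1:]) / 2))

def varco_v(durations):
    """VarcoV: coefficient of variation of vocalic durations, times 100."""
    d = np.asarray(durations, dtype=float)
    return 100 * d.std() / d.mean()

stress_timed = [60, 140, 55, 180, 70, 150]    # alternating full/reduced vowels
syllable_timed = [95, 105, 100, 92, 108, 98]  # near-equal vowel durations
for name, d in [("stress-timed-like", stress_timed),
                ("syllable-timed-like", syllable_timed)]:
    print(f"{name}: nPVI={npvi(d):.1f}, VarcoV={varco_v(d):.1f}")
```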
Assessing Auditory Discrimination Skill of Malay Children Using Computer-based Method.
Ting, H; Yunus, J; Mohd Nordin, M Z
2005-01-01
The purpose of this paper is to investigate the auditory discrimination skill of Malay children using a computer-based method. Currently, most auditory discrimination assessments are conducted manually by a speech-language pathologist. These conventional tests are general tests of sound discrimination, which do not reflect the client's specific speech sound errors. We therefore propose a computer-based Malay auditory discrimination test to automate the whole assessment process and to customize the test according to the client's specific speech error sounds. The ability to discriminate voiced and unvoiced Malay speech sounds was studied in Malay children aged between 7 and 10 years. The study showed no major difficulty for the children in discriminating the Malay speech sounds except in differentiating the /g/-/k/ sounds. On average, the 7-year-old children failed to discriminate the /g/-/k/ sounds.
Walking the talk--speech activates the leg motor cortex.
Liuzzi, Gianpiero; Ellger, Tanja; Flöel, Agnes; Breitenstein, Caterina; Jansen, Andreas; Knecht, Stefan
2008-09-01
Speech may have evolved from earlier modes of communication based on gestures. Consistent with such a motor theory of speech, cortical orofacial and hand motor areas are activated by both speech production and speech perception. However, the extent of speech-related activation of the motor cortex remains unclear. Therefore, we examined if reading and listening to continuous prose also activates non-brachiofacial motor representations like the leg motor cortex. We found corticospinal excitability of bilateral leg muscle representations to be enhanced by speech production and silent reading. Control experiments showed that speech production yielded stronger facilitation of the leg motor system than non-verbal tongue-mouth mobilization and silent reading more than a visuo-attentional task thus indicating speech-specificity of the effect. In the frame of the motor theory of speech this finding suggests that the system of gestural communication, from which speech may have evolved, is not confined to the hand but includes gestural movements of other body parts as well.
"Who" is saying "what"? Brain-based decoding of human voice and speech.
Formisano, Elia; De Martino, Federico; Bonte, Milene; Goebel, Rainer
2008-11-07
Can we decipher speech content ("what" is being said) and speaker identity ("who" is saying it) from observations of brain activity of a listener? Here, we combine functional magnetic resonance imaging with a data-mining algorithm and retrieve what and whom a person is listening to from the neural fingerprints that speech and voice signals elicit in the listener's auditory cortex. These cortical fingerprints are spatially distributed and insensitive to acoustic variations of the input so as to permit the brain-based recognition of learned speech from unknown speakers and of learned voices from previously unheard utterances. Our findings unravel the detailed cortical layout and computational properties of the neural populations at the basis of human speech recognition and speaker identification.
Speech transformations based on a sinusoidal representation
NASA Astrophysics Data System (ADS)
Quatieri, T. E.; McAulay, R. J.
1986-05-01
A new speech analysis/synthesis technique is presented which provides the basis for a general class of speech transformations including time-scale modification, frequency scaling, and pitch modification. These modifications can be performed with a time-varying change, permitting continuous adjustment of a speaker's fundamental frequency and rate of articulation. The method is based on a sinusoidal representation of the speech production mechanism that has been shown to produce synthetic speech that preserves the waveform shape and is essentially perceptually indistinguishable from the original. Although the analysis/synthesis system was originally designed for single-speaker signals, it is equally capable of recovering and modifying nonspeech signals such as music, multiple speakers, marine biologic sounds, and speakers in the presence of interference such as noise and musical backgrounds.
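The core analysis/synthesis loop of a sinusoidal model can be sketched in a few lines. The toy below picks spectral peaks per frame and re-synthesizes the signal as a sum of sinusoids, with time-scale modification via a changed synthesis hop; real systems of this kind add frame-to-frame peak matching and phase interpolation, both omitted here, and all frame sizes are illustrative.

```python
import numpy as np

fs, frame, hop = 8000, 256, 128

def analyze(x, n_peaks=10):
    """Per frame: windowed FFT, keep the n strongest bins as sinusoid tracks."""
    frames = []
    win = np.hanning(frame)
    for start in range(0, len(x) - frame, hop):
        spec = np.fft.rfft(x[start:start + frame] * win)
        mag, phase = np.abs(spec), np.angle(spec)
        peaks = np.argsort(mag)[-n_peaks:]
        freqs = peaks * fs / frame
        amps = mag[peaks] * 4 / frame        # rough Hann-window amplitude scaling
        frames.append((freqs, amps, phase[peaks]))
    return frames

def synthesize(frames, stretch=1.0):
    """Overlap-add sinusoids; stretch > 1 slows the rate of articulation."""
    out_hop = int(hop * stretch)
    y = np.zeros(out_hop * len(frames) + frame)
    t = np.arange(frame) / fs
    win = np.hanning(frame)
    for i, (freqs, amps, phases) in enumerate(frames):
        seg = sum(a * np.cos(2 * np.pi * f * t + p)
                  for f, a, p in zip(freqs, amps, phases))
        y[i * out_hop: i * out_hop + frame] += win * seg
    return y

x = np.sin(2 * np.pi * 220 * np.arange(fs) / fs)  # stand-in for a speech signal
y = synthesize(analyze(x), stretch=1.5)           # 1.5x time-scale modification
```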
Nordberg, A; Miniscalco, C; Lohmander, A; Himmelmann, K
2013-02-01
To describe speech ability in a population-based study of children with cerebral palsy (CP), in relation to CP subtype, motor function, cognitive level and neuroimaging findings. A retrospective chart review of 129 children (66 girls, 63 boys) with CP, born in 1999-2002, was carried out. Speech ability and background information, such as type of CP, motor function, cognitive level and neuroimaging data, were collected and analysed. Speech disorders were found in 21% of the children and were present in all types of CP. Forty-one per cent of the children with speech disorders also had mental retardation, and 42% were able to walk independently. A further 32% of the children were nonverbal, and maldevelopment and basal ganglia lesions were most common in this group. The remaining 47% had no speech disorders, and this group was most likely to display white matter lesions of immaturity. More than half of the children in this CP cohort had a speech disorder (21%) or were nonverbal (32%). Speech ability was related to the type of CP, gross motor function, the presence of mental retardation and the localization of brain maldevelopment and lesions. Neuroimaging results differed between the three speech ability groups. ©2012 The Author(s)/Acta Paediatrica ©2012 Foundation Acta Paediatrica.
Buechner, Andreas; Beynon, Andy; Szyfter, Witold; Niemczyk, Kazimierz; Hoppe, Ulrich; Hey, Matthias; Brokx, Jan; Eyles, Julie; Van de Heyning, Paul; Paludetti, Gaetano; Zarowski, Andrzej; Quaranta, Nicola; Wesarg, Thomas; Festen, Joost; Olze, Heidi; Dhooge, Ingeborg; Müller-Deile, Joachim; Ramos, Angel; Roman, Stephane; Piron, Jean-Pierre; Cuda, Domenico; Burdo, Sandro; Grolman, Wilko; Vaillard, Samantha Roux; Huarte, Alicia; Frachet, Bruno; Morera, Constantine; Garcia-Ibáñez, Luis; Abels, Daniel; Walger, Martin; Müller-Mazotta, Jochen; Leone, Carlo Antonio; Meyer, Bernard; Dillier, Norbert; Steffens, Thomas; Gentine, André; Mazzoli, Manuela; Rypkema, Gerben; Killian, Matthijs; Smoorenburg, Guido
2011-11-01
Efficacy of the SPEAK and ACE coding strategies was compared with that of a new strategy, MP3000™, by 37 European implant centers including 221 subjects. The SPEAK and ACE strategies are based on selection of 8-10 spectral components with the highest levels, while MP3000 is based on the selection of only 4-6 components, with the highest levels relative to an estimate of the spread of masking. The pulse rate per component was fixed. No significant difference was found for the speech scores and for coding preference between the SPEAK/ACE and MP3000 strategies. Battery life was 24% longer for the MP3000 strategy. With MP3000 the best results were found for a selection of six components. In addition, the best results were found for a masking function with a low-frequency slope of 50 dB/Bark and a high-frequency slope of 37 dB/Bark (50/37) as compared to the other combinations examined of 40/30 and 20/15 dB/Bark. The best results found for the steepest slopes do not seem to agree with current estimates of the spread of masking in electrical stimulation. Future research might reveal if performance with respect to SPEAK/ACE can be enhanced by increasing the number of channels in MP3000 beyond 4-6 and it should shed more light on the optimum steepness of the slopes of the masking functions applied in MP3000.
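The n-of-m selection rule that distinguishes MP3000 from SPEAK/ACE can be sketched numerically. The following is a hedged illustration, not Cochlear's implementation: the channel index stands in for the Bark scale, the 50/37 dB slopes follow the abstract, and the channel levels are invented.

```python
import numpy as np

def select_channels(levels_db, n=6, lo_slope=50.0, hi_slope=37.0):
    """Keep the n channels whose levels most exceed a running estimate of the
    spread of masking, built from fixed low/high-frequency slopes."""
    m = len(levels_db)
    masking = np.full(m, -np.inf)
    for j, level in enumerate(levels_db):     # spread each channel's masking
        for k in range(m):
            if k == j:
                continue
            slope = lo_slope if k < j else hi_slope
            masking[k] = max(masking[k], level - slope * abs(k - j))
    excess = levels_db - masking              # level above the masking estimate
    return np.sort(np.argsort(excess)[-n:])

levels = np.array([50, 62, 60, 45, 30, 55, 70, 68, 40, 35, 52, 48], float)
print("selected channels:", select_channels(levels))
```

A plain SPEAK/ACE-style maxima selection would instead return `np.argsort(levels)[-n:]`; the masking-relative rule can pass over a channel that is loud but already masked by a louder neighbor.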
DOE Office of Scientific and Technical Information (OSTI.GOV)
Agarwal, Sapan; Quach, Tu-Thach; Parekh, Ojas
2016-01-06
In this study, the exponential increase in data over the last decade presents a significant challenge to analytics efforts that seek to process and interpret such data for various applications. Neural-inspired computing approaches are being developed in order to leverage the computational properties of the analog, low-power data processing observed in biological systems. Analog resistive memory crossbars can perform a parallel read or a vector-matrix multiplication as well as a parallel write or a rank-1 update with high computational efficiency. For an N × N crossbar, these two kernels can be O(N) more energy efficient than a conventional digital memory-based architecture. If the read operation is noise limited, the energy to read a column can be independent of the crossbar size (O(1)). These two kernels form the basis of many neuromorphic algorithms such as image, text, and speech recognition. For instance, these kernels can be applied to a neural sparse coding algorithm to give an O(N) reduction in energy for the entire algorithm when run with finite precision. Sparse coding is a rich problem with a host of applications including computer vision, object tracking, and more generally unsupervised learning.
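The two crossbar kernels named here map directly onto familiar linear algebra. The sketch below models a crossbar as a plain conductance matrix: a parallel read is a vector-matrix multiply and a parallel write is a rank-1 outer-product update. Both are single parallel operations on the hardware but O(N²) work in this software stand-in; the conductance ranges and learning rate are invented.

```python
import numpy as np

class Crossbar:
    def __init__(self, n):
        # device conductances (siemens); range is illustrative
        self.G = np.random.uniform(0, 1e-6, size=(n, n))

    def read(self, v):
        """Parallel read: output currents I = G^T v (vector-matrix multiply)."""
        return self.G.T @ v

    def write(self, row_v, col_v, lr=1e-9):
        """Parallel write: rank-1 conductance update G += lr * outer(row, col)."""
        self.G += lr * np.outer(row_v, col_v)

xb = Crossbar(64)
x = np.random.randn(64)
print("read current shape:", xb.read(x).shape)
xb.write(x, np.sign(xb.read(x)))   # e.g., one Hebbian-style rank-1 update
```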
[A research in speech endpoint detection based on boxes-coupling generalization dimension].
Wang, Zimei; Yang, Cuirong; Wu, Wei; Fan, Yingle
2008-06-01
In this paper, a new method for calculating the generalized dimension, based on a box-coupling principle, is proposed to overcome edge effects and to improve speech endpoint detection relative to the original generalized-dimension calculation. The new method has been applied to speech endpoint detection. First, the length of the overlapping border was determined, and by calculating the generalized dimension with the speech signal covered by overlapped boxes, three-dimensional feature vectors comprising the box dimension, the information dimension and the correlation dimension were obtained. Second, in light of the relation between feature distance and degree of similarity, feature extraction was conducted using a common distance measure. Finally, a bi-threshold method was used to classify the speech signals. Experimental results indicated that, compared with the original generalized dimension (OGD) and the spectral entropy (SE) algorithm, the proposed method is more robust and effective for detecting speech signals containing different kinds of noise at different signal-to-noise ratios (SNRs), especially at low SNR.
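For context, the three generalized dimensions used as the feature vector (box, information and correlation dimensions, i.e., D_q for q = 0, 1, 2) can be estimated by plain box counting over the time-amplitude plane, as sketched below. This sketch uses a non-overlapped partition for clarity; the paper's contribution, the overlapped box covering, is omitted, and the box sizes are illustrative.

```python
import numpy as np

def generalized_dims(x, eps_list=(1/8, 1/16, 1/32, 1/64), qs=(0, 1, 2)):
    """Estimate D_q by fitting the log measure against log(1/eps)."""
    t = np.linspace(0, 1, len(x), endpoint=False)       # normalized time
    a = (x - x.min()) / (np.ptp(x) + 1e-12)             # normalized amplitude
    dims = {}
    for q in qs:
        ys, zs = [], []
        for eps in eps_list:
            ij = np.stack([np.floor(t / eps), np.floor(a / eps)]).astype(int)
            _, counts = np.unique(ij, axis=1, return_counts=True)
            p = counts / counts.sum()                   # box occupation measure
            if q == 1:
                ys.append(-np.sum(p * np.log(p)))       # information dimension
            else:
                ys.append(np.log(np.sum(p ** q)) / (1 - q))  # box (q=0), corr. (q=2)
            zs.append(np.log(1 / eps))
        dims[q] = np.polyfit(zs, ys, 1)[0]              # slope estimates D_q
    return dims

rng = np.random.default_rng(0)
frame = np.sin(2 * np.pi * 8 * np.linspace(0, 1, 1024)) + 0.05 * rng.normal(size=1024)
print({q: round(d, 2) for q, d in generalized_dims(frame).items()})
```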
Audio visual speech source separation via improved context dependent association model
NASA Astrophysics Data System (ADS)
Kazemi, Alireza; Boostani, Reza; Sobhanmanesh, Fariborz
2014-12-01
In this paper, we exploit the non-linear relation between a speech source and its associated lip video as a source of extra information to propose an improved audio-visual speech source separation (AVSS) algorithm. The audio-visual association is modeled using a neural associator which estimates the visual lip parameters from a temporal context of acoustic observation frames. We define an objective function based on mean square error (MSE) measure between estimated and target visual parameters. This function is minimized for estimation of the de-mixing vector/filters to separate the relevant source from linear instantaneous or time-domain convolutive mixtures. We have also proposed a hybrid criterion which uses AV coherency together with kurtosis as a non-Gaussianity measure. Experimental results are presented and compared in terms of visually relevant speech detection accuracy and output signal-to-interference ratio (SIR) of source separation. The suggested audio-visual model significantly improves relevant speech classification accuracy compared to existing GMM-based model and the proposed AVSS algorithm improves the speech separation quality compared to reference ICA- and AVSS-based methods.
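Of the hybrid criterion described here, the non-Gaussianity part is the easiest to illustrate in isolation. The sketch below finds a unit demixing vector for an instantaneous two-microphone mixture by maximizing output kurtosis; the audio-visual coherency term, the neural associator, and the convolutive case are all omitted, and the mixing matrix and source models are invented.

```python
import numpy as np
from scipy.optimize import minimize

def kurtosis(y):
    """Excess kurtosis of a zero-mean, unit-variance version of y."""
    y = (y - y.mean()) / y.std()
    return np.mean(y ** 4) - 3

def extract(X):
    """X: (n_mics, n_samples) instantaneous mixture; returns one source estimate."""
    def neg_abs_kurt(w):
        w = w / np.linalg.norm(w)
        return -abs(kurtosis(w @ X))
    w = minimize(neg_abs_kurt, np.random.randn(X.shape[0]),
                 method="Nelder-Mead").x
    return (w / np.linalg.norm(w)) @ X

rng = np.random.default_rng(2)
n = 20000
speech = np.sign(rng.standard_normal(n)) * rng.exponential(1.0, n)  # super-Gaussian
interf = rng.standard_normal(n)                                     # Gaussian
X = np.array([[0.8, 0.6], [0.5, 0.9]]) @ np.vstack([speech, interf])
y = extract(X)
print("kurtosis of extracted source: %.2f" % kurtosis(y))
```

In the paper's full algorithm, the audio-visual association score (how well the extracted audio predicts the observed lip parameters) is added to this non-Gaussianity objective, which resolves the ambiguity of which source is "relevant".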
Lopez-Poveda, Enrique A.; Eustaquio-Martín, Almudena; Stohl, Joshua S.; Wolford, Robert D.; Schatzer, Reinhold; Wilson, Blake S.
2016-01-01
Objectives: In natural hearing, cochlear mechanical compression is dynamically adjusted via the efferent medial olivocochlear reflex (MOCR). These adjustments probably help understanding speech in noisy environments and are not available to the users of current cochlear implants (CIs). The aims of the present study are to: (1) present a binaural CI sound processing strategy inspired by the control of cochlear compression provided by the contralateral MOCR in natural hearing; and (2) assess the benefits of the new strategy for understanding speech presented in competition with steady noise with a speech-like spectrum in various spatial configurations of the speech and noise sources. Design: Pairs of CI sound processors (one per ear) were constructed to mimic or not mimic the effects of the contralateral MOCR on compression. For the nonmimicking condition (standard strategy or STD), the two processors in a pair functioned similarly to standard clinical processors (i.e., with fixed back-end compression and independently of each other). When configured to mimic the effects of the MOCR (MOC strategy), the two processors communicated with each other and the amount of back-end compression in a given frequency channel of each processor in the pair decreased/increased dynamically (so that output levels dropped/increased) with increases/decreases in the output energy from the corresponding frequency channel in the contralateral processor. Speech reception thresholds in speech-shaped noise were measured for 3 bilateral CI users and 2 single-sided deaf unilateral CI users. Thresholds were compared for the STD and MOC strategies in unilateral and bilateral listening conditions and for three spatial configurations of the speech and noise sources in simulated free-field conditions: speech and noise sources colocated in front of the listener, speech on the left ear with noise in front of the listener, and speech on the left ear with noise on the right ear. In both bilateral and unilateral listening, the electrical stimulus delivered to the test ear(s) was always calculated as if the listeners were wearing bilateral processors. Results: In both unilateral and bilateral listening conditions, mean speech reception thresholds were comparable with the two strategies for colocated speech and noise sources, but were at least 2 dB lower (better) with the MOC than with the STD strategy for spatially separated speech and noise sources. In unilateral listening conditions, mean thresholds improved with increasing the spatial separation between the speech and noise sources regardless of the strategy but the improvement was significantly greater with the MOC strategy. In bilateral listening conditions, thresholds improved significantly with increasing the speech-noise spatial separation only with the MOC strategy. Conclusions: The MOC strategy (1) significantly improved the intelligibility of speech presented in competition with a spatially separated noise source, both in unilateral and bilateral listening conditions; (2) produced significant spatial release from masking in bilateral listening conditions, something that did not occur with fixed compression; and (3) enhanced spatial release from masking in unilateral listening conditions. The MOC strategy as implemented here, or a modified version of it, may be usefully applied in CIs and in hearing aids. PMID:26862711
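The contralateral coupling at the heart of the MOC strategy can be sketched as two linked back-end stages. The toy below assumes per-channel envelope levels in dB for each ear and reduces each ear's output as the smoothed output energy of the matching contralateral channel grows; the coupling gain c and smoothing constant alpha are illustrative, not the study's values.

```python
import numpy as np

def moc_pair(env_left, env_right, c=0.4, alpha=0.9):
    """Couple two processors: each ear's channel output drops as the smoothed
    output of the matching contralateral channel rises (MOC-like inhibition)."""
    out_l = np.zeros_like(env_left)
    out_r = np.zeros_like(env_right)
    el = np.zeros(env_left.shape[0])      # smoothed left-ear outputs per channel
    er = np.zeros(env_right.shape[0])     # smoothed right-ear outputs per channel
    for n in range(env_left.shape[1]):
        out_l[:, n] = env_left[:, n] - c * er
        out_r[:, n] = env_right[:, n] - c * el
        el = alpha * el + (1 - alpha) * out_l[:, n]
        er = alpha * er + (1 - alpha) * out_r[:, n]
    return out_l, out_r

rng = np.random.default_rng(0)
speech_ear = 60 + 5 * rng.standard_normal((12, 100))  # speech-dominated ear (dB)
noise_ear = 50 + 5 * rng.standard_normal((12, 100))   # noise-dominated ear (dB)
out_l, out_r = moc_pair(speech_ear, noise_ear)
print("mean level drop, speech ear: %.1f dB" % (speech_ear - out_l).mean())
print("mean level drop, noise ear:  %.1f dB" % (noise_ear - out_r).mean())
```

With spatially separated sources, the louder (speech-dominated) ear inhibits the noise-dominated ear more than vice versa, which is the qualitative mechanism behind the spatial release from masking reported above.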
Robust estimators for speech enhancement in real environments
NASA Astrophysics Data System (ADS)
Sandoval-Ibarra, Yuma; Diaz-Ramirez, Victor H.; Kober, Vitaly
2015-09-01
Common statistical estimators for speech enhancement rely on several assumptions about the stationarity of speech signals and noise. These assumptions may not always be valid in real life owing to the nonstationary characteristics of speech and noise processes. We propose new estimators, based on existing ones, that incorporate the computation of rank-order statistics. The proposed estimators are better adapted to the non-stationary characteristics of speech signals and noise processes. Through computer simulations we show that the proposed estimators yield better performance in terms of objective metrics than known estimators when speech signals are contaminated with airport, babble, restaurant, and train-station noise.
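One simple way rank-order statistics enter such estimators is as a running median in the noise tracker: the median of recent spectral magnitudes is far less sensitive to nonstationary bursts than a mean. The sketch below is an illustrative median-tracking spectral-subtraction enhancer, not the paper's estimators; the history length, gain floor, and frame size are assumptions.

```python
import numpy as np
from scipy.signal import istft, stft

def enhance(x, fs, hist=40, floor=0.1):
    """Spectral subtraction with a rank-order (median) noise-floor tracker."""
    f, t, Z = stft(x, fs=fs, nperseg=256)
    mag = np.abs(Z)
    out = np.empty_like(Z)
    for n in range(Z.shape[1]):
        lo = max(0, n - hist)
        noise = np.median(mag[:, lo:n + 1], axis=1)   # order statistic per bin
        gain = np.maximum(1 - noise / (mag[:, n] + 1e-12), floor)
        out[:, n] = gain * Z[:, n]
    return istft(out, fs=fs, nperseg=256)[1]

fs = 8000
t = np.arange(2 * fs) / fs
clean = np.sin(2 * np.pi * 300 * t) * (np.sin(2 * np.pi * 2 * t) > 0)  # gated tone
noisy = clean + 0.5 * np.random.randn(len(t))
y = enhance(noisy, fs)
```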
Development and Perceptual Evaluation of Amplitude-Based F0 Control in Electrolarynx Speech
ERIC Educational Resources Information Center
Saikachi, Yoko; Stevens, Kenneth N.; Hillman, Robert E.
2009-01-01
Purpose: Current electrolarynx (EL) devices produce a mechanical speech quality that has been largely attributed to the lack of natural fundamental frequency (F0) variation. In order to improve the quality of EL speech, in the present study the authors aimed to develop and evaluate an automatic F0 control scheme, in which F0 was modulated based on…
Constructing Adequate Non-Speech Analogues: What Is Special about Speech Anyway?
ERIC Educational Resources Information Center
Rosen, Stuart; Iverson, Paul
2007-01-01
Vouloumanos and Werker (2007) claim that human neonates have a (possibly innate) bias to listen to speech based on a preference for natural speech utterances over sine-wave analogues. We argue that this bias more likely arises from the strikingly different saliency of voice melody in the two kinds of sounds, a bias that has already been shown to…
ERIC Educational Resources Information Center
Ashtiani, Farshid Tayari; Zafarghandi, Amir Mahdavi
2015-01-01
The present study was an attempt to investigate the impact of English verbal songs on connected speech aspects of adult English learners' speech production. 40 participants were selected based on the results of their performance in a piloted and validated version of NELSON test given to 60 intermediate English learners in a language institute in…
ERIC Educational Resources Information Center
Koehlinger, Keegan M.
2015-01-01
Clinical Question: Would a preschool-aged child with childhood apraxia of speech (CAS) benefit from a singular approach--such as motor planning, sensory cueing, linguistic and rhythmic--or a combined approach in order to increase intelligibility of spoken language? Method: Systematic Review. Study Sources: ASHA Wire, Google Scholar, Speech Bite.…
Everyday listeners' impressions of speech produced by individuals with adductor spasmodic dysphonia.
Nagle, Kathleen F; Eadie, Tanya L; Yorkston, Kathryn M
2015-01-01
Individuals with adductor spasmodic dysphonia (ADSD) have reported that unfamiliar communication partners appear to judge them as sneaky, nervous or not intelligent, apparently based on the quality of their speech; however, there is minimal research into the actual everyday perspective of listening to ADSD speech. The purpose of this study was to investigate the impressions of listeners hearing ADSD speech for the first time using a mixed-methods design. Everyday listeners were interviewed following sessions in which they made ratings of ADSD speech. A semi-structured interview approach was used and data were analyzed using thematic content analysis. Three major themes emerged: (1) everyday listeners make judgments about speakers with ADSD; (2) ADSD speech does not sound normal to everyday listeners; and (3) rating overall severity is difficult for everyday listeners. Participants described ADSD speech similarly to existing literature; however, some listeners inaccurately extrapolated speaker attributes based solely on speech samples. Listeners may draw erroneous conclusions about individuals with ADSD and these biases may affect the communicative success of these individuals. Results have implications for counseling individuals with ADSD, as well as the need for education and awareness about ADSD. Copyright © 2015 Elsevier Inc. All rights reserved.
Autonomic Correlates of Speech Versus Nonspeech Tasks in Children and Adults
Arnold, Hayley S.; MacPherson, Megan K.; Smith, Anne
2015-01-01
Purpose To assess autonomic arousal associated with speech and nonspeech tasks in school-age children and young adults. Method Measures of autonomic arousal (electrodermal level, electrodermal response amplitude, blood pulse volume, and heart rate) were recorded prior to, during, and after the performance of speech and nonspeech tasks by twenty 7- to 9-year-old children and twenty 18- to 22-year-old adults. Results Across age groups, autonomic arousal was higher for speech tasks compared with nonspeech tasks, based on peak electrodermal response amplitude and blood pulse volume. Children demonstrated greater relative arousal, based on heart rate and blood pulse volume, for nonspeech oral motor tasks than adults but showed similar mean arousal levels for speech tasks as adults. Children demonstrated sex differences in autonomic arousal; specifically, autonomic arousal remained high for school-age boys but not girls in a more complex open-ended narrative task that followed a simple sentence production task. Conclusions Speech tasks elicit greater autonomic arousal than nonspeech tasks, and children demonstrate greater autonomic arousal for nonspeech oral motor tasks than adults. Sex differences in autonomic arousal associated with speech tasks in school-age children are discussed relative to speech-language differences between boys and girls. PMID:24686989
Development of a Low-Cost, Noninvasive, Portable Visual Speech Recognition Program.
Kohlberg, Gavriel D; Gal, Ya'akov Kobi; Lalwani, Anil K
2016-09-01
Loss of speech following tracheostomy and laryngectomy severely limits communication to simple gestures and facial expressions that are largely ineffective. To facilitate communication in these patients, we seek to develop a low-cost, noninvasive, portable, and simple visual speech recognition program (VSRP) to convert articulatory facial movements into speech. A Microsoft Kinect-based VSRP was developed to capture spatial coordinates of lip movements and translate them into speech. The articulatory speech movements associated with 12 sentences were used to train an artificial neural network classifier. The accuracy of the classifier was then evaluated on a separate, previously unseen set of articulatory speech movements. The VSRP was successfully implemented and tested in 5 subjects. It achieved an accuracy rate of 77.2% (65.0%-87.6% for the 5 speakers) on a 12-sentence data set. The mean time to classify an individual sentence was 2.03 milliseconds (1.91-2.16). We have demonstrated the feasibility of a low-cost, noninvasive, portable VSRP based on Kinect to accurately predict speech from articulation movements in clinically trivial time. This VSRP could be used as a novel communication device for aphonic patients. © The Author(s) 2016.
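The feature encoding and network architecture are not detailed here, but the overall pipeline (fixed-length lip-trajectory features feeding a small neural-network classifier over 12 sentence classes) can be sketched as follows. The 18-dimensional coordinate frames, resampling length, and network size are hypothetical, and the training data are random placeholders that only demonstrate the shapes involved.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def resample_track(track, n_frames=30):
    """Resample a (T, D) lip-coordinate trajectory to a fixed number of
    frames so utterances of different durations yield equal-size vectors."""
    T, D = track.shape
    idx = np.linspace(0, T - 1, n_frames)
    cols = [np.interp(idx, np.arange(T), track[:, d]) for d in range(D)]
    return np.stack(cols, axis=1).ravel()

# Placeholder data with plausible shapes: 12 sentences x 10 repetitions,
# each a variable-length track of 6 lip points x 3 coordinates (assumed).
rng = np.random.default_rng(0)
X = np.stack([resample_track(rng.normal(size=(int(rng.integers(20, 60)), 18)))
              for _ in range(120)])
y = np.repeat(np.arange(12), 10)
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=1000, random_state=0).fit(X, y)
```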
Simulation of talking faces in the human brain improves auditory speech recognition
von Kriegstein, Katharina; Dogan, Özgür; Grüter, Martina; Giraud, Anne-Lise; Kell, Christian A.; Grüter, Thomas; Kleinschmidt, Andreas; Kiebel, Stefan J.
2008-01-01
Human face-to-face communication is essentially audiovisual. Typically, people talk to us face-to-face, providing concurrent auditory and visual input. Understanding someone is easier when there is visual input, because visual cues like mouth and tongue movements provide complementary information about speech content. Here, we hypothesized that, even in the absence of visual input, the brain optimizes both auditory-only speech and speaker recognition by harvesting speaker-specific predictions and constraints from distinct visual face-processing areas. To test this hypothesis, we performed behavioral and neuroimaging experiments in two groups: subjects with a face recognition deficit (prosopagnosia) and matched controls. The results show that observing a specific person talking for 2 min improves subsequent auditory-only speech and speaker recognition for this person. In both prosopagnosics and controls, behavioral improvement in auditory-only speech recognition was based on an area typically involved in face-movement processing. Improvement in speaker recognition was only present in controls and was based on an area involved in face-identity processing. These findings challenge current unisensory models of speech processing, because they show that, in auditory-only speech, the brain exploits previously encoded audiovisual correlations to optimize communication. We suggest that this optimization is based on speaker-specific audiovisual internal models, which are used to simulate a talking face. PMID:18436648
NASA Technical Reports Server (NTRS)
Sternfeld, H., Jr.; Doyle, L. B.
1978-01-01
The relationship between the internal noise environment of helicopters and the ability of personnel to understand commands and instructions was studied. A test program was conducted to relate speech intelligibility to a standard measurement called the Articulation Index. An acoustical simulator was used to provide noise environments typical of Army helicopters. Speech materials (command sentences and phonetically balanced word lists) were presented at several voice levels in each helicopter environment. Recommended helicopter internal noise criteria, based on speech communication, were derived, and the effectiveness of hearing protection devices was evaluated.
Designing speech-based interfaces for telepresence robots for people with disabilities.
Tsui, Katherine M; Flynn, Kelsey; McHugh, Amelia; Yanco, Holly A; Kontak, David
2013-06-01
People with cognitive and/or motor impairments may benefit from using telepresence robots to engage in social activities. To date, these robots, their user interfaces, and their navigation behaviors have not been designed for operation by people with disabilities. We conducted an experiment in which participants (n=12) used a telepresence robot in a scavenger hunt task to determine how they would use speech to command the robot. Based upon the results, we present design guidelines for speech-based interfaces for telepresence robots.
Justice, Laura M; Schmitt, Mary Beth; Murphy, Kimberly A; Pratt, Amy; Biancone, Tricia
2014-01-01
This study examined vocabulary intervention-in terms of targets and techniques-for children with language impairment receiving speech-language therapy in public schools (i.e., non-fee-paying schools) in the United States. Vocabulary treatments and targets were examined with respect to their alignment with the empirically validated practice of rich vocabulary intervention. Participants were forty-eight 5-7-year-old children participating in kindergarten or the first-grade year of school, all of whom had vocabulary-specific goals on their individualized education programmes. Two therapy sessions per child were coded to determine what vocabulary words were being directly targeted and what techniques were used for each. Study findings showed that the majority of words directly targeted during therapy were lower-level basic vocabulary words (87%) and very few (1%) were academically relevant. On average, three techniques were used per word to promote deep understanding. Interpreting findings against empirical descriptions of rich vocabulary intervention indicates that children were exposed to some but not all aspects of this empirically supported practice. © 2013 Royal College of Speech and Language Therapists.
Kaplan, Peter S; Danko, Christina M; Cejka, Anna M; Everhart, Kevin D
2015-11-01
The hypothesis that the associative learning-promoting effects of infant-directed speech (IDS) depend on infants' social experience was tested in a conditioned-attention paradigm with a cumulative sample of 4- to 14-month-old infants. Following six forward pairings of a brief IDS segment and a photographic slide of a smiling female face, infants of clinically depressed mothers exhibited evidence of having acquired significantly weaker voice-face associations than infants of non-depressed mothers. Regression analyses revealed that maternal depression was significantly related to infant learning even after demographic correlates of depression, antidepressant medication use, and extent of pitch modulation in maternal IDS had been taken into account. However, after maternal depression had been accounted for, maternal emotional availability, coded by blind raters from separate play interactions, accounted for significant further increments in the proportion of variance accounted for in infant learning scores. Both maternal depression and maternal insensitivity negatively, and additively, predicted poor learning. Copyright © 2015 Elsevier Inc. All rights reserved.
An integrated approach to improving noisy speech perception
NASA Astrophysics Data System (ADS)
Koval, Serguei; Stolbov, Mikhail; Smirnova, Natalia; Khitrov, Mikhail
2002-05-01
For a number of practical purposes and tasks, experts have to decode speech recordings of very poor quality. A combination of techniques is proposed to improve the intelligibility and quality of distorted speech messages and thus facilitate their comprehension. Along with the application of noise cancellation and speech signal enhancement techniques that remove and/or reduce various kinds of distortion and interference (primarily unmasking and normalization in the time and frequency domains), the approach incorporates optimal listener expert tactics based on selective listening, nonstandard binaural listening, accounting for short-term and long-term human ear adaptation to noisy speech, as well as some methods of speech signal enhancement to support speech decoding during listening. The approach integrating the suggested techniques ensures high-quality final results and has successfully been applied by Speech Technology Center experts and by numerous other users, mainly forensic institutions, to decode noisy speech recordings for courts, law enforcement and emergency services, accident investigation bodies, etc.
Asad, Areej Nimer; Purdy, Suzanne C; Ballard, Elaine; Fairgray, Liz; Bowen, Caroline
2018-04-27
In this descriptive study, phonological processes were examined in the speech of children aged 5;0-7;6 (years; months) with mild to profound hearing loss using hearing aids (HAs) and cochlear implants (CIs), in comparison to their peers. A second aim was to compare phonological processes of HA and CI users. Children with hearing loss (CWHL, N = 25) were compared to children with normal hearing (CWNH, N = 30) with similar age, gender, linguistic, and socioeconomic backgrounds. Speech samples obtained from a list of 88 words, derived from three standardized speech tests, were analyzed using the CASALA (Computer Aided Speech and Language Analysis) program to evaluate participants' phonological systems, based on lax (a process appeared at least twice in the speech of at least two children) and strict (a process appeared at least five times in the speech of at least two children) counting criteria. Developmental phonological processes were eliminated in the speech of younger and older CWNH while eleven developmental phonological processes persisted in the speech of both age groups of CWHL. CWHL showed a similar trend of age of elimination to CWNH, but at a slower rate. Children with HAs and CIs produced similar phonological processes. Final consonant deletion, weak syllable deletion, backing, and glottal replacement were present in the speech of HA users, affecting their overall speech intelligibility. Developmental and non-developmental phonological processes persist in the speech of children with mild to profound hearing loss compared to their peers with typical hearing. The findings indicate that it is important for clinicians to consider phonological assessment in pre-school CWHL and the use of evidence-based speech therapy in order to reduce non-developmental and non-age-appropriate developmental processes, thereby enhancing their speech intelligibility. Copyright © 2018 Elsevier Inc. All rights reserved.
Speech-Like Rhythm in a Voiced and Voiceless Orangutan Call
Lameira, Adriano R.; Hardus, Madeleine E.; Bartlett, Adrian M.; Shumaker, Robert W.; Wich, Serge A.; Menken, Steph B. J.
2015-01-01
The evolutionary origins of speech remain obscure. Recently, it was proposed that speech derived from monkey facial signals which exhibit a speech-like rhythm of ∼5 open-close lip cycles per second. In monkeys, these signals may also be vocalized, offering a plausible evolutionary stepping stone towards speech. Three essential predictions remain, however, to be tested to assess this hypothesis' validity: (i) great apes, our closest relatives, should likewise produce 5 Hz rhythm signals; (ii) speech-like rhythm should involve calls articulatorily similar to consonants and vowels, given that speech rhythm is the direct product of stringing together these two basic elements; and (iii) speech-like rhythm should be experience-based. Via cinematic analyses we demonstrate that an ex-entertainment orangutan produces two calls at a speech-like rhythm, coined "clicks" and "faux-speech." Like voiceless consonants, clicks required no vocal fold action, but did involve independent manoeuvring of lips and tongue. In parallel to vowels, faux-speech showed harmonic and formant modulations, implying vocal fold and supralaryngeal action. This rhythm was several times faster than orangutan chewing rates, as observed in monkeys and humans. Critically, this rhythm was seven-fold faster than, and contextually distinct from, any other known rhythmic calls described to date in the largest database of the orangutan repertoire ever assembled. The first two predictions advanced by this study are validated and, based on parsimony and exclusion of potential alternative explanations, initial support is given to the third prediction. Irrespective of the putative origins of these calls and underlying mechanisms, our findings demonstrate conclusively that great apes are not respiratorily, articulatorily, or neurologically constrained for the production of consonant- and vowel-like calls at speech rhythm. Orangutan clicks and faux-speech confirm the importance of rhythmic speech antecedents within the primate lineage, and highlight potential articulatory homologies between great ape calls and human consonants and vowels. PMID:25569211
A Window into the Intoxicated Mind? Speech as an Index of Psychoactive Drug Effects
Bedi, Gillinder; Cecchi, Guillermo A; Slezak, Diego F; Carrillo, Facundo; Sigman, Mariano; de Wit, Harriet
2014-01-01
Abused drugs can profoundly alter mental states in ways that may motivate drug use. These effects are usually assessed with self-report, an approach that is vulnerable to biases. Analyzing speech during intoxication may present a more direct, objective measure, offering a unique 'window' into the mind. Here, we employed computational analyses of speech semantic and topological structure after ±3,4-methylenedioxymethamphetamine (MDMA; 'ecstasy') and methamphetamine in 13 ecstasy users. In 4 sessions, participants completed a 10-min speech task after MDMA (0.75 and 1.5 mg/kg), methamphetamine (20 mg), or placebo. Latent Semantic Analyses identified the semantic proximity between speech content and concepts relevant to drug effects. Graph-based analyses identified topological speech characteristics. Group-level drug effects on semantic distances and topology were assessed. Machine-learning analyses (with leave-one-out cross-validation) assessed whether speech characteristics could predict drug condition in the individual subject. Speech after MDMA (1.5 mg/kg) had greater semantic proximity than placebo to the concepts friend, support, intimacy, and rapport. Speech on MDMA (0.75 mg/kg) had greater proximity to empathy than placebo. Conversely, speech on methamphetamine was further from compassion than placebo. Classifiers discriminated between MDMA (1.5 mg/kg) and placebo with 88% accuracy, and MDMA (1.5 mg/kg) and methamphetamine with 84% accuracy. For the two MDMA doses, the classifier performed at chance. These data suggest that automated semantic speech analyses can capture subtle alterations in mental state, accurately discriminating between drugs. The findings also illustrate the potential for automated speech-based approaches to characterize clinically relevant alterations to mental state, including those occurring in psychiatric illness. PMID:24694926
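The core of the semantic analysis, measuring how close a transcript lies to a probe concept in a latent semantic space, reduces to a cosine similarity. A minimal sketch, assuming a precomputed word-to-vector mapping (the actual LSA space and preprocessing are not described here):

```python
import numpy as np

def semantic_proximity(transcript_words, concept, embed):
    """Cosine similarity between the centroid of a transcript's word
    vectors and a concept vector in a precomputed LSA-style space.
    `embed` is a dict mapping word -> vector; a sketch of the analysis
    idea, not the authors' pipeline."""
    vecs = [embed[w] for w in transcript_words if w in embed]
    v = np.mean(vecs, axis=0)
    c = embed[concept]
    return float(v @ c / (np.linalg.norm(v) * np.linalg.norm(c)))
```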
The Role of Visual Image and Perception in Speech Development of Children with Speech Pathology
ERIC Educational Resources Information Center
Tsvetkova, L. S.; Kuznetsova, T. M.
1977-01-01
Investigated with 125 children (4-14 years old) with speech, language, or emotional disorders was the assumption that the naming function can be underdeveloped because of defects in the word's gnostic base. (Author/DB)
Routine Language: Speech Directed to Infants During Home Activities.
Tamis-LeMonda, Catherine S; Custode, Stephanie; Kuchirko, Yana; Escobar, Kelly; Lo, Tiffany
2018-05-15
Everyday activities are replete with contextual cues for infants to exploit in the service of learning words. Nelson's (1985) script theory guided the hypothesis that infants participate in a set of predictable activities over the course of a day that provide them with opportunities to hear unique language functions and forms. Mothers and their firstborn 13-month-old infants (N = 40) were video-recorded during everyday activities at home. Transcriptions and coding of mothers' speech to infants (time-locked to the activities of feeding, grooming, booksharing, object play, and transition) revealed that the amount, diversity, pragmatic functions, and semantic content of maternal language systematically differed by activity. The activities of everyday life shape language inputs to infants in ways that highlight word meaning. © 2018 Society for Research in Child Development.
Speech deterioration in amyotrophic lateral sclerosis (ALS) after manifestation of bulbar symptoms.
Makkonen, Tanja; Ruottinen, Hanna; Puhto, Riitta; Helminen, Mika; Palmio, Johanna
2018-03-01
The symptoms of amyotrophic lateral sclerosis (ALS) and their progression are typically studied after the diagnosis has been confirmed. However, many people with ALS already have severe dysarthria and loss of adequate speech at the time of diagnosis. Speech-and-language therapy interventions in ALS should be targeted in a timely manner based on communicative need. The aims were to investigate how long natural speech remains functional and to identify changes in the speech of persons with ALS. Altogether 30 consecutive participants were studied and divided into two groups based on the initial type of ALS, bulbar or spinal. Their speech disorder was evaluated for severity, articulation rate and intelligibility during a 2-year follow-up. The ability to speak deteriorated to poor and necessitated augmentative and alternative communication (AAC) methods in 60% of the participants. Their speech remained adequate on average for 18 months from the first bulbar symptom. Severity, articulation rate and intelligibility declined in nearly all participants during the study. Initially, speech deteriorated more in the bulbar group than in the spinal group, and the difference persisted, with some exceptions, throughout the follow-up. The onset of bulbar symptoms indicated the time to loss of speech better than the ALS diagnosis or the first speech therapy evaluation. In clinical work, it is important to take the initial type of ALS into consideration when determining the urgency of AAC measures, as people with bulbar-onset ALS are more susceptible to delayed evaluation and AAC intervention. © 2017 Royal College of Speech and Language Therapists.
Kim, Soo Ji; Jo, Uiri
2013-01-01
Based on the anatomical and functional commonality between singing and speech, various types of musical elements have been employed in music therapy research for speech rehabilitation. The aim of this study was to develop an accent-based music speech protocol to address voice problems of stroke patients with mixed dysarthria. Subjects were 6 stroke patients with mixed dysarthria who received individual music therapy sessions. Each session lasted 30 minutes, and 12 sessions, including pre- and post-test, were administered to each patient. To examine the protocol's efficacy, measures of maximum phonation time (MPT), fundamental frequency (F0), average intensity (dB), jitter, shimmer, noise-to-harmonics ratio (NHR), and diadochokinesis (DDK) were compared between pre- and post-test and analyzed with a paired-sample t-test. The results showed that the measures of MPT, F0, dB, and sequential motion rates (SMR) were significantly increased after administering the protocol. Also, there were statistically significant differences in the measures of shimmer and alternating motion rates (AMR) of the syllable /kʌ/ between pre- and post-test. The results indicated that the accent-based music speech protocol may improve speech motor coordination, including respiration, phonation, articulation, resonance, and prosody, in patients with dysarthria. This suggests the possibility of utilizing the music speech protocol to maximize immediate treatment effects in the course of a long-term treatment for patients with dysarthria.
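The pre/post comparison described above is a standard paired-sample t-test. A minimal example with hypothetical maximum phonation times for six patients (the real values are not given here):

```python
from scipy import stats

# Hypothetical pre/post maximum phonation times (s) for six patients;
# the study compared such measures with a paired-sample t-test.
mpt_pre = [8.2, 6.5, 9.1, 7.4, 5.9, 8.8]
mpt_post = [10.4, 8.1, 11.0, 9.2, 7.3, 10.1]
t, p = stats.ttest_rel(mpt_post, mpt_pre)
print(f"paired t = {t:.2f}, p = {p:.4f}")
```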
Kraaijenga, S A C; Oskam, I M; van Son, R J J H; Hamming-Vrieze, O; Hilgers, F J M; van den Brekel, M W M; van der Molen, L
2016-04-01
Assessment of long-term objective and subjective voice, speech, articulation, and quality of life in patients with head and neck cancer (HNC) treated with concurrent chemoradiotherapy (CRT) for advanced, stage IV disease. Twenty-two disease-free survivors, treated with cisplatin-based CRT for inoperable HNC (1999-2004), were evaluated 10 years post-treatment. A standard Dutch text was recorded. Perceptual analysis of voice, speech, and articulation was conducted by two expert listeners (SLPs). An experimental expert system based on automatic speech recognition was also used. Patients' perception of voice and speech and related quality of life was assessed with the Voice Handicap Index (VHI) and Speech Handicap Index (SHI) questionnaires. At a median follow-up of 11 years, perceptual evaluation showed abnormal scores in up to 64% of cases, depending on the outcome parameter analyzed. Automatic assessment of voice and speech parameters correlated moderately to strongly with perceptual outcome scores. Patient-reported problems with voice (VHI>15) and speech (SHI>6) in daily life were present in 68% and 77% of patients, respectively. Patients treated with IMRT showed significantly less impairment compared to those treated with conventional radiotherapy. More than 10 years after organ-preservation treatment, voice and speech problems are common in this patient cohort, as assessed with perceptual evaluation, automatic speech recognition, and validated structured questionnaires. There were fewer complaints in patients treated with IMRT than with conventional radiotherapy. Copyright © 2016 Elsevier Ltd. All rights reserved.
Two different phenomena in basic motor speech performance in premanifest Huntington disease.
Skodda, Sabine; Grönheit, Wenke; Lukas, Carsten; Bellenberg, Barbara; von Hein, Sarah M; Hoffmann, Rainer; Saft, Carsten
2016-03-09
Dysarthria is a common feature of Huntington disease (HD). The aim of this cross-sectional pilot study was the description and objective analysis of different speech parameters, with special emphasis on the speech timing of connected speech and nonspeech verbal utterances, in premanifest HD (preHD). A total of 28 preHD mutation carriers and 28 age- and sex-matched healthy speakers performed a reading task and several syllable repetition tasks. Results of computerized acoustic analysis of different variables for the measurement of speech rate and regularity were correlated with clinical measures and MRI-based brain atrophy assessment by voxel-based morphometry. An impaired capacity to steadily repeat single syllables, with higher variability in preHD than in healthy controls, was found (variance 1: Cohen d = 1.46). Notably, speech rate was increased compared to controls and showed correlations with the volume of certain brain areas known to be involved in sensory-motor speech networks (net speech rate: Cohen d = 1.19). Furthermore, speech rate correlated with disease burden score, probability of disease onset, estimated years to onset, and clinical measures such as the cognitive score. Measurement of speech rate and regularity might be a helpful additional tool for the monitoring of subclinical functional disability in preHD. As a possible cause of the higher performance in preHD, we discuss huntingtin-dependent, temporarily advantageous developmental processes of the brain. © 2016 American Academy of Neurology.
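A regularity measure of the kind used for the syllable repetition task can be sketched as the variability of inter-syllable intervals; the study's exact variance measure may differ, so this is only an illustration of the idea:

```python
import numpy as np

def pace_stability(onsets_s):
    """Variability of inter-syllable intervals in a steady syllable
    repetition task. Returns the coefficient of variation: higher
    values indicate less steady repetition."""
    ivi = np.diff(onsets_s)    # inter-vocalization intervals (s)
    return float(np.std(ivi) / np.mean(ivi))
```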
Accurate visible speech synthesis based on concatenating variable length motion capture data.
Ma, Jiyong; Cole, Ron; Pellom, Bryan; Ward, Wayne; Wise, Barbara
2006-01-01
We present a novel approach to synthesizing accurate visible speech based on searching and concatenating optimal variable-length units in a large corpus of motion capture data. Based on a set of visual prototypes selected on a source face and a corresponding set designated for a target face, we propose a machine learning technique to automatically map the facial motions observed on the source face to the target face. In order to model the long-distance coarticulation effects in visible speech, a large-scale corpus that covers the most common syllables in English was collected, annotated and analyzed. For any input text, a search algorithm locates the optimal sequence of concatenated units for synthesis. A new algorithm to adapt lip motions from a generic 3D face model to a specific 3D face model is also proposed. A complete, end-to-end visible speech animation system is implemented based on the approach. This system is currently used in more than 60 kindergarten through third grade classrooms to teach students to read using a lifelike conversational animated agent. To evaluate the quality of the visible speech produced by the animation system, both subjective and objective evaluations were conducted. The evaluation results show that the proposed approach is accurate and powerful for visible speech synthesis.
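Searching for an optimal sequence of concatenated units is classically done with dynamic programming over target and join costs. The sketch below shows that core idea in generic form; the cost functions and unit representations are placeholders, not the paper's actual visual-feature costs.

```python
import numpy as np

def select_units(candidates, target_cost, join_cost):
    """Minimal dynamic-programming unit selection (illustrative): pick one
    candidate unit per target position, minimizing summed target and join
    costs, the core idea behind concatenative synthesis.

    candidates: list over positions; each entry is a list of unit features.
    Returns the index of the chosen unit at each position."""
    n = len(candidates)
    best = [np.array([target_cost(u) for u in candidates[0]])]
    back = []
    for i in range(1, n):
        # costs[k, j]: best path ending at previous unit j, joined to unit k.
        costs = np.array([[best[-1][j] + join_cost(candidates[i - 1][j], u)
                           for j in range(len(candidates[i - 1]))]
                          for u in candidates[i]])
        back.append(costs.argmin(axis=1))
        best.append(costs.min(axis=1) +
                    np.array([target_cost(u) for u in candidates[i]]))
    path = [int(best[-1].argmin())]
    for b in reversed(back):            # trace backpointers to the start
        path.append(int(b[path[-1]]))
    return path[::-1]
```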
Speech recognition: Acoustic-phonetic knowledge acquisition and representation
NASA Astrophysics Data System (ADS)
Zue, Victor W.
1988-09-01
The long-term research goal is to develop and implement speaker-independent continuous speech recognition systems. It is believed that the proper utilization of speech-specific knowledge is essential for such advanced systems. This research is thus directed toward the acquisition, quantification, and representation of acoustic-phonetic and lexical knowledge, and the application of this knowledge to speech recognition algorithms. In addition, we are exploring new speech recognition alternatives based on artificial intelligence and connectionist techniques. We developed a statistical model for predicting the acoustic realization of stop consonants in various positions in the syllable template. A unification-based grammatical formalism was developed for incorporating this model into the lexical access algorithm. We provided an information-theoretic justification for the hierarchical structure of the syllable template. We analyzed segmental durations for vowels and fricatives in continuous speech. Based on contextual information, we developed durational models for vowels and fricatives that account for over 70 percent of the variance, using data from multiple, unknown speakers. We rigorously evaluated the ability of human spectrogram readers to identify stop consonants spoken by many talkers and in a variety of phonetic contexts. Incorporating the declarative knowledge used by the readers, we developed a knowledge-based system for stop identification, achieving system performance comparable to that of the readers.
Noise and pitch interact during the cortical segregation of concurrent speech.
Bidelman, Gavin M; Yellamsetty, Anusha
2017-08-01
Behavioral studies reveal that listeners exploit intrinsic differences in voice fundamental frequency (F0) to segregate concurrent speech sounds, the so-called "F0-benefit." A more favorable signal-to-noise ratio (SNR) in the environment, an extrinsic acoustic factor, similarly benefits the parsing of simultaneous speech. Here, we examined the neurobiological substrates of these two cues in the perceptual segregation of concurrent speech mixtures. We recorded event-related brain potentials (ERPs) while listeners performed a speeded double-vowel identification task. Listeners heard two concurrent vowels whose F0 differed by zero or four semitones, presented in either clean (no noise) or noise-degraded (+5 dB SNR) conditions. Behaviorally, listeners were more accurate in correctly identifying both vowels for larger F0 separations, but the F0-benefit was more pronounced at more favorable SNRs (i.e., F0 × SNR interaction). Analysis of the ERPs revealed that only the P2 wave (∼200 ms) showed a similar F0 × SNR interaction as behavior and was correlated with listeners' perceptual F0-benefit. Neural classifiers applied to the ERPs further suggested that speech sounds are segregated neurally within 200 ms based on SNR, whereas segregation based on pitch occurs later in time (400-700 ms). The earlier timing of extrinsic SNR-based compared to intrinsic F0-based segregation implies that the cortical extraction of speech from noise is more efficient than differentiating speech based on pitch cues alone, which may recruit additional cortical processes. Findings indicate that noise and pitch differences interact relatively early in cerebral cortex and that the brain arrives at the identities of concurrent speech mixtures as early as ∼200 ms. Copyright © 2017 Elsevier B.V. All rights reserved.
Auditory-Perceptual Learning Improves Speech Motor Adaptation in Children
Shiller, Douglas M.; Rochon, Marie-Lyne
2015-01-01
Auditory feedback plays an important role in children’s speech development by providing the child with information about speech outcomes that is used to learn and fine-tune speech motor plans. The use of auditory feedback in speech motor learning has been extensively studied in adults by examining oral motor responses to manipulations of auditory feedback during speech production. Children are also capable of adapting speech motor patterns to perceived changes in auditory feedback, however it is not known whether their capacity for motor learning is limited by immature auditory-perceptual abilities. Here, the link between speech perceptual ability and the capacity for motor learning was explored in two groups of 5–7-year-old children who underwent a period of auditory perceptual training followed by tests of speech motor adaptation to altered auditory feedback. One group received perceptual training on a speech acoustic property relevant to the motor task while a control group received perceptual training on an irrelevant speech contrast. Learned perceptual improvements led to an enhancement in speech motor adaptation (proportional to the perceptual change) only for the experimental group. The results indicate that children’s ability to perceive relevant speech acoustic properties has a direct influence on their capacity for sensory-based speech motor adaptation. PMID:24842067
NASA Astrophysics Data System (ADS)
Pishravian, Arash; Aghabozorgi Sahaf, Masoud Reza
2012-12-01
In this paper, speech-music separation using blind source separation is discussed. The separation algorithm is based on mutual information minimization, where the natural gradient algorithm is used for the minimization. This requires estimating the score function from samples of the observed signals (mixtures of speech and music). The accuracy and speed of this estimation affect the quality of the separated signals and the processing time of the algorithm. The score function estimation in the presented algorithm is based on Gaussian-mixture-based kernel density estimation. Experimental results on speech-music separation, compared with a separation algorithm based on the minimum mean square error estimator, indicate that the presented algorithm achieves better performance and shorter processing time.
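The natural-gradient update for mutual information minimization has a well-known generic form, ΔW = μ (I − E[g(y) y^T]) W, where g is the score function. The sketch below uses a fixed tanh surrogate for g, whereas the presented algorithm estimates the score function from the data via Gaussian-mixture kernel density estimation; the learning rate and iteration count are illustrative.

```python
import numpy as np

def natural_gradient_bss(X, mu=0.01, n_iter=200, seed=0):
    """Natural-gradient ICA sketch for two-source speech/music separation.
    Uses a fixed tanh score function as a stand-in; the paper instead
    estimates the score via Gaussian-mixture kernel density estimation.

    X: (n_sources, n_samples) zero-mean mixtures. Returns (W, Y)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    W = np.eye(n) + rng.normal(scale=0.1, size=(n, n))
    for _ in range(n_iter):
        Y = W @ X
        g = np.tanh(Y)                                    # surrogate score
        W += mu * (np.eye(n) - (g @ Y.T) / X.shape[1]) @ W  # natural gradient
    return W, W @ X
```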
Toward a Natural Speech Understanding System
1989-10-01
…error rates for distinctive words produced in isolation by a single speaker, and their simple programming requirements. Template-matching systems rank…
ERIC Educational Resources Information Center
Uno, Mariko
2017-01-01
The present dissertation extracted 17,291 questions from the spontaneously produced speech data of Aki, Ryo, and Tai and their mothers, available in the CHILDES database (MacWhinney, 2000; Oshima-Takane & MacWhinney, 1998). The children's ages ranged from 1;3 to 3;0. Their questions were coded for (1) yes/no questions that include a sentence-final…
ERIC Educational Resources Information Center
Torres, Mario S.; Qin, Lixia
2017-01-01
This study explored attitudes and perceptions of Chinese high school students regarding freedom of expression in their country. A survey capturing perceptions over various forms of free speech (e.g., student publication, dress code) was administered to a sample of 838, which included students from both urban and rural areas within Shaanxi Province…
Dimension-Based Statistical Learning Affects Both Speech Perception and Production
ERIC Educational Resources Information Center
Lehet, Matthew; Holt, Lori L.
2017-01-01
Multiple acoustic dimensions signal speech categories. However, dimensions vary in their informativeness; some are more diagnostic of category membership than others. Speech categorization reflects these dimensional regularities such that diagnostic dimensions carry more "perceptual weight" and more effectively signal category membership…
Performing speech recognition research with hypercard
NASA Technical Reports Server (NTRS)
Shepherd, Chip
1993-01-01
The purpose of this paper is to describe a HyperCard-based system for performing speech recognition research and to instruct Human Factors professionals on how to use the system to obtain detailed data about the user interface of a prototype speech recognition application.
Asynchronous sampling of speech with some vocoder experimental results
NASA Technical Reports Server (NTRS)
Babcock, M. L.
1972-01-01
The method of asynchronously sampling speech is based upon the derivatives of the acoustical speech signal. The following results are apparent from experiments to date: (1) It is possible to represent speech by a string of pulses of uniform amplitude, where the only information contained in the string is the spacing of the pulses in time; (2) the string of pulses may be produced in a simple analog manner; (3) the first derivative of the original speech waveform is the most important for the encoding process; (4) the resulting pulse train can be utilized to control an acoustical signal production system to regenerate the intelligence of the original speech.
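Point (1) can be illustrated with a toy encoder that emits a uniform-amplitude pulse wherever the first derivative changes sign, i.e., at waveform extrema; this is a plausible reading of the derivative-based scheme, not the exact analog circuit described in the report.

```python
import numpy as np

def extrema_pulse_train(x):
    """Sketch of derivative-based asynchronous sampling: emit a uniform-
    amplitude pulse wherever the first derivative of the waveform changes
    sign (at waveform extrema). All information lies in pulse timing."""
    dx = np.diff(x)
    sign_change = np.signbit(dx[:-1]) != np.signbit(dx[1:])
    pulses = np.zeros_like(x)
    pulses[1:-1][sign_change] = 1.0
    return pulses
```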
Faulkner, Andrew; Rosen, Stuart; Green, Tim
2012-10-01
Two experimental groups were trained for 2 h with live or recorded speech that was noise-vocoded and spectrally shifted and was from the same text and talker. These two groups showed equivalent improvements in performance for vocoded and shifted sentences, and the group trained with recorded speech showed consistently greater improvements than untrained controls. Another group trained with unshifted noise-vocoded speech improved no more than untrained controls. Computer-based training thus appears at least as effective as labor-intensive live-voice training for improving the perception of spectrally shifted noise-vocoded speech, and by implication, for training of users of cochlear implants.
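Noise vocoding itself is a standard manipulation: speech band envelopes are used to modulate band-limited noise. A minimal sketch with illustrative band counts and cutoffs (the study's processing additionally shifted the synthesis bands spectrally):

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(x, fs, n_bands=8, lo=100.0, hi=5000.0):
    """Minimal noise vocoder (illustrative parameters; assumes fs well
    above 2*hi): split speech into log-spaced bands, extract each band's
    envelope, and use it to modulate matching band-limited noise."""
    edges = np.geomspace(lo, hi, n_bands + 1)
    noise = np.random.default_rng(0).normal(size=x.shape)
    y = np.zeros_like(x)
    for f1, f2 in zip(edges[:-1], edges[1:]):
        sos = butter(4, [f1, f2], btype="band", fs=fs, output="sos")
        env = np.abs(hilbert(sosfiltfilt(sos, x)))    # band envelope
        y += sosfiltfilt(sos, noise) * env            # envelope-modulated noise
    return y / np.max(np.abs(y))
```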
SAM: speech-aware applications in medicine to support structured data entry.
Wormek, A. K.; Ingenerf, J.; Orthner, H. F.
1997-01-01
In the last two years, improvement in speech recognition technology has directed the medical community's interest to porting and using such innovations in clinical systems. The acceptance of speech recognition systems in clinical domains increases with recognition speed, large medical vocabulary, high accuracy, continuous speech recognition, and speaker independence. Although some commercial speech engines approach these requirements, the greatest benefit can be achieved in adapting a speech recognizer to a specific medical application. The goals of our work are first, to develop a speech-aware core component which is able to establish connections to speech recognition engines of different vendors. This is realized in SAM. Second, with applications based on SAM we want to support the physician in his/her routine clinical care activities. Within the STAMP project (STAndardized Multimedia report generator in Pathology), we extend SAM by combining a structured data entry approach with speech recognition technology. Another speech-aware application in the field of Diabetes care is connected to a terminology server. The server delivers a controlled vocabulary which can be used for speech recognition. PMID:9357730
Audibility-based predictions of speech recognition for children and adults with normal hearing.
McCreery, Ryan W; Stelmachowicz, Patricia G
2011-12-01
This study investigated the relationship between audibility and predictions of speech recognition for children and adults with normal hearing. The Speech Intelligibility Index (SII) is used to quantify the audibility of speech signals and can be applied to transfer functions to predict speech recognition scores. Although the SII is used clinically with children, relatively few studies have evaluated SII predictions of children's speech recognition directly. Children have required more audibility than adults to reach maximum levels of speech understanding in previous studies. Furthermore, children may require greater bandwidth than adults for optimal speech understanding, which could influence frequency-importance functions used to calculate the SII. Speech recognition was measured for 116 children and 19 adults with normal hearing. Stimulus bandwidth and background noise level were varied systematically in order to evaluate speech recognition as predicted by the SII and derive frequency-importance functions for children and adults. Results suggested that children required greater audibility to reach the same level of speech understanding as adults. However, differences in performance between adults and children did not vary across frequency bands. © 2011 Acoustical Society of America
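The SII logic referenced here can be reduced to a weighted sum of per-band audibilities. The simplified sketch below clips band SNRs to ±15 dB, maps them to 0-1 audibility, and applies band-importance weights; it illustrates the ANSI S3.5 idea rather than reproducing the full standard.

```python
import numpy as np

def sii_sketch(snr_db, band_importance):
    """Simplified SII-style audibility index: per-band SNRs are clipped
    to [-15, +15] dB, mapped to 0-1 audibility, and combined using
    normalized band-importance weights."""
    snr = np.clip(np.asarray(snr_db, float), -15.0, 15.0)
    audibility = (snr + 15.0) / 30.0
    w = np.asarray(band_importance, float)
    return float(np.sum(w / w.sum() * audibility))
```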