Science.gov

Sample records for concatenative speech synthesis

  1. [Visual synthesis of speech].

    PubMed

    Blanco, Y; Villanueva, A; Cabeza, R

    2000-01-01

    The eyes can become the sole tool of communication for highly disabled patients. With the appropriate technology it is possible to interpret eye movements successfully, increasing the possibilities for patient communication through speech synthesisers. A system with these characteristics must include a speech synthesiser, an interface for the user to construct the text, and a method of gaze interpretation. In this way the user can manage the system solely with the eyes. This review sets out the state of the art of the three modules that make up a system of this type, and finally it introduces the visual speech synthesis system (Síntesis Visual del Habla [SiVHa]), which is being developed at the Public University of Navarra.

  2. Speech Compression and Synthesis

    DTIC Science & Technology

    1980-10-01

    from Block 19: speech recognition, phoneme recognition. initial design for a phonetic recognition program. We also recorded and partially labeled a...track for two halves of a long vowel phoneme reconstructed from two diphone templates. The dotted line indicates where the templates meet...more accurately by compensating for the spectral errors in the LPC spectrum at the pitch harmonics. d) Designed and implemented a phonetic

  3. CString Concatenation

    DTIC Science & Technology

    2015-09-01

    esi,esp 00E6E1BF call __RTC_CheckEsp (0E61843h) 49 instructions The += concatenation created 27 lines of machine code versus the 49... lines of machine code generated by the + operator. So just by the number of instructions created, it can be seen that the + operator will take

  4. Fifty years of progress in speech synthesis

    NASA Astrophysics Data System (ADS)

    Schroeter, Juergen

    2004-10-01

    A common opinion is that progress in speech synthesis should be easier to discern than in other areas of speech communication: you just have to listen to the speech! Unfortunately, things are more complicated. It can be said, however, that early speech synthesis efforts were primarily concerned with providing intelligible speech, while, more recently, "naturalness" has been the focus. The field had its "electronic" roots in Homer Dudley's 1939 "Voder," and it advanced in the 1950s and 1960s through progress in a number of labs including JSRU in England, Haskins Labs in the U.S., and Fant's Lab in Sweden. In the 1970s and 1980s significant progress came from efforts at Bell Labs (under Jim Flanagan's leadership) and at MIT (where Dennis Klatt created one of the first commercially viable systems). Finally, over the past 15 years, the methods of unit-selection synthesis were devised, primarily at ATR in Japan, and were advanced by work at AT&T Labs, Univ. of Edinburgh, and ATR. Today, TTS systems are able to "convince some of the listeners some of the time" that synthetic speech is as natural as live recordings. Ongoing efforts aim at replacing "some" with "most" for a wide range of real-world applications.

  5. Vocoders and Speech Perception: Uses of Computer-Based Speech Analysis-Synthesis in Stimulus Generation.

    ERIC Educational Resources Information Center

    Tierney, Joseph; Mack, Molly

    1987-01-01

    Stimuli used in research on the perception of the speech signal have often been obtained from simple filtering and distortion of the speech waveform, sometimes accompanied by noise. However, for more complex stimulus generation, the parameters of speech can be manipulated, after analysis and before synthesis, using various types of algorithms to…

  6. Preliminary Analysis of Automatic Speech Recognition and Synthesis Technology.

    DTIC Science & Technology

    1983-05-01

    The Contracting Officer's Technical Representative for this...descriptions of many commercially available devices. We do not recommend that the Coast Guard pursue the development of a speech recognition system at... Contents fragments: Synthesis Techniques (p. 130); 6.1 Speech Recognition Technology (p. 130); 6.2 Speech Synthesis Technology

  7. Infants' brain responses to speech suggest analysis by synthesis.

    PubMed

    Kuhl, Patricia K; Ramírez, Rey R; Bosseler, Alexis; Lin, Jo-Fu Lotus; Imada, Toshiaki

    2014-08-05

    Historic theories of speech perception (Motor Theory and Analysis by Synthesis) invoked listeners' knowledge of speech production to explain speech perception. Neuroimaging data show that adult listeners activate motor brain areas during speech perception. In two experiments using magnetoencephalography (MEG), we investigated motor brain activation, as well as auditory brain activation, during discrimination of native and nonnative syllables in infants at two ages that straddle the developmental transition from language-universal to language-specific speech perception. Adults are also tested in Exp. 1. MEG data revealed that 7-mo-old infants activate auditory (superior temporal) as well as motor brain areas (Broca's area, cerebellum) in response to speech, and equivalently for native and nonnative syllables. However, in 11- and 12-mo-old infants, native speech activates auditory brain areas to a greater degree than nonnative, whereas nonnative speech activates motor brain areas to a greater degree than native speech. This double dissociation in 11- to 12-mo-old infants matches the pattern of results obtained in adult listeners. Our infant data are consistent with Analysis by Synthesis: auditory analysis of speech is coupled with synthesis of the motor plans necessary to produce the speech signal. The findings have implications for: (i) perception-action theories of speech perception, (ii) the impact of "motherese" on early language learning, and (iii) the "social-gating" hypothesis and humans' development of social understanding.

  8. Infants’ brain responses to speech suggest Analysis by Synthesis

    PubMed Central

    Kuhl, Patricia K.; Ramírez, Rey R.; Bosseler, Alexis; Lin, Jo-Fu Lotus; Imada, Toshiaki

    2014-01-01

    Historic theories of speech perception (Motor Theory and Analysis by Synthesis) invoked listeners’ knowledge of speech production to explain speech perception. Neuroimaging data show that adult listeners activate motor brain areas during speech perception. In two experiments using magnetoencephalography (MEG), we investigated motor brain activation, as well as auditory brain activation, during discrimination of native and nonnative syllables in infants at two ages that straddle the developmental transition from language-universal to language-specific speech perception. Adults are also tested in Exp. 1. MEG data revealed that 7-mo-old infants activate auditory (superior temporal) as well as motor brain areas (Broca’s area, cerebellum) in response to speech, and equivalently for native and nonnative syllables. However, in 11- and 12-mo-old infants, native speech activates auditory brain areas to a greater degree than nonnative, whereas nonnative speech activates motor brain areas to a greater degree than native speech. This double dissociation in 11- to 12-mo-old infants matches the pattern of results obtained in adult listeners. Our infant data are consistent with Analysis by Synthesis: auditory analysis of speech is coupled with synthesis of the motor plans necessary to produce the speech signal. The findings have implications for: (i) perception-action theories of speech perception, (ii) the impact of “motherese” on early language learning, and (iii) the “social-gating” hypothesis and humans’ development of social understanding. PMID:25024207

  9. Auto Spell Suggestion for High Quality Speech Synthesis in Hindi

    NASA Astrophysics Data System (ADS)

    Kabra, Shikha; Agarwal, Ritika

    2014-02-01

    The goal of Text-to-Speech (TTS) synthesis in a particular language is to convert arbitrary input text to intelligible and natural-sounding speech. For a language like Hindi, however, in which many words have very close spellings and are easily confused, identifying errors in the input text is not an easy task, and incorrect text degrades the quality of the output speech. This paper therefore contributes to the development of high-quality speech synthesis by incorporating a spellchecker that automatically generates spelling suggestions for misspelled words. Involving a spellchecker increases the efficiency of speech synthesis by providing spelling suggestions for incorrect input text. Furthermore, we provide a comparative study evaluating the effect on the phonetic text of adding the spellchecker to the input text.
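    A minimal sketch of the pipeline idea: out-of-lexicon words are replaced by their closest in-lexicon suggestion before the text reaches the TTS front end. The lexicon, the romanized example words, and the similarity-based suggester below are illustrative assumptions, not the paper's spellchecker.

        import difflib

        # Hypothetical lexicon of correctly spelled (romanized) words; the
        # paper's Hindi dictionary and spellchecker are not available here.
        LEXICON = ["namaste", "dhanyavaad", "shubh", "swagat", "pustak"]

        def suggest(word, n=3):
            # Spelling suggestions for a misspelled word by string similarity.
            return difflib.get_close_matches(word, LEXICON, n=n, cutoff=0.6)

        def clean_text_for_tts(text):
            # Replace out-of-lexicon words with their best suggestion before
            # handing the text to the TTS front end.
            fixed = [w if w in LEXICON else (suggest(w) or [w])[0]
                     for w in text.split()]
            return " ".join(fixed)

        print(clean_text_for_tts("namastee swagat pustk"))  # namaste swagat pustak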

  10. Synthesis of IB-01211, a cyclic peptide containing 2,4-concatenated thia- and oxazoles, via Hantzsch macrocyclization.

    PubMed

    Hernández, Delia; Vilar, Gemma; Riego, Estela; Cañedo, Librada M; Cuevas, Carmen; Albericio, Fernando; Alvarez, Mercedes

    2007-03-01

    [structure: see text] An efficient and versatile convergent synthesis of IB-01211 based on a combination of peptide and heterocyclic chemistry is described. The key step in the synthesis is macrocyclization through intramolecular Hantzsch formation of the thiazole ring. Dehydration of a free primary alcohol to furnish the exocyclic methylidene present in the natural product was applied during the macrocyclization.

  11. Voice Quality Modelling for Expressive Speech Synthesis

    PubMed Central

    Socoró, Joan Claudi

    2014-01-01

    This paper presents the perceptual experiments that were carried out in order to validate the methodology of transforming expressive speech styles using voice quality (VoQ) parameters modelling, along with the well-known prosody (F0, duration, and energy), from a neutral style into a number of expressive ones. The main goal was to validate the usefulness of VoQ in the enhancement of expressive synthetic speech in terms of speech quality and style identification. A harmonic plus noise model (HNM) was used to modify VoQ and prosodic parameters that were extracted from an expressive speech corpus. Perception test results indicated the improvement of obtained expressive speech styles using VoQ modelling along with prosodic characteristics. PMID:24587738

  12. Speech Analysis/Synthesis Based on Perception.

    DTIC Science & Technology

    1984-11-05

    ...A speech analysis system based on a combination of physiological... Contents fragments: An Auditory Model Based on Physiological Results (p. 8); 2.3 A Simplified Auditory Model Incorporating... physiological studies of the auditory system are applied, it may be possible to design improved ASR machines. When applying auditory system results to the

  13. Alternative Speech Communication System for Persons with Severe Speech Disorders

    NASA Astrophysics Data System (ADS)

    Selouani, Sid-Ahmed; Sidi Yakoub, Mohammed; O'Shaughnessy, Douglas

    2009-12-01

    Assistive speech-enabled systems are proposed to help both French- and English-speaking persons with various speech disorders. The proposed assistive systems use automatic speech recognition (ASR) and speech synthesis in order to enhance the quality of communication. These systems aim at improving the intelligibility of pathologic speech, making it as natural as possible and close to the original voice of the speaker. The resynthesized utterances use new basic units, a new concatenation algorithm, and a grafting technique to correct poorly pronounced phonemes. The ASR responses are uttered by the new speech synthesis system in order to convey an intelligible message to listeners. Experiments involving four American speakers with severe dysarthria and two Acadian French speakers with sound substitution disorders (SSDs) are carried out to demonstrate the efficiency of the proposed methods. Improvements in the Perceptual Evaluation of Speech Quality (PESQ) value of 5% and of more than 20% are achieved by the speech synthesis systems that deal with SSDs and dysarthria, respectively.

  14. Towards personalized speech synthesis for augmentative and alternative communication.

    PubMed

    Mills, Timothy; Bunnell, H Timothy; Patel, Rupal

    2014-09-01

    Text-to-speech options on augmentative and alternative communication (AAC) devices are limited. Often, several individuals in a group setting use the same synthetic voice. This lack of customization may limit technology adoption and social integration. This paper describes our efforts to generate personalized synthesis for users with profoundly limited speech motor control. Existing voice banking and voice conversion techniques rely on recordings of clearly articulated speech from the target talker, which cannot be obtained from this population. Our VocaliD approach extracts prosodic properties from the target talker's source function and applies these features to a surrogate talker's database, generating a synthetic voice with the vocal identity of the target talker and the clarity of the surrogate talker. Promising intelligibility results suggest areas of further development for improved personalization.

  15. How Foreign are ’Foreign’ Speech Sounds? Implications for Speech Recognition and Speech Synthesis

    DTIC Science & Technology

    2000-08-01

    Language Speech Sounds. In James, A. & J. Leather (eds.), Sound Patterns in Second Language Acquisition, Foris Publications. ...language acquisition (SLA) research. This paper reports results from a production study of the phonological processes involved when approaching a... (Telia Research AB, Room B324, S-12386 Farsta, Sweden) ...field of phonological acquisition and, more specifically, the field of second

  16. Towards direct speech synthesis from ECoG: A pilot study.

    PubMed

    Herff, Christian; Johnson, Garett; Diener, Lorenz; Shih, Jerry; Krusienski, Dean; Schultz, Tanja

    2016-08-01

    Most current Brain-Computer Interfaces (BCIs) achieve high information transfer rates using spelling paradigms based on stimulus-evoked potentials. Despite the success of these interfaces, this mode of communication can be cumbersome and unnatural. Direct synthesis of speech from neural activity represents a more natural mode of communication that would enable users to convey verbal messages in real-time. In this pilot study with one participant, we demonstrate that electrocorticography (ECoG) intracranial activity from temporal areas can be used to resynthesize speech in real-time. This is accomplished by reconstructing the audio magnitude spectrogram from neural activity and subsequently creating the audio waveform from these reconstructed spectrograms. We show that significant correlations between the original and reconstructed spectrograms and temporal waveforms can be achieved. While this pilot study uses audibly spoken speech for the models, it represents a first step towards speech synthesis from speech imagery.
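    The waveform-creation step, recovering audio from a reconstructed magnitude spectrogram, can be done by iterative phase estimation. The sketch below is a generic Griffin-Lim-style loop, offered only as an assumption of how such resynthesis can work; the study's exact reconstruction method is not specified in this abstract.

        import numpy as np
        from scipy.signal import stft, istft

        def griffin_lim(mag, n_iter=50, nperseg=256):
            # Iteratively re-estimate phase so the waveform's STFT magnitude
            # approaches the target magnitude spectrogram `mag`.
            rng = np.random.default_rng(0)
            phase = np.exp(2j * np.pi * rng.random(mag.shape))
            for _ in range(n_iter):
                _, x = istft(mag * phase, nperseg=nperseg)
                _, _, Z = stft(x, nperseg=nperseg)
                if Z.shape[1] < mag.shape[1]:    # guard against frame drift
                    Z = np.pad(Z, ((0, 0), (0, mag.shape[1] - Z.shape[1])))
                Z = Z[:, :mag.shape[1]]
                phase = np.exp(1j * np.angle(Z))
            _, x = istft(mag * phase, nperseg=nperseg)
            return x

        # demo: take the magnitude of a real signal's STFT and invert it
        t = np.linspace(0, 1, 16000, endpoint=False)
        _, _, S = stft(np.sin(2 * np.pi * 440 * t), nperseg=256)
        audio = griffin_lim(np.abs(S))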

  17. Radio Losses for Concatenated Codes

    NASA Astrophysics Data System (ADS)

    Shambayati, S.

    2002-07-01

    The advent of higher powered spacecraft amplifiers and better ground receivers capable of tracking spacecraft carrier signals with narrower loop bandwidths requires better understanding of the carrier tracking loss (radio loss) mechanism of the concatenated codes used for deep-space missions. In this article, we present results of simulations performed for a (7,1/2) convolutional, Reed-Solomon (255,223), interleaver depth-5 concatenated code in order to shed some light on this issue. Through these simulations, we obtained the performance of this code over an additive white Gaussian noise (AWGN) channel (the baseline performance) in terms of both its frame-error rate (FER) and its bit-error rate at the output of the Reed-Solomon decoder (RS-BER). After obtaining these results, we curve fitted the baseline performance curves for FER and RS-BER and calculated the high-rate radio losses for this code for an FER of 10^(-4) and its corresponding baseline RS-BER of 2.1 x 10^(-6) for a carrier loop signal-to-noise ratio (SNR) of 14.8 dB. This calculation revealed that even though over the AWGN channel the FER value and the RS-BER value correspond to each other (i.e., these values are obtained by the same bit SNR value), the RS-BER value has higher high-rate losses than does the FER value. Furthermore, this calculation contradicted the previous assumption that at high data rates concatenated codes have the same radio losses as their constituent convolutional codes. Our results showed much higher losses for the FER and the RS-BER (by as much as 2 dB) than for the corresponding baseline BER of the convolutional code. Further simulations were performed to investigate the effects of changes in the data rate on the code's radio losses. It was observed that as the data rate increased, the radio losses for both the FER and the RS-BER approached their respective calculated high-rate values. Furthermore, these simulations showed that a simple two-parameter function could model the increase in the
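    In this setting, the radio loss at a target error rate is the extra bit SNR the carrier-tracking-degraded link needs, relative to the AWGN baseline, to reach that rate. A sketch of the interpolation involved, with invented curve points standing in for the article's simulated, curve-fitted FER data:

        import numpy as np

        # Hypothetical baseline (AWGN) FER curve: (Eb/No in dB, FER) pairs.
        ebno_awgn = np.array([1.0, 1.5, 2.0, 2.5, 3.0])
        fer_awgn  = np.array([3e-1, 5e-2, 4e-3, 2e-4, 5e-6])

        # Hypothetical FER curve with carrier-tracking degradation.
        ebno_loop = np.array([1.0, 1.5, 2.0, 2.5, 3.0, 3.5])
        fer_loop  = np.array([5e-1, 2e-1, 4e-2, 5e-3, 4e-4, 2e-5])

        def ebno_at(fer_target, ebno, fer):
            # Eb/No (dB) needed to reach fer_target, interpolating on
            # log10(FER); np.interp needs increasing x, and FER decreases
            # with Eb/No, so both arrays are reversed.
            return np.interp(np.log10(fer_target), np.log10(fer[::-1]), ebno[::-1])

        target = 1e-4
        loss_db = ebno_at(target, ebno_loop, fer_loop) - ebno_at(target, ebno_awgn, fer_awgn)
        print(f"radio loss at FER={target:g}: {loss_db:.2f} dB")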

  18. Design and performance of an analysis-by-synthesis class of predictive speech coders

    NASA Technical Reports Server (NTRS)

    Rose, Richard C.; Barnwell, Thomas P., III

    1990-01-01

    The performance of a broad class of analysis-by-synthesis linear predictive speech coders is quantified experimentally. The class of coders includes a number of well-known techniques as well as a very large number of speech coders which have not been named or studied. A general formulation for deriving the parametric representation used in all of the coders in the class is presented. A new coder, named the self-excited vocoder, is discussed because of its good performance with low complexity, and because of the insight this coder gives to analysis-by-synthesis coders in general. The results of a study comparing the performances of different members of this class are presented. The study takes the form of a series of formal subjective and objective speech quality tests performed on selected coders. The results of this study lead to some interesting and important observations concerning the controlling parameters for analysis-by-synthesis speech coders.

  19. An exploratory study on the driving method of speech synthesis based on the human eye reading imaging data

    NASA Astrophysics Data System (ADS)

    Gao, Pei-pei; Liu, Feng

    2016-10-01

    With the development of information technology and artificial intelligence, speech synthesis plays a significant role in the field of human-computer interaction. However, the main problem of current speech synthesis techniques is a lack of naturalness and expressiveness, so that synthetic speech is not yet close to the standard of natural language. Another problem is that human-computer interaction based on speech synthesis is too monotonous to support a mechanism driven by the user's own behaviour. This paper reviews the historical development of speech synthesis and summarizes the general process of the technique, pointing out that the prosody generation module is an important part of it. On the basis of further research, using eye-activity patterns during reading to control and drive prosody generation is introduced as a new human-computer interaction method to enrich the synthetic form. Based on the extraction of eye-gaze data, with eye-movement signals used as a real-time driving source, a speech synthesis method is proposed that can express the real speech rhythm of the speaker: while the reader silently reads the corpus, the system captures reading information such as the eye-gaze duration per prosodic unit and establishes a hierarchical prosodic duration model to determine the duration parameters of the synthesized speech. Finally, the feasibility of the above method is verified by analysis.
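    As a rough sketch of the driving idea, the observed gaze duration for a prosodic unit can scale that unit's default phone durations; the baseline duration and the flat scaling rule below are invented placeholders for the paper's hierarchical duration model.

        def scale_unit_durations(phone_durs_ms, gaze_dur_ms, baseline_dur_ms):
            # Stretch or compress one prosodic unit's default phone durations
            # by the ratio of observed gaze duration to a baseline duration.
            factor = gaze_dur_ms / baseline_dur_ms
            return [d * factor for d in phone_durs_ms]

        # a unit read slowly (600 ms gaze vs. 400 ms baseline) is lengthened
        print(scale_unit_durations([80, 120, 100], 600, 400))  # [120.0, 180.0, 150.0]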

  20. Coding Method of LSP Residual Signals Using Wavelets for Speech Synthesis

    NASA Astrophysics Data System (ADS)

    Shimizu, Tadaaki; Kimoto, Masaya; Yoshimura, Hiroki; Isu, Naoki; Sugata, Kazuhiro

    This paper presents a method that uses wavelet analysis for speech coding and synthesis by rule. It is a coding system in which the LSP residual signal is transformed into wavelet coefficients. As wavelet analysis is implemented efficiently by filter banks, our method requires less computation than multipulse coding and other schemes in which complicated prediction procedures are essential. To achieve good speech quality at low bit rates, we allocate different numbers of bits to the wavelet coefficients, with more bits at lower frequencies and fewer at higher frequencies. Speech synthesized with the Haar wavelet at 16.538 kbit/s has nearly the same perceptual quality as 6-bit μ-log PCM (66.15 kbit/s). We are convinced that coding LSP residual signals using wavelet analysis is an effective approach to speech synthesis.
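    A sketch of the two ingredients, a filter-bank Haar analysis and frequency-dependent bit allocation. The frame length, the per-band bit counts, and the uniform quantizer are illustrative assumptions, not the paper's design.

        import numpy as np

        def haar_analysis(x, levels):
            # Multi-level Haar DWT of a length-2^n residual frame; each pass
            # splits the approximation into a detail band and a new approximation.
            bands, approx = [], np.asarray(x, dtype=float)
            for _ in range(levels):
                even, odd = approx[0::2], approx[1::2]
                bands.append((even - odd) / np.sqrt(2))   # detail (high freq)
                approx = (even + odd) / np.sqrt(2)
            bands.append(approx)                          # coarsest approximation
            return bands

        def quantize(band, bits):
            # Uniform quantizer with 2^bits levels spanning the band's range.
            if bits == 0:
                return np.zeros_like(band)
            peak = np.max(np.abs(band)) or 1.0
            step = 2 * peak / (2 ** bits)
            return np.clip(np.round(band / step) * step, -peak, peak)

        frame = np.random.randn(256)              # stand-in LSP residual frame
        bands = haar_analysis(frame, levels=4)    # [d1 (highest freq) .. d4, a4]
        alloc = [2, 3, 4, 5, 8]                   # more bits for low frequencies
        coded = [quantize(b, n) for b, n in zip(bands, alloc)]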

  1. Synthesis of Speaker Facial Movement to Match Selected Speech Sequences

    NASA Technical Reports Server (NTRS)

    Scott, K. C.; Kagels, D. S.; Watson, S. H.; Rom, H.; Wright, J. R.; Lee, M.; Hussey, K. J.

    1994-01-01

    A system is described which allows for the synthesis of a video sequence of a realistic-appearing talking human head. A phonic based approach is used to describe facial motion; image processing rather than physical modeling techniques are used to create video frames.

  2. Overhead analysis of universal concatenated quantum codes

    NASA Astrophysics Data System (ADS)

    Chamberland, Christopher; Jochym-O'Connor, Tomas; Laflamme, Raymond

    2017-02-01

    We analyze the resource overhead of recently proposed methods for universal fault-tolerant quantum computation using concatenated codes. Namely, we examine the concatenation of the 7-qubit Steane code with the 15-qubit Reed-Muller code, which allows for the construction of the 49- and 105-qubit codes that do not require magic state distillation for universality. We compute a lower bound for the adversarial noise threshold of the 105-qubit code and find it to be 8.33 × 10^(-6). We obtain a depolarizing noise threshold for the 49-qubit code of 9.69 × 10^(-4), which is competitive with the 105-qubit threshold result of 1.28 × 10^(-3). We then provide lower bounds on the resource requirements of the 49- and 105-qubit codes and compare them with the surface code implementation of a logical T gate using magic state distillation. For the sampled input error rates and noise model, we find that the surface code achieves a smaller overhead compared to our concatenated schemes.

  3. Concatenated Coding Using Trellis-Coded Modulation

    NASA Technical Reports Server (NTRS)

    Thompson, Michael W.

    1997-01-01

    In the late seventies and early eighties a technique known as Trellis Coded Modulation (TCM) was developed for providing spectrally efficient error correction coding. Instead of adding redundant information in the form of parity bits, redundancy is added at the modulation stage, thereby increasing bandwidth efficiency. A digital communications system can be designed to use bandwidth-efficient multilevel/phase modulation such as Amplitude Shift Keying (ASK), Phase Shift Keying (PSK), Differential Phase Shift Keying (DPSK) or Quadrature Amplitude Modulation (QAM). Performance gain can be achieved by increasing the number of signals over the corresponding uncoded system to compensate for the redundancy introduced by the code. A considerable amount of research and development has been devoted toward developing good TCM codes for severely bandlimited applications. More recently, the use of TCM for satellite and deep space communications applications has received increased attention. This report describes the general approach of using a concatenated coding scheme that features TCM and RS coding. Results have indicated that substantial (6-10 dB) performance gains can be achieved with this approach with comparatively little bandwidth expansion. Since all of the bandwidth expansion is due to the RS code, we see that TCM-based concatenated coding results in roughly 10-50% bandwidth expansion, compared to 70-150% expansion for a similar concatenated scheme that uses a convolutional code. We stress that combined coding and modulation optimization is important for achieving performance gains while maintaining spectral efficiency.

  4. Iterative Decoding of Concatenated Codes: A Tutorial

    NASA Astrophysics Data System (ADS)

    Regalia, Phillip A.

    2005-12-01

    The turbo decoding algorithm of a decade ago constituted a milestone in error-correction coding for digital communications, and has inspired extensions to generalized receiver topologies, including turbo equalization, turbo synchronization, and turbo CDMA, among others. Despite an accrued understanding of iterative decoding over the years, the "turbo principle" remains elusive to master analytically, thereby inciting interest from researchers outside the communications domain. In this spirit, we develop a tutorial presentation of iterative decoding for parallel and serial concatenated codes, in terms hopefully accessible to a broader audience. We motivate iterative decoding as a computationally tractable attempt to approach maximum-likelihood decoding, and characterize fixed points in terms of a "consensus" property between constituent decoders. We review how the decoding algorithm for both parallel and serial concatenated codes coincides with an alternating projection algorithm, which allows one to identify conditions under which the algorithm indeed converges to a maximum-likelihood solution, in terms of particular likelihood functions factoring into the product of their marginals. The presentation emphasizes a common framework applicable to both parallel and serial concatenated codes.
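    The extrinsic-information exchange at the heart of iterative decoding can be seen in miniature on a toy product code whose rows and columns are single-parity-check (SPC) codes. The numpy sketch below is an illustrative analogue of the consensus-seeking loop the tutorial analyzes, not the BCJR-based decoders themselves; all sizes and noise values are invented.

        import numpy as np

        def spc_extrinsic(llr):
            # Extrinsic LLRs for a single-parity-check code: for each bit,
            # 2*atanh of the product of tanh(L/2) over the *other* bits.
            t = np.tanh(np.clip(llr, -20, 20) / 2.0)
            t = np.where(np.abs(t) < 1e-12, 1e-12, t)   # avoid divide-by-zero
            loo = np.prod(t) / t                        # leave-one-out products
            return 2.0 * np.arctanh(np.clip(loo, -0.999999, 0.999999))

        def turbo_decode(Lch, n_iter=5):
            # Two constituent decoders (rows, columns) exchange extrinsic LLRs.
            Le_row, Le_col = np.zeros_like(Lch), np.zeros_like(Lch)
            for _ in range(n_iter):
                for r in range(Lch.shape[0]):
                    Le_row[r, :] = spc_extrinsic(Lch[r, :] + Le_col[r, :])
                for c in range(Lch.shape[1]):
                    Le_col[:, c] = spc_extrinsic(Lch[:, c] + Le_row[:, c])
            return (Lch + Le_row + Le_col) < 0          # hard bit decisions

        # 2x2 info block extended to a 3x3 product code with row/column parity.
        info = np.array([[0, 1], [1, 0]])
        code = np.zeros((3, 3), dtype=int)
        code[:2, :2] = info
        code[:2, 2] = info.sum(axis=1) % 2
        code[2, :2] = info.sum(axis=0) % 2
        code[2, 2] = info.sum() % 2

        rng = np.random.default_rng(1)
        sigma = 0.6
        y = (1 - 2 * code) + sigma * rng.normal(size=(3, 3))  # BPSK over AWGN
        Lch = 2 * y / sigma**2                                # channel LLRs
        print(turbo_decode(Lch).astype(int))                  # should match `code`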

  5. HMM-Based Style Control for Expressive Speech Synthesis with Arbitrary Speaker's Voice Using Model Adaptation

    NASA Astrophysics Data System (ADS)

    Nose, Takashi; Tachibana, Makoto; Kobayashi, Takao

    This paper presents methods for controlling the intensity of emotional expressions and speaking styles of an arbitrary speaker's synthetic speech by using a small amount of his/her speech data in HMM-based speech synthesis. Model adaptation approaches are introduced into the style control technique based on the multiple-regression hidden semi-Markov model (MRHSMM). Two different approaches are proposed for training a target speaker's MRHSMMs. The first one is MRHSMM-based model adaptation in which the pretrained MRHSMM is adapted to the target speaker's model. For this purpose, we formulate the MLLR adaptation algorithm for the MRHSMM. The second method utilizes simultaneous adaptation of speaker and style from an average voice model to obtain the target speaker's style-dependent HSMMs which are used for the initialization of the MRHSMM. From the result of subjective evaluation using adaptation data of 50 sentences of each style, we show that the proposed methods outperform the conventional speaker-dependent model training when using the same size of speech data of the target speaker.

  6. A concatenated coding scheme for error control

    NASA Technical Reports Server (NTRS)

    Lin, S.

    1985-01-01

    A concatenated coding scheme for error control in data communications was analyzed. The inner code is used for both error correction and detection; however, the outer code is used only for error detection. A retransmission is requested if either the inner code decoder fails to make a successful decoding or the outer code decoder detects the presence of errors after the inner code decoding. The probability of undetected error of the proposed scheme is derived. An efficient method for computing this probability is presented. The throughput efficiency of the proposed error control scheme incorporated with a selective repeat ARQ retransmission strategy is analyzed.
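    The retransmission logic of such a scheme reduces to a small loop. In this sketch, receive, inner_decode, and outer_check are hypothetical stand-ins for the channel interface, the inner error-correcting/detecting decoder, and the outer error-detecting check.

        def receive_frame(receive, inner_decode, outer_check, max_retx=8):
            # receive(): returns the next (re)transmitted channel word,
            # e.g. under a selective-repeat ARQ strategy.
            for _ in range(max_retx):
                ok, data = inner_decode(receive())   # correction + detection
                if ok and outer_check(data):         # outer code: detection only
                    return data                      # frame accepted
            return None                              # frame declared lost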

  7. Soft context clustering for F0 modeling in HMM-based speech synthesis

    NASA Astrophysics Data System (ADS)

    Khorram, Soheil; Sameti, Hossein; King, Simon

    2015-12-01

    This paper proposes the use of a new binary decision tree, which we call a soft decision tree, to improve generalization performance compared to the conventional 'hard' decision tree method that is used to cluster context-dependent model parameters in statistical parametric speech synthesis. We apply the method to improve the modeling of fundamental frequency, which is an important factor in synthesizing natural-sounding high-quality speech. Conventionally, hard decision tree-clustered hidden Markov models (HMMs) are used, in which each model parameter is assigned to a single leaf node. However, this 'divide-and-conquer' approach leads to data sparsity, with the consequence that it suffers from poor generalization, meaning that it is unable to accurately predict parameters for models of unseen contexts: the hard decision tree is a weak function approximator. To alleviate this, we propose the soft decision tree, which is a binary decision tree with soft decisions at the internal nodes. In this soft clustering method, internal nodes select both their children with certain membership degrees; therefore, each node can be viewed as a fuzzy set with a context-dependent membership function. The soft decision tree improves model generalization and provides a superior function approximator because it is able to assign each context to several overlapped leaves. In order to use such a soft decision tree to predict the parameters of the HMM output probability distribution, we derive the smoothest (maximum entropy) distribution which captures all partial first-order moments and a global second-order moment of the training samples. Employing such a soft decision tree architecture with maximum entropy distributions, a novel speech synthesis system is trained using maximum likelihood (ML) parameter re-estimation and synthesis is achieved via maximum output probability parameter generation. In addition, a soft decision tree construction algorithm optimizing a log-likelihood measure
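    A minimal sketch of the prediction side of such a soft decision tree: each internal node gates its children with a sigmoid membership, and a prediction is the membership-weighted sum over all leaves. The node parameters and scalar leaf values are hypothetical; the paper's construction algorithm and maximum-entropy leaf distributions are not reproduced here.

        import numpy as np

        def sigmoid(z):
            return 1.0 / (1.0 + np.exp(-z))

        class SoftNode:
            # Internal node: soft (sigmoid) split on one context feature.
            # Leaves carry a scalar prediction (e.g., a mean F0 value).
            def __init__(self, feature=None, threshold=0.0, gain=1.0,
                         left=None, right=None, value=None):
                self.feature, self.threshold, self.gain = feature, threshold, gain
                self.left, self.right, self.value = left, right, value

        def predict(node, x, membership=1.0):
            # Sum of leaf values weighted by soft membership degrees, so every
            # context contributes to (and draws from) several overlapped leaves.
            if node.value is not None:               # leaf
                return membership * node.value
            p_right = sigmoid(node.gain * (x[node.feature] - node.threshold))
            return (predict(node.left,  x, membership * (1.0 - p_right)) +
                    predict(node.right, x, membership * p_right))

        # toy tree over a 2-dimensional context vector x
        tree = SoftNode(feature=0, threshold=0.5, gain=4.0,
                        left=SoftNode(value=100.0),   # e.g., mean F0 in Hz
                        right=SoftNode(value=180.0))
        print(predict(tree, [0.7, 0.0]))  # a blend of both leaves, nearer 180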

  8. Stereotaxy, navigation and the temporal concatenation.

    PubMed

    Apuzzo, M L; Chen, J C

    1999-01-01

    Nautical and cerebral navigation share similar elements of functional need and similar developmental pathways. The need for orientation necessitates the development of appropriate concepts, and such concepts are dependent on technology for practical realization. Occasionally, a concept precedes technology in time and requires a period of delay for appropriate development. A temporal concatenation exists in which, over time, need, concept, and technology combine to provide an elegant final solution. Nautical navigation has proceeded through periods of dead reckoning and celestial navigation to satellite orientation, with associated refinements of instrumentation and charts for guidance. Cerebral navigation has progressed from craniometric orientation and burr-hole-mounted guidance systems, to simple rectilinear and arc-centered devices based on radiographs, to guidance by complex anatomical and functional maps provided as an amalgam of modern imaging modes. These maps are now augmented by complex frame and frameless systems which allow not only precise orientation but also point and volumetric action. These complex technical modalities were required by, and developed in part from, elements of maritime navigation that have been translated to cerebral navigation in a temporal concatenation.

  9. An Interactive Concatenated Turbo Coding System

    NASA Technical Reports Server (NTRS)

    Liu, Ye; Tang, Heng; Lin, Shu; Fossorier, Marc

    1999-01-01

    This paper presents a concatenated turbo coding system in which a Reed-Solomon outer code is concatenated with a binary turbo inner code. In the proposed system, the outer code decoder and the inner turbo code decoder interact to achieve both good bit error and frame error performances. The outer code decoder helps the inner turbo code decoder to terminate its decoding iteration while the inner turbo code decoder provides soft-output information to the outer code decoder to carry out a reliability-based soft-decision decoding. In the case that the outer code decoding fails, the outer code decoder instructs the inner code decoder to continue its decoding iterations until the outer code decoding is successful or a preset maximum number of decoding iterations is reached. This interaction between outer and inner code decoders reduces decoding delay. Also presented in the paper are an effective criterion for stopping the iteration process of the inner code decoder and a new reliability-based decoding algorithm for nonbinary codes.

  10. A concatenated coding scheme for error control

    NASA Technical Reports Server (NTRS)

    Kasami, T.; Fujiwara, T.; Lin, S.

    1986-01-01

    In this paper, a concatenated coding scheme for error control in data communications is presented and analyzed. In this scheme, the inner code is used for both error correction and detection; however, the outer code is used only for error detection. A retransmission is requested if either the inner code decoder fails to make a successful decoding or the outer code decoder detects the presence of errors after the inner code decoding. Probability of undetected error (or decoding error) of the proposed scheme is derived. An efficient method for computing this probability is presented. Throughput efficiency of the proposed error control scheme incorporated with a selective-repeat ARQ retransmission strategy is also analyzed. Three specific examples are presented. One of the examples is proposed for error control in the NASA Telecommand System.

  11. The Compensatory Effectiveness of Optical Character Recognition/Speech Synthesis on Reading Comprehension of Postsecondary Students with Learning Disabilities.

    ERIC Educational Resources Information Center

    Higgins, Eleanor L.; Raskind, Marshall H.

    1997-01-01

    Thirty-seven college students with learning disabilities were given a reading comprehension task under the following conditions: (1) using an optical character recognition/speech synthesis system; (2) having the text read aloud by a human reader; or (3) reading silently without assistance. Findings indicated that the greater the disability, the…

  12. (abstract) Synthesis of Speaker Facial Movements to Match Selected Speech Sequences

    NASA Technical Reports Server (NTRS)

    Scott, Kenneth C.

    1994-01-01

    We are developing a system for synthesizing image sequences that simulate the facial motion of a speaker. To perform this synthesis, we are pursuing two major areas of effort. We are developing the necessary computer graphics technology to synthesize a realistic image sequence of a person speaking selected speech sequences. Next, we are developing a model that expresses the relation between spoken phonemes and face/mouth shape. A subject is videotaped speaking an arbitrary text that contains expressions of the full list of desired database phonemes. The subject is videotaped from the front speaking normally, recording both audio and video detail simultaneously. Using the audio track, we identify the specific video frames on the tape relating to each spoken phoneme. From this range we digitize the video frame which represents the extreme of mouth motion/shape. Thus, we construct a database of images of face/mouth shape related to spoken phonemes. A selected audio speech sequence is recorded which is the basis for synthesizing a matching video sequence; the speaker need not be the same as used for constructing the database. The audio sequence is analyzed to determine the spoken phoneme sequence and the relative timing of the enunciation of those phonemes. Synthesizing an image sequence corresponding to the spoken phoneme sequence is accomplished using a graphics technique known as morphing. Image sequence keyframes necessary for this processing are based on the spoken phoneme sequence and timing. We have been successful in synthesizing the facial motion of a native English speaker for a small set of arbitrary speech segments. Our future work will focus on advancement of the face shape/phoneme model and independent control of facial features.

  13. Concatenated coding for low date rate space communications.

    NASA Technical Reports Server (NTRS)

    Chen, C. H.

    1972-01-01

    In deep space communications with distant planets, the data rate as well as the operating SNR may be very low. To maintain the error rate also at a very low level, it is necessary to use a sophisticated coding system (longer code) without excessive decoding complexity. The concatenated coding has been shown to meet such requirements in that the error rate decreases exponentially with the overall length of the code while the decoder complexity increases only algebraically. Three methods of concatenating an inner code with an outer code are considered. Performance comparison of the three concatenated codes is made.

  14. Advancements in text-to-speech technology and implications for AAC applications

    NASA Astrophysics Data System (ADS)

    Syrdal, Ann K.

    2003-10-01

    Intelligibility was the initial focus in text-to-speech (TTS) research, since it is clearly a necessary condition for the application of the technology. Sufficiently high intelligibility (approximating human speech) has been achieved in the last decade by the better formant-based and concatenative TTS systems. This led to commercially available TTS systems for highly motivated users, particularly the blind and vocally impaired. Some unnatural qualities of TTS were exploited by these users, such as very fast speaking rates and altered pitch ranges for flagging relevant information. Recently, the focus in TTS research has turned to improving naturalness, so that synthetic speech sounds more human and less robotic. Unit selection approaches to concatenative synthesis have dramatically improved TTS quality, although at the cost of larger and more complex systems. This advancement in naturalness has made TTS technology more acceptable to the general public. The vocally impaired appreciate a more natural voice with which to represent themselves when communicating with others. Unit selection TTS does not achieve such high speaking rates as the earlier TTS systems, however, which is a disadvantage to some AAC device users. An important new research emphasis is to improve and increase the range of emotional expressiveness of TTS.

  15. Performance Bounds on Two Concatenated, Interleaved Codes

    NASA Technical Reports Server (NTRS)

    Moision, Bruce; Dolinar, Samuel

    2010-01-01

    A method has been developed of computing bounds on the performance of a code comprised of two linear binary codes generated by two encoders serially concatenated through an interleaver. Originally intended for use in evaluating the performances of some codes proposed for deep-space communication links, the method can also be used in evaluating the performances of short-block-length codes in other applications. The method applies, more specifically, to a communication system in which the following processes take place: At the transmitter, the original binary information that one seeks to transmit is first processed by an encoder into an outer code (Co) characterized by, among other things, a pair of numbers (n,k), where n (n > k) is the total number of code bits associated with k information bits and n - k bits are used for correcting or at least detecting errors. Next, the outer code is processed through either a block or a convolutional interleaver. In the block interleaver, the words of the outer code are processed in blocks of I words. In the convolutional interleaver, the interleaving operation is performed bit-wise in N rows with delays that are multiples of B bits. The output of the interleaver is processed through a second encoder to obtain an inner code (Ci) characterized by (ni,ki). The output of the inner code is transmitted over an additive-white-Gaussian-noise channel characterized by a symbol signal-to-noise ratio (SNR) Es/No and a bit SNR Eb/No. At the receiver, an inner decoder generates estimates of bits. Depending on whether a block or a convolutional interleaver is used at the transmitter, the sequence of estimated bits is processed through a block or a convolutional de-interleaver, respectively, to obtain estimates of code words. Then the estimates of the code words are processed through an outer decoder, which generates estimates of the original information along with flags indicating which estimates are presumed to be correct and which are found to

  16. The Neural Basis of Speech Parsing in Children and Adults

    ERIC Educational Resources Information Center

    McNealy, Kristin; Mazziotta, John C.; Dapretto, Mirella

    2010-01-01

    Word segmentation, detecting word boundaries in continuous speech, is a fundamental aspect of language learning that can occur solely by the computation of statistical and speech cues. Fifty-four children underwent functional magnetic resonance imaging (fMRI) while listening to three streams of concatenated syllables that contained either high…

  17. Hardware Implementation of Serially Concatenated PPM Decoder

    NASA Technical Reports Server (NTRS)

    Moision, Bruce; Hamkins, Jon; Barsoum, Maged; Cheng, Michael; Nakashima, Michael

    2009-01-01

    A prototype decoder for a serially concatenated pulse position modulation (SCPPM) code has been implemented in a field-programmable gate array (FPGA). At the time of this reporting, this is the first known hardware SCPPM decoder. The SCPPM coding scheme, conceived for free-space optical communications with both deep-space and terrestrial applications in mind, is an improvement of several dB over the conventional Reed-Solomon PPM scheme. The design of the FPGA SCPPM decoder is based on a turbo decoding algorithm that requires relatively low computational complexity while delivering error-rate performance within approximately 1 dB of channel capacity. The SCPPM encoder consists of an outer convolutional encoder, an interleaver, an accumulator, and an inner modulation encoder (more precisely, a mapping of bits to PPM symbols). Each code is describable by a trellis (a finite directed graph). The SCPPM decoder consists of an inner soft-in-soft-out (SISO) module, a de-interleaver, an outer SISO module, and an interleaver connected in a loop (see figure). Each SISO module applies the Bahl-Cocke-Jelinek-Raviv (BCJR) algorithm to compute a-posteriori bit log-likelihood ratios (LLRs) from a-priori LLRs by traversing the code trellis in forward and backward directions. The SISO modules iteratively refine the LLRs by passing the estimates between one another much like the working of a turbine engine. Extrinsic information (the difference between the a-posteriori and a-priori LLRs) is exchanged rather than the a-posteriori LLRs to minimize undesired feedback. All computations are performed in the logarithmic domain, wherein multiplications are translated into additions, thereby reducing complexity and sensitivity to fixed-point implementation roundoff errors. To lower the required memory for storing channel likelihood data and the amounts of data transfer between the decoder and the receiver, one can discard the majority of channel likelihoods, using only the remainder in
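    The log-domain trick the decoder relies on can be stated in two lines: products of likelihoods become sums of logs, and sums of likelihoods become the max* (Jacobian logarithm) operation, a max plus a bounded correction term. A sketch, independent of any particular hardware:

        import math

        def max_star(a, b):
            # Jacobian logarithm: log(exp(a) + exp(b)), computed stably as
            # max(a, b) plus a small correction term.
            return max(a, b) + math.log1p(math.exp(-abs(a - b)))

        # in the log domain a likelihood product is just an addition:
        log_product = (-1.2) + (-0.7)
        # and a likelihood sum is a max* operation:
        print(max_star(-1.2, -0.7), math.log(math.exp(-1.2) + math.exp(-0.7)))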

  18. Protein knotting through concatenation significantly reduces folding stability

    PubMed Central

    Hsu, Shang-Te Danny

    2016-01-01

    Concatenation by covalent linkage of two protomers of an intertwined all-helical HP0242 homodimer from Helicobacter pylori results in the first example of an engineered knotted protein. While concatenation does not affect the native structure according to X-ray crystallography, the folding kinetics is substantially slower compared to the parent homodimer. Using NMR hydrogen-deuterium exchange analysis, we showed here that concatenation significantly destabilises the knotted structure in solution, with some regions close to the covalent linkage being destabilised by as much as 5 kcal mol^(-1). Structural mapping of chemical shift perturbations induced by concatenation revealed a pattern that is similar to the effect induced by concentrated chaotropic agent. Our results suggested that the design strategy of protein knotting by concatenation may be thermodynamically unfavourable due to covalent constraints imposed on the flexible fraying ends of the template structure, leading to a rugged free energy landscape with increased propensity to form off-pathway folding intermediates. PMID:27982106

  19. Speech processing using maximum likelihood continuity mapping

    SciTech Connect

    Hogden, J.E.

    2000-04-18

    Speech processing is obtained that, given a probabilistic mapping between static speech sounds and pseudo-articulator positions, allows sequences of speech sounds to be mapped to smooth sequences of pseudo-articulator positions. In addition, a method for learning a probabilistic mapping between static speech sounds and pseudo-articulator position is described. The method for learning the mapping between static speech sounds and pseudo-articulator position uses a set of training data composed only of speech sounds. The said speech processing can be applied to various speech analysis tasks, including speech recognition, speaker recognition, speech coding, speech synthesis, and voice mimicry.

  20. Speech processing using maximum likelihood continuity mapping

    DOEpatents

    Hogden, John E.

    2000-01-01

    Speech processing is obtained that, given a probabilistic mapping between static speech sounds and pseudo-articulator positions, allows sequences of speech sounds to be mapped to smooth sequences of pseudo-articulator positions. In addition, a method for learning a probabilistic mapping between static speech sounds and pseudo-articulator position is described. The method for learning the mapping between static speech sounds and pseudo-articulator position uses a set of training data composed only of speech sounds. The said speech processing can be applied to various speech analysis tasks, including speech recognition, speaker recognition, speech coding, speech synthesis, and voice mimicry.

  1. Prosody Production and Perception with Conversational Speech

    ERIC Educational Resources Information Center

    Mo, Yoonsook

    2010-01-01

    Speech utterances are more than the linear concatenation of individual phonemes or words. They are organized by prosodic structures comprising phonological units of different sizes (e.g., syllable, foot, word, and phrase) and the prominence relations among them. As the linguistic structure of spoken languages, prosody serves an important function…

  2. Multilevel Analysis in Analyzing Speech Data

    ERIC Educational Resources Information Center

    Guddattu, Vasudeva; Krishna, Y.

    2011-01-01

    The speech produced by human vocal tract is a complex acoustic signal, with diverse applications in phonetics, speech synthesis, automatic speech recognition, speaker identification, communication aids, speech pathology, speech perception, machine translation, hearing research, rehabilitation and assessment of communication disorders and many…

  3. Bounds on Block Error Probability for Multilevel Concatenated Codes

    NASA Technical Reports Server (NTRS)

    Lin, Shu; Moorthy, Hari T.; Stojanovic, Diana

    1996-01-01

    Maximum likelihood decoding of long block codes is not feasible due to large complexity. Some classes of codes are shown to be decomposable into multilevel concatenated codes (MLCC). For these codes, multistage decoding provides a good trade-off between performance and complexity. In this paper, we derive an upper bound on the probability of block error for MLCC. We use this bound to evaluate the difference in performance for different decompositions of some codes. The examples given show that a significant reduction in complexity can be achieved by increasing the number of stages of decoding. The resulting performance degradation varies for different decompositions. A guideline is given for finding good m-level decompositions.

  4. A Two-Phase Damped-Exponential Model for Speech Synthesis

    DTIC Science & Technology

    2011-07-21

    speech are decomposed into small constant-length analysis frames. Second, the phoneme label corresponding to each analysis frame is compared to values ...containing non-zero values only at estimated glottal closing instants. GeLfO uses an algorithm similar to the method of Secrest and Doddington [28]. This...file formatted for use with ESPS and produces a file containing estimates of: F0; a voicing probability; the RMS energy value; and peak autocorrelation

  5. Speech Analysis and Synthesis and Man-Machine Speech Communications for Air Operations. (Synthese et Analyse de la Parole et Liaisons Vocales Homme- Machine dans les Operations Aeriennes)

    DTIC Science & Technology

    1990-05-01

    fact that the spoken word plays and will continue to play a significant role in man-man, man-machine and machine-man communications for air operations...quality speech at 32 kb/s. In fact, a highly complex version can provide high-quality speech at the lower bit rate; similarly, lower-complexity...précis on speech coding, which is given here because of the central role of the subject in the whole speech processing field, will be elaborated on and

  6. A concatenated coded modulation scheme for error control

    NASA Technical Reports Server (NTRS)

    Kasami, Tadao; Takata, Toyoo; Fujiwara, Toru; Lin, Shu

    1990-01-01

    A concatenated coded modulation scheme for error control in data communications is presented. The scheme is achieved by concatenating a Reed-Solomon outer code and a bandwidth efficient block inner code for M-ary PSK modulation. Error performance of the scheme is analyzed for an AWGN channel. It is shown that extremely high reliability can be attained by using a simple M-ary PSK modulation inner code and relatively powerful Reed-Solomon outer code. Furthermore, if an inner code of high effective rate is used, the bandwidth expansion required by the scheme due to coding will be greatly reduced. The proposed scheme is particularly effective for high speed satellite communications for large file transfer where high reliability is required. Also presented is a simple method for constructing block codes for M-ary PSK modulation. Some short M-ary PSK codes with good minimum squared Euclidean distance are constructed. These codes have trellis structure and hence can be decoded with a soft-decision Viterbi decoding algorithm.

  7. Concatenation of 'alert' and 'identity' segments in dingoes' alarm calls.

    PubMed

    Déaux, Eloïse C; Allen, Andrew P; Clarke, Jennifer A; Charrier, Isabelle

    2016-07-27

    Multicomponent signals can be formed by the uninterrupted concatenation of multiple call types. One such signal is found in dingoes, Canis familiaris dingo. This stereotyped, multicomponent 'bark-howl' vocalisation is formed by the concatenation of a noisy bark segment and a tonal howl segment. Both segments are structurally similar to bark and howl vocalisations produced independently in other contexts (e.g. intra- and inter-pack communication). Bark-howls are mainly uttered in response to human presence and were hypothesized to serve as alarm calls. We investigated the function of bark-howls and the respective roles of the bark and howl segments. We found that dingoes could discriminate between familiar and unfamiliar howl segments, after having only heard familiar howl vocalisations (i.e. different calls). We propose that howl segments could function as 'identity signals' and allow receivers to modulate their responses according to the caller's characteristics. The bark segment increased receivers' attention levels, providing support for earlier observational claims that barks have an 'alerting' function. Lastly, dingoes were more likely to display vigilance behaviours upon hearing bark-howl vocalisations, lending support to the alarm function hypothesis. Canid vocalisations, such as the dingo bark-howl, may provide a model system to investigate the selective pressures shaping complex communication systems.

  8. Serial turbo trellis coded modulation using a serially concatenated coder

    NASA Technical Reports Server (NTRS)

    Divsalar, Dariush (Inventor); Dolinar, Samuel J. (Inventor); Pollara, Fabrizio (Inventor)

    2010-01-01

    Serial concatenated trellis coded modulation (SCTCM) includes an outer coder, an interleaver, a recursive inner coder and a mapping element. The outer coder receives data to be coded and produces outer coded data. The interleaver permutes the outer coded data to produce interleaved data. The recursive inner coder codes the interleaved data to produce inner coded data. The mapping element maps the inner coded data to a symbol. The recursive inner coder has a structure which facilitates iterative decoding of the symbols at a decoder system. The recursive inner coder and the mapping element are selected to maximize the effective free Euclidean distance of a trellis coded modulator formed from the recursive inner coder and the mapping element. The decoder system includes a demodulation unit, an inner SISO (soft-input soft-output) decoder, a deinterleaver, an outer SISO decoder, and an interleaver.

  9. Serial turbo trellis coded modulation using a serially concatenated coder

    NASA Technical Reports Server (NTRS)

    Divsalar, Dariush (Inventor); Dolinar, Samuel J. (Inventor); Pollara, Fabrizio (Inventor)

    2011-01-01

    Serial concatenated trellis coded modulation (SCTCM) includes an outer coder, an interleaver, a recursive inner coder and a mapping element. The outer coder receives data to be coded and produces outer coded data. The interleaver permutes the outer coded data to produce interleaved data. The recursive inner coder codes the interleaved data to produce inner coded data. The mapping element maps the inner coded data to a symbol. The recursive inner coder has a structure which facilitates iterative decoding of the symbols at a decoder system. The recursive inner coder and the mapping element are selected to maximize the effective free Euclidean distance of a trellis coded modulator formed from the recursive inner coder and the mapping element. The decoder system includes a demodulation unit, an inner SISO (soft-input soft-output) decoder, a deinterleaver, an outer SISO decoder, and an interleaver.
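    A minimal sketch of the encoder chain described here: outer coder, interleaver, recursive inner coder, mapping element. The particular choices below (a rate-1/2 outer convolutional code with generators (7,5) octal, a random permutation interleaver, an accumulator 1/(1+D) as the recursive inner code, and Gray-labeled QPSK) are illustrative assumptions, not the patented design parameters.

        import numpy as np

        rng = np.random.default_rng(0)

        def conv_encode_75(bits):
            # Rate-1/2 feedforward outer convolutional code, generators (7,5) octal.
            state, out = [0, 0], []
            for b in bits:
                out.append((b + state[0] + state[1]) % 2)  # g = 1 + D + D^2 (7)
                out.append((b + state[1]) % 2)             # g = 1 + D^2     (5)
                state = [b, state[0]]
            return np.array(out)

        def accumulate(bits):
            # Rate-1 recursive inner code 1/(1+D): a running XOR (accumulator).
            acc, out = 0, []
            for b in bits:
                acc ^= b
                out.append(acc)
            return np.array(out)

        def qpsk_map(bits):
            # Mapping element: bit pairs to Gray-labeled QPSK symbols.
            b = bits.reshape(-1, 2)
            return ((1 - 2 * b[:, 0]) + 1j * (1 - 2 * b[:, 1])) / np.sqrt(2)

        info = rng.integers(0, 2, 10)
        outer = conv_encode_75(info)          # outer coded data
        pi = rng.permutation(outer.size)      # interleaver permutation
        inner = accumulate(outer[pi])         # recursive inner coded data
        symbols = qpsk_map(inner)             # transmitted symbols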

  10. THE COMPREHENSION OF RAPID SPEECH BY THE BLIND, PART III.

    ERIC Educational Resources Information Center

    FOULKE, EMERSON

    A REVIEW OF THE RESEARCH ON THE COMPREHENSION OF RAPID SPEECH BY THE BLIND IDENTIFIES FIVE METHODS OF SPEECH COMPRESSION--SPEECH CHANGING, ELECTROMECHANICAL SAMPLING, COMPUTER SAMPLING, SPEECH SYNTHESIS, AND FREQUENCY DIVIDING WITH THE HARMONIC COMPRESSOR. THE SPEECH CHANGING AND ELECTROMECHANICAL SAMPLING METHODS AND THE NECESSARY APPARATUS HAVE…

  11. A Statistical Approach to Automatic Speech Summarization

    NASA Astrophysics Data System (ADS)

    Hori, Chiori; Furui, Sadaoki; Malkin, Rob; Yu, Hua; Waibel, Alex

    2003-12-01

    This paper proposes a statistical approach to automatic speech summarization. In our method, a set of words maximizing a summarization score indicating the appropriateness of summarization is extracted from automatically transcribed speech and then concatenated to create a summary. The extraction process is performed using a dynamic programming (DP) technique based on a target compression ratio. In this paper, we demonstrate how an English news broadcast transcribed by a speech recognizer is automatically summarized. We adapted our method, which was originally proposed for Japanese, to English by modifying the model for estimating word concatenation probabilities based on a dependency structure in the original speech given by a stochastic dependency context free grammar (SDCFG). We also propose a method of summarizing multiple utterances using a two-level DP technique. The automatically summarized sentences are evaluated by summarization accuracy based on a comparison with a manual summary of speech that has been correctly transcribed by human subjects. Our experimental results indicate that the method we propose can effectively extract relatively important information and remove redundant and irrelevant information from English news broadcasts.
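    The extraction step can be sketched as a dynamic program over (summary length, last selected word): each selected word contributes its importance score plus a concatenation log-probability with the previously selected word. The importance scores and the bigram model below are hypothetical stand-ins for the paper's summarization score and its SDCFG-based concatenation model.

        def summarize(words, imp, bigram_lp, ratio=0.5):
            # Pick a subsequence of `words` maximizing importance plus
            # word-concatenation log-probabilities at a target compression ratio.
            n = len(words)
            m = max(1, round(ratio * n))
            NEG = float("-inf")
            # best[k][j]: best score of a k-word summary ending at word j
            best = [[NEG] * n for _ in range(m + 1)]
            back = [[None] * n for _ in range(m + 1)]
            for j in range(n):
                best[1][j] = imp[j]
            for k in range(2, m + 1):
                for j in range(n):
                    for i in range(j):
                        if best[k - 1][i] == NEG:
                            continue
                        cand = best[k - 1][i] + imp[j] + bigram_lp(words[i], words[j])
                        if cand > best[k][j]:
                            best[k][j], back[k][j] = cand, i
            j = max(range(n), key=lambda j: best[m][j])
            out, k = [], m
            while j is not None:                 # backtrack the chosen words
                out.append(words[j])
                j, k = back[k][j], k - 1
            return list(reversed(out))

        words = "the company said quarterly profits rose sharply last year".split()
        imp = [0.1, 0.6, 0.3, 0.8, 0.9, 0.7, 0.5, 0.2, 0.4]   # invented scores
        print(summarize(words, imp, lambda a, b: -0.1, ratio=0.5))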

  12. Multilevel Concatenated Block Modulation Codes for the Frequency Non-selective Rayleigh Fading Channel

    NASA Technical Reports Server (NTRS)

    Lin, Shu; Rhee, Dojun

    1996-01-01

    This paper is concerned with construction of multilevel concatenated block modulation codes using a multi-level concatenation scheme for the frequency non-selective Rayleigh fading channel. In the construction of multilevel concatenated modulation code, block modulation codes are used as the inner codes. Various types of codes (block or convolutional, binary or nonbinary) are being considered as the outer codes. In particular, we focus on the special case for which Reed-Solomon (RS) codes are used as the outer codes. For this special case, a systematic algebraic technique for constructing q-level concatenated block modulation codes is proposed. Codes have been constructed for certain specific values of q and compared with the single-level concatenated block modulation codes using the same inner codes. A multilevel closest coset decoding scheme for these codes is proposed.

  13. Speech coding, reconstruction and recognition using acoustics and electromagnetic waves

    DOEpatents

    Holzrichter, John F.; Ng, Lawrence C.

    1998-01-01

    The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced, as well as for combined voiced and unvoiced speech. The methods include how to deconvolve the speech excitation function from the acoustic speech output to describe the transfer function each time frame. The formation of feature vectors defining all acoustic speech units over well defined time frames can be used for purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching.

  14. Speech coding, reconstruction and recognition using acoustics and electromagnetic waves

    DOEpatents

    Holzrichter, J.F.; Ng, L.C.

    1998-03-17

    The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced, as well as for combined voiced and unvoiced speech. The methods include how to deconvolve the speech excitation function from the acoustic speech output to describe the transfer function each time frame. The formation of feature vectors defining all acoustic speech units over well defined time frames can be used for purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching. 35 figs.

  15. Minimized Database of Unit Selection in Visual Speech Synthesis without Loss of Naturalness

    NASA Astrophysics Data System (ADS)

    Liu, Kang; Ostermann, Joern

    Image-based modeling is very successful in the creation of realistic facial animations. Applications with dialog systems, such as e-Learning and customer information service, can integrate facial animations with synthesized speech in websites to improve human-machine communication. However, downloading a database of 11,594 mouth images (about 120 MB in JPEG format) used by a talking head takes about 15 minutes at 150 kBps. This paper presents a prototype framework for two-step database minimization. First, key mouth images are identified by clustering algorithms and similar mouth images are discarded. Second, the clustered key mouth images are further compressed by JPEG. MST (Minimum Spanning Tree), RSST (Recursive Shortest Spanning Tree) and LBG-based clustering algorithms are developed and evaluated. Our experiments demonstrate that the number of mouth images is reduced most effectively by the LBG-based clustering algorithm and the database is further compressed to 8 MB by JPEG, which generates facial animations in CIF format without loss of naturalness and fulfills the needs of a talking head for Internet applications.
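
    The first minimization step can be sketched with k-means (a close relative of the LBG algorithm used in the paper): cluster the flattened mouth images and keep one representative per cluster. The array shapes and counts below are illustrative, not the paper's data.

        import numpy as np
        from sklearn.cluster import KMeans

        rng = np.random.default_rng(1)
        images = rng.random((2000, 64 * 64)).astype(np.float32)  # stand-in mouth images

        n_key = 200                                   # size of the minimized database
        km = KMeans(n_clusters=n_key, n_init=4, random_state=0).fit(images)

        key_ids = []
        for c in range(n_key):
            # keep the real image closest to each centroid; similar images are discarded
            members = np.flatnonzero(km.labels_ == c)
            d = np.linalg.norm(images[members] - km.cluster_centers_[c], axis=1)
            key_ids.append(members[np.argmin(d)])
        key_images = images[key_ids]                  # then compress these with JPEG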

  16. Cyanuric acid hydrolase: evolutionary innovation by structural concatenation

    PubMed Central

    Peat, Thomas S; Balotra, Sahil; Wilding, Matthew; French, Nigel G; Briggs, Lyndall J; Panjikar, Santosh; Cowieson, Nathan; Newman, Janet; Scott, Colin

    2013-01-01

    The cyanuric acid hydrolase, AtzD, is the founding member of a newly identified family of ring-opening amidases. We report the first X-ray structure for this family, which is a novel fold (termed the ‘Toblerone’ fold) that likely evolved via the concatenation of monomers of the trimeric YjgF superfamily and the acquisition of a metal binding site. Structures of AtzD with bound substrate (cyanuric acid) and inhibitors (phosphate, barbituric acid and melamine), along with mutagenesis studies, allowed the identification of the active site. The AtzD monomer, active site and substrate all possess threefold rotational symmetry, to the extent that the active site possesses three potential Ser–Lys catalytic dyads. A single catalytic dyad (Ser85–Lys42) is hypothesized, based on biochemical evidence and crystallographic data. A plausible catalytic mechanism based on these observations is also presented. A comparison with a homology model of the related barbiturase, Bar, was used to infer the active-site residues responsible for substrate specificity, and the phylogeny of the 68 AtzD-like enzymes in the database was analysed in light of this structure–function relationship. PMID:23651355

  17. Medical reliable network using concatenated channel codes through GSM network.

    PubMed

    Ahmed, Emtithal; Kohno, Ryuji

    2013-01-01

    Although the 4th generation (4G) of the global mobile communication network, i.e., Long Term Evolution (LTE), coexisting with the 3rd generation (3G), has successfully started, the 2nd generation (2G), i.e., the Global System for Mobile communication (GSM), still plays an important role in many developing countries. Without any other reliable network infrastructure, GSM can be applied to tele-monitoring applications, where high mobility and low cost are necessary. A core objective of this paper is to introduce the design of a more reliable and dependable Medical Network Channel Code system (MNCC) over the GSM network. The MNCC design is based on a simple concatenated channel code, a cascade of an inner code (GSM) and an extra outer code (convolutional code), in order to protect medical data more robustly against channel errors than other data using the existing GSM network. In this paper, the MNCC system provides a bit error rate (BER) suitable for medical tele-monitoring of physiological signals, which is 10^-5 or less. The performance of the MNCC has been proven and investigated using computer simulations under different channel conditions such as additive white Gaussian noise (AWGN), Rayleigh fading and burst noise. In general, the MNCC system provides better performance than GSM alone.

  18. Optimal and efficient decoding of concatenated quantum block codes

    SciTech Connect

    Poulin, David

    2006-11-15

    We consider the problem of optimally decoding a quantum error correction code--that is, to find the optimal recovery procedure given the outcomes of partial ''check'' measurements on the system. In general, this problem is NP hard. However, we demonstrate that for concatenated block codes, the optimal decoding can be efficiently computed using a message-passing algorithm. We compare the performance of the message-passing algorithm to that of the widespread blockwise hard decoding technique. Our Monte Carlo results using the five-qubit and Steane's code on a depolarizing channel demonstrate significant advantages of the message-passing algorithms in two respects: (i) Optimal decoding increases by as much as 94% the error threshold below which the error correction procedure can be used to reliably send information over a noisy channel; and (ii) for noise levels below these thresholds, the probability of error after optimal decoding is suppressed at a significantly higher rate, leading to a substantial reduction of the error correction overhead.

  19. Digression and Value Concatenation to Enable Privacy-Preserving Regression

    PubMed Central

    Li, Xiao-Bai; Sarkar, Sumit

    2015-01-01

    Regression techniques can be used not only for legitimate data analysis, but also to infer private information about individuals. In this paper, we demonstrate that regression trees, a popular data-analysis and data-mining technique, can be used to effectively reveal individuals’ sensitive data. This problem, which we call a “regression attack,” has not been addressed in the data privacy literature, and existing privacy-preserving techniques are not appropriate in coping with this problem. We propose a new approach to counter regression attacks. To protect against privacy disclosure, our approach introduces a novel measure, called digression, which assesses the sensitive value disclosure risk in the process of building a regression tree model. Specifically, we develop an algorithm that uses the measure for pruning the tree to limit disclosure of sensitive data. We also propose a dynamic value-concatenation method for anonymizing data, which better preserves data utility than a user-defined generalization scheme commonly used in existing approaches. Our approach can be used for anonymizing both numeric and categorical data. An experimental study is conducted using real-world financial, economic and healthcare data. The results of the experiments demonstrate that the proposed approach is very effective in protecting data privacy while preserving data quality for research and analysis. PMID:26752802

  20. Hamming and Accumulator Codes Concatenated with MPSK or QAM

    NASA Technical Reports Server (NTRS)

    Divsalar, Dariush; Dolinar, Samuel

    2009-01-01

    In a proposed coding-and-modulation scheme, a high-rate binary data stream would be processed as follows: 1. The input bit stream would be demultiplexed into multiple bit streams. 2. The multiple bit streams would be processed simultaneously into a high-rate outer Hamming code that would comprise multiple short constituent Hamming codes, a distinct constituent Hamming code for each stream. 3. The streams would be interleaved. The interleaver would have a block structure that would facilitate parallelization for high-speed decoding. 4. The interleaved streams would be further processed simultaneously into an inner two-state, rate-1 accumulator code that would comprise multiple constituent accumulator codes, a distinct accumulator code for each stream. 5. The resulting bit streams would be mapped into symbols to be transmitted by use of a higher-order modulation, for example, M-ary phase-shift keying (MPSK) or quadrature amplitude modulation (QAM). The novelty of the scheme lies in the concatenation of the multiple-constituent Hamming and accumulator codes and the corresponding parallel architectures of the encoder and decoder circuitry (see figure) needed to process the multiple bit streams simultaneously. As in the cases of other parallel-processing schemes, one advantage of this scheme is that the overall data rate could be much greater than the data rate of each encoder and decoder stream and, hence, the encoder and decoder could handle data at an overall rate beyond the capability of the individual encoder and decoder circuits.
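
    A hedged sketch of the five processing steps on toy parameters: four demultiplexed streams, systematic (7,4) constituent Hamming codes, a toy random interleaver standing in for the block interleaver, per-stream two-state accumulators, and Gray-mapped 8-PSK. The actual interleaver structure and code parameters of the proposed scheme are not reproduced here.

        import numpy as np

        G = np.array([[1,0,0,0,1,1,0],    # systematic (7,4) Hamming generator matrix
                      [0,1,0,0,1,0,1],
                      [0,0,1,0,0,1,1],
                      [0,0,0,1,1,1,1]])

        def hamming74(bits):
            return (bits.reshape(-1, 4) @ G % 2).ravel()

        def accumulate(bits):             # two-state, rate-1 accumulator code
            return np.bitwise_xor.accumulate(bits)

        def map_8psk(bits):               # bit triplets -> Gray-coded 8-PSK symbols
            gray = [0, 1, 3, 2, 6, 7, 5, 4]
            idx = bits.reshape(-1, 3) @ np.array([4, 2, 1])
            phase = np.array([gray.index(int(v)) for v in idx])
            return np.exp(2j * np.pi * phase / 8)

        rng = np.random.default_rng(0)
        streams = [rng.integers(0, 2, 108) for _ in range(4)]   # 1. demultiplex
        coded = [hamming74(s) for s in streams]                 # 2. constituent Hamming codes
        inter = [c[rng.permutation(c.size)] for c in coded]     # 3. interleaving (toy)
        accum = [accumulate(c) for c in inter]                  # 4. constituent accumulators
        symbols = map_8psk(np.concatenate(accum))               # 5. higher-order modulation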

  1. Research in speech communication.

    PubMed Central

    Flanagan, J

    1995-01-01

    Advances in digital speech processing are now supporting application and deployment of a variety of speech technologies for human/machine communication. In fact, new businesses are rapidly forming about these technologies. But these capabilities are of little use unless society can afford them. Happily, explosive advances in microelectronics over the past two decades have assured affordable access to this sophistication as well as to the underlying computing technology. The research challenges in speech processing remain in the traditionally identified areas of recognition, synthesis, and coding. These three areas have typically been addressed individually, often with significant isolation among the efforts. But they are all facets of the same fundamental issue--how to represent and quantify the information in the speech signal. This implies deeper understanding of the physics of speech production, the constraints that the conventions of language impose, and the mechanism for information processing in the auditory system. In ongoing research, therefore, we seek more accurate models of speech generation, better computational formulations of language, and realistic perceptual guides for speech processing--along with ways to coalesce the fundamental issues of recognition, synthesis, and coding. Successful solution will yield the long-sought dictation machine, high-quality synthesis from text, and the ultimate in low bit-rate transmission of speech. It will also open the door to language-translating telephony, where the synthetic foreign translation can be in the voice of the originating talker. PMID:7479806

  2. Performance Analysis of the Link-16/JTIDS Waveform With Concatenated Coding

    DTIC Science & Technology

    2009-09-01

    Master's thesis, September 2009, by Ioannis Koromilas (thesis advisor: Ralph C. Robertson): Performance Analysis of the Link-16/JTIDS Waveform with Concatenated Coding. The communication terminal of Link-16 is called the Joint Tactical Information Distribution System (JTIDS) and features Reed-Solomon (RS) coding…

  3. Speech enhancement via two-stage dual tree complex wavelet packet transform with a speech presence probability estimator

    NASA Astrophysics Data System (ADS)

    Sun, Pengfei; Qin, Jun

    2017-02-01

    In this paper, a two-stage dual-tree complex wavelet packet transform (DTCWPT) based speech enhancement algorithm is proposed, in which a speech presence probability (SPP) estimator and a generalized minimum mean squared error (MMSE) estimator are developed. To overcome the drawback of signal distortions caused by the downsampling of WPT, a two-stage analytic decomposition concatenating undecimated WPT (UWPT) and decimated WPT is employed. An SPP estimator in the DTCWPT domain is derived based on a generalized Gamma distribution of speech and a Gaussian noise assumption. The validation results show that the proposed algorithm obtains improved perceptual evaluation of speech quality (PESQ) and segmental signal-to-noise ratio (SegSNR) scores in low-SNR nonstationary noise, compared with four other state-of-the-art speech enhancement algorithms: optimally modified LSA (OM-LSA), soft masking using a posteriori SNR uncertainty (SMPO), a posteriori SPP-based MMSE estimation (MMSE-SPP), and adaptive Bayesian wavelet thresholding (BWT).

  4. Speech Problems

    MedlinePlus

    ... and the respiratory system. The ability to understand language and produce speech is coordinated by the brain. So a person with brain damage from an accident, stroke, or birth defect may have speech and language problems. Some people with speech problems, particularly articulation ...

  5. High pH reversed-phase chromatography with fraction concatenation for 2D proteomic analysis

    SciTech Connect

    Yang, Feng; Shen, Yufeng; Camp, David G.; Smith, Richard D.

    2012-04-01

    Orthogonal high-resolution separations are critical for attaining improved analytical dynamic ranges of proteome measurements. Concatenated high pH reversed phase liquid chromatography affords better separations than the strong cation exchange conventionally applied for two-dimensional shotgun proteomic analysis. For example, concatenated high pH reversed phase liquid chromatography increased identification coverage for peptides (e.g., by 1.8-fold) and proteins (e.g., by 1.6-fold) in shotgun proteomics analyses of a digested human protein sample. Additional advantages of concatenated high pH RPLC include improved protein sequence coverage, simplified sample processing, and reduced sample losses, making this an attractive first dimension separation strategy for two-dimensional proteomics analyses.
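
    The concatenation pattern itself is simple to express: early, middle and late first-dimension fractions are pooled so that each final fraction spans the full elution (hydrophobicity) range. A sketch with illustrative fraction counts:

        def concatenate_fractions(n_fractions=96, n_pools=24):
            # pool p combines fractions p, p + n_pools, p + 2*n_pools, ...
            return {p: list(range(p, n_fractions, n_pools)) for p in range(n_pools)}

        pools = concatenate_fractions()
        print(pools[0])   # [0, 24, 48, 72] -> widely spaced fractions in one pool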

  6. Punctured Parallel and Serial Concatenated Convolutional Codes for BPSK/QPSK Channels

    NASA Technical Reports Server (NTRS)

    Acikel, Omer Fatih

    1999-01-01

    As available bandwidth for communication applications becomes scarce, bandwidth-efficient modulation and coding schemes become ever more important. Since their discovery in 1993, turbo codes (parallel concatenated convolutional codes) have been the center of attention in the coding community because of their bit error rate performance near the Shannon limit. Serial concatenated convolutional codes have also been shown to be as powerful as turbo codes. In this dissertation, we introduce algorithms for designing bandwidth-efficient rate r = k/(k + 1), k = 2, 3,..., 16, parallel and rate 3/4, 7/8, and 15/16 serial concatenated convolutional codes via puncturing for BPSK/QPSK (Binary Phase Shift Keying/Quadrature Phase Shift Keying) channels. Both parallel and serial concatenated convolutional codes initially have a steep bit error rate versus signal-to-noise ratio slope (called the "cliff region"). However, this steep slope changes to a moderate slope with increasing signal-to-noise ratio, where the slope is characterized by the weight spectrum of the code. The region after the cliff region is called the "error rate floor", which dominates the behavior of these codes at moderate to high signal-to-noise ratios. Our goal is to design high-rate parallel and serial concatenated convolutional codes while minimizing the error rate floor effect. The design algorithm includes an interleaver enhancement procedure and finds the polynomial sets (only for parallel concatenated convolutional codes) and the puncturing schemes that achieve the lowest bit error rate performance around the floor for the code rates of interest.
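
    A sketch of rate-increasing puncturing on a rate-1/2 mother code (here the standard K=3, (7,5) convolutional code, not the codes designed in the dissertation): a 2x3 puncturing matrix deletes two of every six coded bits, raising the rate to 3/4. The specific pattern is illustrative.

        import numpy as np

        def conv_r12(u, g1=0o7, g2=0o5):
            # rate-1/2 convolutional encoder, constraint length 3, generators (7,5)
            state, c1, c2 = 0, [], []
            for b in u:
                state = ((state << 1) | int(b)) & 0b111
                c1.append(bin(state & g1).count("1") % 2)
                c2.append(bin(state & g2).count("1") % 2)
            return np.array(c1), np.array(c2)

        def puncture(c1, c2, P):
            # P[i, k] == 1 means output i is transmitted at time k (mod the period)
            mask = np.tile(P, (1, len(c1) // P.shape[1])).astype(bool)
            return np.vstack([c1, c2]).T[mask.T]     # surviving bits in time order

        P = np.array([[1, 1, 0],                     # keep 4 of every 6 bits -> rate 3/4
                      [1, 0, 1]])
        u = np.random.default_rng(0).integers(0, 2, 30)
        c1, c2 = conv_r12(u)
        tx = puncture(c1, c2, P)
        print(len(u), "info bits ->", len(tx), "coded bits")   # 30 -> 40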

  7. Programmable concatenation of conductively linked gold nanorods using molecular assembly and femtosecond irradiation

    NASA Astrophysics Data System (ADS)

    Fontana, Jake; Flom, Steve; Naciri, Jawad; Ratna, Banahalli

    The ability to tune the resonant frequency in plasmonic nanostructures is fundamental to developing novel optical properties and ensuing materials. Recent theoretical insights show that the plasmon resonance can be exquisitely controlled through the conductive concatenation of plasmonic nanoparticles. Furthermore these charge transfer systems may mimic complex and hard to build nanostructures. Here we experimentally demonstrate a directed molecular assembly approach to controllably concatenate gold nanorods end to end into discrete linear structures, bridged with gold nanojunctions, using femtosecond laser light. By utilizing high throughput and nanometer resolution this approach offers a pragmatic assembly strategy for charge transfer plasmonic systems.

  8. On the error statistics of Viterbi decoding and the performance of concatenated codes

    NASA Technical Reports Server (NTRS)

    Miller, R. L.; Deutsch, L. J.; Butman, S. A.

    1981-01-01

    Computer simulation results are presented on the performance of convolutional codes of constraint lengths 7 and 10 concatenated with the (255, 223) Reed-Solomon code (a proposed NASA standard). These results indicate that as much as 0.8 dB can be gained by concatenating this Reed-Solomon code with a (10, 1/3) convolutional code, instead of the (7, 1/2) code currently used by the DSN. A mathematical model of Viterbi decoder burst-error statistics is developed and is validated through additional computer simulations.

  9. Concatenative and Nonconcatenative Plural Formation in L1, L2, and Heritage Speakers of Arabic

    ERIC Educational Resources Information Center

    Albirini, Abdulkafi; Benmamoun, Elabbas

    2014-01-01

    This study compares Arabic L1, L2, and heritage speakers' (HS) knowledge of plural formation, which involves concatenative and nonconcatenative modes of derivation. Ninety participants (divided equally among L1, L2, and heritage speakers) completed two oral tasks: a picture naming task (to measure proficiency) and a plural formation task. The…

  10. Molecular phylogenetic analysis of the Papionina using concatenation and species tree methods.

    PubMed

    Guevara, Elaine E; Steiper, Michael E

    2014-01-01

    The Papionina is a geographically widespread subtribe of African cercopithecid monkeys whose evolutionary history is of particular interest to anthropologists. The phylogenetic relationships among arboreal mangabeys (Lophocebus), baboons (Papio), and geladas (Theropithecus) remain unresolved. Molecular phylogenetic analyses have revealed marked gene tree incongruence for these taxa, and several recent concatenated phylogenetic analyses of multilocus datasets have supported different phylogenetic hypotheses. To address this issue, we investigated the phylogeny of the Lophocebus + Papio + Theropithecus group using concatenation methods, as well as alternative methods that incorporate gene tree heterogeneity to estimate a 'species tree.' Our compiled DNA sequence dataset was ∼56 kilobase pairs long and included 57 independent partitions. All analyses of concatenated alignments strongly supported a Lophocebus + Papio clade and a basal position for Theropithecus. The Bayesian concordance analysis supported the same phylogeny. A coalescent-based Bayesian method resulted in a very poorly resolved species tree. The topological agreement between concatenation and the Bayesian concordance analysis offers considerable support for a Lophocebus + Papio clade as the dominant relationship across the genome. However, the results of the Bayesian concordance analysis indicate that almost half the genome has an alternative history. As such, our results offer a well-supported phylogenetic hypothesis for the Papio/Lophocebus/Theropithecus trichotomy, while at the same time providing evidence for a complex evolutionary history that likely includes hybridization among lineages.
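
    For the concatenation analyses, per-locus alignments are joined into a single supermatrix, padding taxa missing from a locus with gaps and recording partition boundaries. A minimal sketch (sequences and loci below are toy examples):

        def concatenate_alignments(alignments, taxa):
            # alignments: list of dicts mapping taxon -> aligned sequence
            supermatrix = {t: "" for t in taxa}
            partitions, start = [], 1
            for aln in alignments:
                length = len(next(iter(aln.values())))
                for t in taxa:
                    supermatrix[t] += aln.get(t, "-" * length)  # gap-pad missing taxa
                partitions.append((start, start + length - 1))  # 1-based bounds
                start += length
            return supermatrix, partitions

        loci = [{"Papio": "ACGT", "Lophocebus": "ACGA", "Theropithecus": "ACTT"},
                {"Papio": "GGA", "Theropithecus": "GCA"}]
        taxa = ["Papio", "Lophocebus", "Theropithecus"]
        matrix, parts = concatenate_alignments(loci, taxa)
        # matrix["Lophocebus"] == "ACGA---"; parts == [(1, 4), (5, 7)]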

  11. Computational neuroanatomy of speech production.

    PubMed

    Hickok, Gregory

    2012-01-05

    Speech production has been studied predominantly from within two traditions, psycholinguistics and motor control. These traditions have rarely interacted, and the resulting chasm between these approaches seems to reflect a level of analysis difference: whereas motor control is concerned with lower-level articulatory control, psycholinguistics focuses on higher-level linguistic processing. However, closer examination of both approaches reveals a substantial convergence of ideas. The goal of this article is to integrate psycholinguistic and motor control approaches to speech production. The result of this synthesis is a neuroanatomically grounded, hierarchical state feedback control model of speech production.

  12. Computational neuroanatomy of speech production

    PubMed Central

    Hickok, Gregory

    2017-01-01

    Speech production has been studied predominantly from within two traditions, psycholinguistics and motor control. These traditions have rarely interacted and the resulting chasm between these approaches seems to reflect a level of analysis difference: while motor control is concerned with lower-level articulatory control, psycholinguistics focuses on higher-level linguistic processing. However, closer examination of both approaches reveals a substantial convergence of ideas. The goal of this article is to integrate psycholinguistic and motor control approaches to speech production. The result of this synthesis is a neuroanatomically grounded hierarchical state feedback control model of speech production. PMID:22218206

  13. Speech Aids

    NASA Technical Reports Server (NTRS)

    1987-01-01

    Designed to assist deaf and hearing-impaired persons in achieving better speech, Resnick Worldwide Inc.'s device provides a visual means of cuing the deaf as a speech-improvement measure. This is done by electronically processing the subjects' sounds and comparing them with optimum values which are displayed for comparison.

  14. Speech Communication.

    ERIC Educational Resources Information Center

    Brooks, William D.

    Presented in this book is a view of speech communication which enables an individual to become fully aware of his or her role as both initiator and recipient of messages. Communication is treated broadly with emphasis on the understanding and skills relating to various types of speech communication across the broad spectrum of human communication.…

  15. Symbolic Speech

    ERIC Educational Resources Information Center

    Podgor, Ellen S.

    1976-01-01

    The concept of symbolic speech emanates from the 1967 case of United States v. O'Brien. These discussions of flag desecration, grooming and dress codes, nude entertainment, buttons and badges, and musical expression show that the courts place symbolic speech in different strata from verbal communication. (LBH)

  16. A Deep Ensemble Learning Method for Monaural Speech Separation

    PubMed Central

    Zhang, Xiao-Lei; Wang, DeLiang

    2016-01-01

    Monaural speech separation is a fundamental problem in robust speech processing. Recently, deep neural network (DNN)-based speech separation methods, which predict either clean speech or an ideal time-frequency mask, have demonstrated remarkable performance improvement. However, a single DNN with a given window length does not leverage contextual information sufficiently, and the differences between the two optimization objectives are not well understood. In this paper, we propose a deep ensemble method, named multicontext networks, to address monaural speech separation. The first multicontext network averages the outputs of multiple DNNs whose inputs employ different window lengths. The second multicontext network is a stack of multiple DNNs. Each DNN in a module of the stack takes the concatenation of original acoustic features and expansion of the soft output of the lower module as its input, and predicts the ratio mask of the target speaker; the DNNs in the same module employ different contexts. We have conducted extensive experiments with three speech corpora. The results demonstrate the effectiveness of the proposed method. We have also compared the two optimization objectives systematically and found that predicting the ideal time-frequency mask is more efficient in utilizing clean training speech, while predicting clean speech is less sensitive to SNR variations. PMID:27917394
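
    The stacking idea, in which each module receives the original acoustic features concatenated with the soft output of the module below, can be sketched with scikit-learn MLPs standing in for the paper's DNNs. The feature and target arrays below are synthetic placeholders.

        import numpy as np
        from sklearn.neural_network import MLPRegressor

        rng = np.random.default_rng(0)
        X = rng.random((1000, 64))        # acoustic features (synthetic stand-in)
        y = rng.random((1000, 32))        # ratio-mask targets (synthetic stand-in)

        # module 1 predicts a soft mask from the raw features
        m1 = MLPRegressor(hidden_layer_sizes=(128,), max_iter=300).fit(X, y)

        # module 2 takes the concatenation of the features and module 1's soft output
        X2 = np.hstack([X, m1.predict(X)])
        m2 = MLPRegressor(hidden_layer_sizes=(128,), max_iter=300).fit(X2, y)

        mask = m2.predict(np.hstack([X, m1.predict(X)]))   # final mask estimate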

  17. Speech coding

    SciTech Connect

    Ravishankar, C., Hughes Network Systems, Germantown, MD

    1998-05-08

    Speech is the predominant means of communication between human beings, and since the invention of the telephone by Alexander Graham Bell in 1876, speech services have remained the core service in almost all telecommunication systems. Original analog methods of telephony had the disadvantage of the speech signal getting corrupted by noise, cross-talk and distortion. Long-haul transmissions, which use repeaters to compensate for the loss in signal strength on transmission links, also increase the associated noise and distortion. On the other hand, digital transmission is relatively immune to noise, cross-talk and distortion, primarily because of the capability to faithfully regenerate the digital signal at each repeater purely based on a binary decision. Hence the end-to-end performance of the digital link essentially becomes independent of the length and operating frequency bands of the link. From a transmission point of view, digital transmission has therefore been the preferred approach due to its higher immunity to noise. The need to carry digital speech became extremely important from a service provision point of view as well. Modern requirements have introduced the need for robust, flexible and secure services that can carry a multitude of signal types (such as voice, data and video) without a fundamental change in infrastructure. Such a requirement could not have been easily met without the advent of digital transmission systems, thereby requiring speech to be coded digitally. The term speech coding often refers to techniques that represent or code speech signals either directly as a waveform or as a set of parameters obtained by analyzing the speech signal. In either case, the codes are transmitted to the distant end, where speech is reconstructed or synthesized using the received set of codes. A more generic term that is often used interchangeably with speech coding is voice coding. This term is more generic in the sense that the…

  18. Use of Computer Speech Technologies To Enhance Learning.

    ERIC Educational Resources Information Center

    Ferrell, Joe

    1999-01-01

    Discusses the design of an innovative learning system that uses new technologies for the man-machine interface, incorporating a combination of Automatic Speech Recognition (ASR) and Text To Speech (TTS) synthesis. Highlights include using speech technologies to mimic the attributes of the ideal tutor and design features. (AEF)

  19. Basal ganglia subcircuits distinctively encode the parsing and concatenation of action sequences.

    PubMed

    Jin, Xin; Tecuapetla, Fatuel; Costa, Rui M

    2014-03-01

    Chunking allows the brain to efficiently organize memories and actions. Although basal ganglia circuits have been implicated in action chunking, little is known about how individual elements are concatenated into a behavioral sequence at the neural level. Using a task in which mice learned rapid action sequences, we uncovered neuronal activity encoding entire sequences as single actions in basal ganglia circuits. In addition to neurons with start/stop activity signaling sequence parsing, we found neurons displaying inhibited or sustained activity throughout the execution of an entire sequence. This sustained activity covaried with the rate of execution of individual sequence elements, consistent with motor concatenation. Direct and indirect pathways of the basal ganglia were concomitantly active during sequence initiation, but behaved differently during sequence performance, revealing a more complex functional organization of these circuits than previously postulated. These results have important implications for understanding the functional organization of the basal ganglia during the learning and execution of action sequences.

  20. Performance analysis of a concatenated erbium-doped fiber amplifier supporting four mode groups

    NASA Astrophysics Data System (ADS)

    Qin, Zujun; Fan, Di; Zhang, Wentao; Xiong, Xianming

    2016-05-01

    An erbium-doped fiber amplifier (EDFA) supporting four mode groups has been theoretically designed by concatenating two sections of erbium-doped fibers (EDFs). Each EDF has a simple erbium doping profile for the purpose of reducing its fabrication complexity. We propose a modified genetic algorithm (GA) to provide detailed investigations of the concatenated amplifier. The optimal fiber length and erbium doping radius of each EDF have been found to minimize the gain difference between signal modes. Results show that the parameters of the central-doped EDF have a greater impact on the amplifier performance than those of the annular-doped one. We then investigate the influence of small deviations of the erbium fiber length, doping radius and doping concentration of each EDF from their optimal values on the amplifier performance, and discuss their design tolerances in obtaining desirable amplification characteristics.

  1. Concatenated shift registers generating maximally spaced phase shifts of PN-sequences

    NASA Technical Reports Server (NTRS)

    Hurd, W. J.; Welch, L. R.

    1977-01-01

    A large class of linearly concatenated shift registers is shown to generate approximately maximally spaced phase shifts of pn-sequences, for use in pseudorandom number generation. A constructive method is presented for finding members of this class, for almost all degrees for which primitive trinomials exist. The sequences which result are not normally characterized by trinomial recursions, which is desirable since trinomial sequences can have some undesirable randomness properties.
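
    For reference, a single Fibonacci shift register built on the primitive trinomial x^7 + x + 1 generates the underlying PN-sequence; the concatenated structures of the paper combine such registers to obtain maximally spaced phase shifts. A minimal sketch of the basic register:

        def lfsr_trinomial(degree=7, tap=1, seed=1, n=127):
            # Fibonacci LFSR for the primitive trinomial x^7 + x + 1:
            # s[k+7] = s[k+1] XOR s[k], giving a PN-sequence of period 2^7 - 1 = 127
            state, out = seed, []
            for _ in range(n):
                out.append(state & 1)
                fb = (state ^ (state >> tap)) & 1
                state = (state >> 1) | (fb << (degree - 1))
            return out

        seq = lfsr_trinomial()
        shifts = {tuple(seq[i:] + seq[:i]) for i in range(127)}
        assert len(shifts) == 127      # every cyclic phase shift of the PN-sequence is distinct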

  2. Vapor pressure measurements on low-volatility terpenoid compounds by the concatenated gas saturation method.

    PubMed

    Widegren, Jason A; Bruno, Thomas J

    2010-01-01

    The atmospheric oxidation of monoterpenes plays a central role in the formation of secondary organic aerosols (SOAs), which have important effects on the weather and climate. However, models of SOA formation have large uncertainties. One reason for this is that SOA formation depends directly on the vapor pressures of the monoterpene oxidation products, but few vapor pressures have been reported for these compounds. As a result, models of SOA formation have had to rely on estimated values of vapor pressure. To alleviate this problem, we have developed the concatenated gas saturation method, which is a simple, reliable, high-throughput method for measuring the vapor pressures of low-volatility compounds. The concatenated gas saturation method represents a significant advance over traditional gas saturation methods. Instead of a single saturator and trap, the concatenated method uses several pairs of saturators and traps linked in series. Consequently, several measurements of vapor pressure can be made simultaneously, which greatly increases the rate of data collection. It also allows for the simultaneous measurement of a control compound, which is important for ensuring data quality. In this paper we demonstrate the use of the concatenated gas saturation method by determination of the vapor pressures of five monoterpene oxidation products and n-tetradecane (the control compound) over the temperature range 283.15-313.15 K. Over this temperature range, the vapor pressures ranged from about 0.5 Pa to about 70 Pa. The standard molar enthalpies of vaporization or sublimation were determined by use of the Clausius-Clapeyron equation.
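
    The final step of the method, extracting the molar enthalpy of vaporization from measured vapor pressures via the Clausius-Clapeyron equation, amounts to a linear fit of ln p against 1/T. The pressure values below are illustrative, not the paper's data.

        import numpy as np

        T = np.array([283.15, 293.15, 303.15, 313.15])   # K, the paper's temperature range
        P = np.array([0.6, 2.1, 6.5, 18.0])              # Pa (illustrative values)

        # Clausius-Clapeyron: ln P = -dH/(R*T) + C, so a linear fit of ln P vs 1/T
        # gives dH = -slope * R
        R = 8.314462618                                  # J/(mol K)
        slope, intercept = np.polyfit(1.0 / T, np.log(P), 1)
        dH = -slope * R / 1000.0                         # kJ/mol
        print(f"molar enthalpy of vaporization ~ {dH:.1f} kJ/mol")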

  3. Speech Problems

    MedlinePlus

    ... thinking, but it becomes disorganized as they're speaking. So, someone who clutters may speak in bursts ... refuse to wait patiently for them to finish speaking. If you have a speech problem, it's fine ...

  4. Coalescence vs. concatenation: Sophisticated analyses vs. first principles applied to rooting the angiosperms.

    PubMed

    Simmons, Mark P; Gatesy, John

    2015-10-01

    It has recently been concluded that phylogenomic data from 310 nuclear genes support the clade of (Amborellales, Nymphaeales) as sister to the remaining angiosperms and that shortcut coalescent phylogenetic methods outperformed concatenation for these data. We falsify both of those conclusions here by demonstrating that discrepant results between the coalescent and concatenation analyses are primarily caused by the coalescent methods applied (MP-EST and STAR) not being robust to the highly divergent and often mis-rooted gene trees that were used. This result reinforces the expectation that low amounts of phylogenetic signal and methodological artifacts in gene-tree reconstruction can be more problematic for shortcut coalescent methods than is the assumption of a single hierarchy for all genes by concatenation methods when these approaches are applied to ancient divergences in empirical studies. We also demonstrate that a third coalescent method, ASTRAL, is more robust to mis-rooted gene trees than MP-EST or STAR, and that both Observed Variability (OV) and Tree Independent Generation of Evolutionary Rates (TIGER), which are two character subsampling procedures, are biased in favor of characters with highly asymmetrical distributions of character states when applied to this dataset. We conclude that enthusiastic application of novel tools is not a substitute for rigorous application of first principles, and that trending methods (e.g., shortcut coalescent methods applied to ancient divergences, tree-independent character subsampling), may be novel sources of previously under-appreciated, systematic errors.

  5. Extension of the double-wave-vector diffusion-weighting experiment to multiple concatenations.

    PubMed

    Finsterbusch, Jürgen

    2009-06-01

    Experiments involving two diffusion-weightings in a single acquisition, so-called double- or two-wave-vector experiments, have recently been applied to measure the microscopic anisotropy in macroscopically isotropic samples or to estimate pore or compartment sizes. This information is derived from the signal modulation observed when varying the wave vectors' orientations. However, the modulation amplitude can be small and, for short mixing times between the two diffusion-weightings, decays with increased gradient pulse lengths, which hampers its detectability on whole-body MR systems. Here, an approach is investigated that involves multiple concatenations of the two diffusion-weightings in a single experiment. The theoretical framework for double-wave-vector experiments of fully restricted diffusion is adapted, and the corresponding tensor approach recently presented for short mixing times is extended and compared to numerical simulations. It is shown that for short mixing times (i) the extended tensor approach describes well the signal behavior observed for multiple concatenations and (ii) the relative amplitude of the signal modulation increases with the number of concatenations. Thus, the presented extension of the double-wave-vector experiment may help to improve the detectability of the signal modulations observed for short mixing times, in particular on whole-body MR systems with their limited gradient amplitudes.

  6. Listen up! Speech is for thinking during infancy.

    PubMed

    Vouloumanos, Athena; Waxman, Sandra R

    2014-12-01

    Infants' exposure to human speech within the first year promotes more than speech processing and language acquisition: new developmental evidence suggests that listening to speech shapes infants' fundamental cognitive and social capacities. Speech streamlines infants' learning, promotes the formation of object categories, signals communicative partners, highlights information in social interactions, and offers insight into the minds of others. These results, which challenge the claim that for infants, speech offers no special cognitive advantages, suggest a new synthesis. Far earlier than researchers had imagined, an intimate and powerful connection between human speech and cognition guides infant development, advancing infants' acquisition of fundamental psychological processes.

  7. A study of acoustic-to-articulatory inversion of speech by analysis-by-synthesis using chain matrices and the Maeda articulatory model.

    PubMed

    Panchapagesan, Sankaran; Alwan, Abeer

    2011-04-01

    In this paper, a quantitative study of acoustic-to-articulatory inversion for vowel speech sounds by analysis-by-synthesis using the Maeda articulatory model is performed. For chain matrix calculation of vocal tract (VT) acoustics, the chain matrix derivatives with respect to area function are calculated and used in a quasi-Newton method for optimizing articulatory trajectories. The cost function includes a distance measure between natural and synthesized first three formants, and parameter regularization and continuity terms. Calibration of the Maeda model to two speakers, one male and one female, from the University of Wisconsin x-ray microbeam (XRMB) database, using a cost function, is discussed. Model adaptation includes scaling the overall VT and the pharyngeal region and modifying the outer VT outline using measured palate and pharyngeal traces. The inversion optimization is initialized by a fast search of an articulatory codebook, which was pruned using XRMB data to improve inversion results. Good agreement between estimated midsagittal VT outlines and measured XRMB tongue pellet positions was achieved for several vowels and diphthongs for the male speaker, with average pellet-VT outline distances around 0.15 cm, smooth articulatory trajectories, and less than 1% average error in the first three formants.

  8. Concatenated coding systems employing a unit-memory convolutional code and a byte-oriented decoding algorithm

    NASA Technical Reports Server (NTRS)

    Lee, L. N.

    1976-01-01

    Concatenated coding systems utilizing a convolutional code as the inner code and a Reed-Solomon code as the outer code are considered. In order to obtain very reliable communications over a very noisy channel with relatively small coding complexity, it is proposed to concatenate a byte oriented unit memory convolutional code with an RS outer code whose symbol size is one byte. It is further proposed to utilize a real time minimal byte error probability decoding algorithm, together with feedback from the outer decoder, in the decoder for the inner convolutional code. The performance of the proposed concatenated coding system is studied, and the improvement over conventional concatenated systems due to each additional feature is isolated.

  9. Free Speech Yearbook: 1972.

    ERIC Educational Resources Information Center

    Tedford, Thomas L., Ed.

    This book is a collection of essays on free speech issues and attitudes, compiled by the Commission on Freedom of Speech of the Speech Communication Association. Four articles focus on freedom of speech in classroom situations as follows: a philosophic view of teaching free speech, effects of a course on free speech on student attitudes,…

  10. Speech Research

    NASA Astrophysics Data System (ADS)

    Several articles addressing topics in speech research are presented. The topics include: exploring the functional significance of physiological tremor: A biospectroscopic approach; differences between experienced and inexperienced listeners to deaf speech; a language-oriented view of reading and its disabilities; Phonetic factors in letter detection; categorical perception; Short-term recall by deaf signers of American sign language; a common basis for auditory sensory storage in perception and immediate memory; phonological awareness and verbal short-term memory; initiation versus execution time during manual and oral counting by stutterers; trading relations in the perception of speech by five-year-old children; the role of the strap muscles in pitch lowering; phonetic validation of distinctive features; consonants and syllable boundaries; and vowel information in postvocalic frictions.

  11. Speech analyzer

    NASA Technical Reports Server (NTRS)

    Lokerson, D. C. (Inventor)

    1977-01-01

    A speech signal is analyzed by applying the signal to formant filters which derive first, second and third signals respectively representing the frequency of the speech waveform in the first, second and third formants. A first pulse train having approximately a pulse rate representing the average frequency of the first formant is derived; second and third pulse trains having pulse rates respectively representing zero crossings of the second and third formants are derived. The first formant pulse train is derived by establishing N signal level bands, where N is an integer at least equal to two. Adjacent ones of the signal bands have common boundaries, each of which is a predetermined percentage of the peak level of a complete cycle of the speech waveform.

  12. Motor modules of human locomotion: influence of EMG averaging, concatenation, and number of step cycles

    PubMed Central

    Oliveira, Anderson S.; Gizzi, Leonardo; Farina, Dario; Kersting, Uwe G.

    2014-01-01

    Locomotion can be investigated by factorization of electromyographic (EMG) signals, e.g., with non-negative matrix factorization (NMF). This approach is a convenient concise representation of muscle activities as distributed in motor modules, activated in specific gait phases. For applying NMF, the EMG signals are analyzed either as single trials, or as averaged EMG, or as concatenated EMG (data structure). The aim of this study is to investigate the influence of the data structure on the extracted motor modules. Twelve healthy men walked at their preferred speed on a treadmill while surface EMG signals were recorded for 60s from 10 lower limb muscles. Motor modules representing relative weightings of synergistic muscle activations were extracted by NMF from 40 step cycles separately (EMGSNG), from averaging 2, 3, 5, 10, 20, and 40 consecutive cycles (EMGAVR), and from the concatenation of the same sets of consecutive cycles (EMGCNC). Five motor modules were sufficient to reconstruct the original EMG datasets (reconstruction quality >90%), regardless of the type of data structure used. However, EMGCNC was associated with a slightly reduced reconstruction quality with respect to EMGAVR. Most motor modules were similar when extracted from different data structures (similarity >0.85). However, the quality of the reconstructed 40-step EMGCNC datasets when using the muscle weightings from EMGAVR was low (reconstruction quality ~40%). On the other hand, the use of weightings from EMGCNC for reconstructing this long period of locomotion provided higher quality, especially using 20 concatenated steps (reconstruction quality ~80%). Although EMGSNG and EMGAVR showed a higher reconstruction quality for short signal intervals, these data structures did not account for step-to-step variability. The results of this study provide practical guidelines on the methodological aspects of synergistic muscle activation extraction from EMG during locomotion. PMID:24904375
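
    Module extraction from a concatenated EMG data structure can be sketched with scikit-learn's NMF, using variance accounted for (VAF) as the reconstruction-quality measure. The EMG matrix below is synthetic, standing in for rectified, filtered envelopes.

        import numpy as np
        from sklearn.decomposition import NMF

        rng = np.random.default_rng(0)
        W_true = rng.random((10, 5))               # 10 muscles, 5 underlying modules
        H_true = rng.random((5, 40 * 100))         # 40 concatenated cycles of 100 samples
        emg = W_true @ H_true                      # synthetic non-negative EMG envelopes

        model = NMF(n_components=5, init="nndsvd", max_iter=500)
        W = model.fit_transform(emg)               # motor modules (muscle weightings)
        H = model.components_                      # module activation time series

        vaf = 1 - np.sum((emg - W @ H) ** 2) / np.sum(emg ** 2)
        print(f"VAF = {vaf:.3f}")                  # >0.90 indicates 5 modules suffice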

  13. Space communication system for compressed data with a concatenated Reed-Solomon-Viterbi coding channel

    NASA Technical Reports Server (NTRS)

    Rice, R. F.; Hilbert, E. E. (Inventor)

    1976-01-01

    A space communication system incorporating a concatenated Reed Solomon Viterbi coding channel is discussed for transmitting compressed and uncompressed data from a spacecraft to a data processing center on Earth. Imaging (and other) data are first compressed into source blocks which are then coded by a Reed Solomon coder and interleaver, followed by a convolutional encoder. The received data is first decoded by a Viterbi decoder, followed by a Reed Solomon decoder and deinterleaver. The output of the latter is then decompressed, based on the compression criteria used in compressing the data in the spacecraft. The decompressed data is processed to reconstruct an approximation of the original data-producing condition or images.

  14. Generation of concatenated Greenberger-Horne-Zeilinger-type entangled coherent state based on linear optics

    NASA Astrophysics Data System (ADS)

    Guo, Rui; Zhou, Lan; Gu, Shi-Pu; Wang, Xing-Fu; Sheng, Yu-Bo

    2017-03-01

    The concatenated Greenberger-Horne-Zeilinger (C-GHZ) state is a new type of multipartite entangled state with potential applications in future quantum information. In this paper, we propose a protocol for approximately constructing arbitrary C-GHZ entangled states. Different from previous protocols, each logic qubit is encoded in a coherent state. The protocol is based on linear optics, which is feasible with current experimental technology, and may be useful in quantum information processing based on the C-GHZ state.

  15. Investigation of the Use of Erasures in a Concatenated Coding Scheme

    NASA Technical Reports Server (NTRS)

    Kwatra, S. C.; Marriott, Philip J.

    1997-01-01

    A new method for declaring erasures in a concatenated coding scheme is investigated. This method is used with the rate 1/2 K = 7 convolutional code and the (255, 223) Reed Solomon code. Errors and erasures Reed Solomon decoding is used. The erasure method proposed uses a soft output Viterbi algorithm and information provided by decoded Reed Solomon codewords in a deinterleaving frame. The results show that a gain of 0.3 dB is possible using a minimum amount of decoding trials.

  16. Type of Speech Material Affects Acceptable Noise Level Test Outcome

    PubMed Central

    Koch, Xaver; Dingemanse, Gertjan; Goedegebure, André; Janse, Esther

    2016-01-01

    The acceptable noise level (ANL) test, in which individuals indicate what level of noise they are willing to put up with while following speech, has been used to guide hearing aid fitting decisions and has been found to relate to prospective hearing aid use. Unlike objective measures of speech perception ability, ANL outcome is not related to individual hearing loss or age, but rather reflects an individual’s inherent acceptance of competing noise while listening to speech. As such, the measure may predict aspects of hearing aid success. Crucially, however, recent studies have questioned its repeatability (test–retest reliability). The first question for this study was whether the inconsistent results regarding the repeatability of the ANL test may be due to differences in speech material types used in previous studies. Second, it is unclear whether meaningfulness and semantic coherence of the speech modify ANL outcome. To investigate these questions, we compared ANLs obtained with three types of materials: the International Speech Test Signal (ISTS), which is non-meaningful and semantically non-coherent by definition, passages consisting of concatenated meaningful standard audiology sentences, and longer fragments taken from conversational speech. We included conversational speech as this type of speech material is most representative of everyday listening. Additionally, we investigated whether ANL outcomes, obtained with these three different speech materials, were associated with self-reported limitations due to hearing problems and listening effort in everyday life, as assessed by a questionnaire. ANL data were collected for 57 relatively good-hearing adult participants with an age range representative for hearing aid users. Results showed that meaningfulness, but not semantic coherence of the speech material affected ANL. Less noise was accepted for the non-meaningful ISTS signal than for the meaningful speech materials. ANL repeatability was comparable

  17. Speech Intelligibility

    NASA Astrophysics Data System (ADS)

    Brand, Thomas

    Speech intelligibility (SI) is important for different fields of research, engineering and diagnostics in order to quantify very different phenomena such as the quality of recordings, communication and playback devices, the reverberation of auditoria, characteristics of hearing impairment, the benefit of using hearing aids, or combinations of these.

  18. Keynote Speeches.

    ERIC Educational Resources Information Center

    2000

    This document contains six of the seven keynote speeches from an international conference on vocational education and training (VET) for lifelong learning in the information era. "IVETA (International Vocational Education and Training Association) 2000 Conference 6-9 August 2000" (K.Y. Yeung) discusses the objectives and activities…

  19. Phrase-level speech simulation with an airway modulation model of speech production

    PubMed Central

    Story, Brad H.

    2012-01-01

    Artificial talkers and speech synthesis systems have long been used as a means of understanding both speech production and speech perception. The development of an airway modulation model is described that simulates the time-varying changes of the glottis and vocal tract, as well as acoustic wave propagation, during speech production. The result is a type of artificial talker that can be used to study various aspects of how sound is generated by humans and how that sound is perceived by a listener. The primary components of the model are introduced and simulations of words and phrases are demonstrated. PMID:23503742

  20. Speech production knowledge in automatic speech recognition.

    PubMed

    King, Simon; Frankel, Joe; Livescu, Karen; McDermott, Erik; Richmond, Korin; Wester, Mirjam

    2007-02-01

    Although much is known about how speech is produced, and research into speech production has resulted in measured articulatory data, feature systems of different kinds, and numerous models, speech production knowledge is almost totally ignored in current mainstream approaches to automatic speech recognition. Representations of speech production allow simple explanations for many phenomena observed in speech which cannot be easily analyzed from either acoustic signal or phonetic transcription alone. In this article, a survey of a growing body of work in which such representations are used to improve automatic speech recognition is provided.

  1. Static and Dynamic Features for Improved HMM based Visual Speech Recognition

    NASA Astrophysics Data System (ADS)

    Rajavel, R.; Sathidevi, P. S.

    Visual speech recognition refers to the identification of utterances through the movements of lips, tongue, teeth, and other facial muscles of the speaker without using the acoustic signal. This work shows the relative benefits of both static and dynamic visual speech features for improved visual speech recognition. Two approaches to visual feature extraction have been considered: (1) an image-transform-based static feature approach, in which the Discrete Cosine Transform (DCT) is applied to each video frame and 6×6 triangle region coefficients are considered as features; Principal Component Analysis (PCA) is applied over all 60 features corresponding to the video frame to reduce the redundancy, and the resultant 21 coefficients are taken as the static visual features; (2) a motion-segmentation-based dynamic feature approach, in which the facial movements are segmented from the video file using motion history images (MHI), DCT is applied to the MHI, and triangle region coefficients are taken as the dynamic visual features. Two types of experiments were done, one with concatenated features and another with dimension-reduced features obtained using PCA, to identify the utterances. Left-right continuous HMMs are used as the visual speech classifier to classify nine MPEG-4 standard viseme consonants. The experimental results show that the concatenated as well as the dimension-reduced features improve visual speech recognition, with high accuracies of 92.45% and 92.15%, respectively.
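
    A sketch of the static-feature pipeline: 2-D DCT per frame, retention of a low-frequency triangular region of coefficients, then PCA to remove redundancy. The region size and frame shapes below are illustrative rather than the paper's exact 60-to-21 reduction.

        import numpy as np
        from scipy.fft import dctn
        from sklearn.decomposition import PCA

        def triangle_dct_features(frame, side=10):
            # keep the low-frequency DCT coefficients with row + col < side
            coeffs = dctn(frame, norm="ortho")
            return np.array([coeffs[i, j] for i in range(side)
                             for j in range(side) if i + j < side])

        rng = np.random.default_rng(0)
        frames = rng.random((500, 32, 32))            # stand-in mouth-region frames
        feats = np.array([triangle_dct_features(f) for f in frames])
        static = PCA(n_components=21).fit_transform(feats)   # static visual features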

  2. Typical l1-recovery limit of sparse vectors represented by concatenations of random orthogonal matrices

    NASA Astrophysics Data System (ADS)

    Kabashima, Yoshiyuki; Vehkaperä, Mikko; Chatterjee, Saikat

    2012-12-01

    We consider the problem of recovering an N-dimensional sparse vector x from its linear transformation y = Dx of M (< N) dimensions, where the measurement matrix D is provided by concatenating T = N/M matrices O1, O2, …, OT drawn uniformly according to the Haar measure on the M × M orthogonal matrices. By using the replica method in conjunction with the development of an integral formula to handle the random orthogonal matrices, we show that the concatenated matrices can result in better recovery performance than that predicted by the universality when the density of non-zero signals is not uniform among the T matrix modules. The universal condition is reproduced for the special case of uniform non-zero signal densities. Extensive numerical experiments support the theoretical predictions.
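
    The measurement ensemble and the l1-recovery can be reproduced numerically: build D by concatenating Haar-random orthogonal blocks and solve the l1 minimization as a linear program. A small instance with illustrative module sizes:

        import numpy as np
        from scipy.optimize import linprog
        from scipy.stats import ortho_group

        M, T = 64, 4                               # module size, number of modules
        N = M * T
        D = np.hstack([ortho_group.rvs(M, random_state=i) for i in range(T)])

        rng = np.random.default_rng(0)
        x0 = np.zeros(N)
        idx = rng.choice(N, size=10, replace=False)
        x0[idx] = rng.standard_normal(10)          # sparse vector to recover
        y = D @ x0

        # min ||x||_1 subject to y = Dx, written as an LP with x = u - v, u, v >= 0
        res = linprog(np.ones(2 * N), A_eq=np.hstack([D, -D]), b_eq=y,
                      bounds=[(0, None)] * (2 * N), method="highs")
        x_hat = res.x[:N] - res.x[N:]
        print("recovery error:", np.linalg.norm(x_hat - x0))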

  3. Concatenation of ‘alert’ and ‘identity’ segments in dingoes’ alarm calls

    PubMed Central

    Déaux, Eloïse C.; Allen, Andrew P.; Clarke, Jennifer A.; Charrier, Isabelle

    2016-01-01

    Multicomponent signals can be formed by the uninterrupted concatenation of multiple call types. One such signal is found in dingoes, Canis familiaris dingo. This stereotyped, multicomponent ‘bark-howl’ vocalisation is formed by the concatenation of a noisy bark segment and a tonal howl segment. Both segments are structurally similar to bark and howl vocalisations produced independently in other contexts (e.g. intra- and inter-pack communication). Bark-howls are mainly uttered in response to human presence and were hypothesized to serve as alarm calls. We investigated the function of bark-howls and the respective roles of the bark and howl segments. We found that dingoes could discriminate between familiar and unfamiliar howl segments, after having only heard familiar howl vocalisations (i.e. different calls). We propose that howl segments could function as ‘identity signals’ and allow receivers to modulate their responses according to the caller’s characteristics. The bark segment increased receivers’ attention levels, providing support for earlier observational claims that barks have an ‘alerting’ function. Lastly, dingoes were more likely to display vigilance behaviours upon hearing bark-howl vocalisations, lending support to the alarm function hypothesis. Canid vocalisations, such as the dingo bark-howl, may provide a model system to investigate the selective pressures shaping complex communication systems. PMID:27460289

  4. Speech communications in noise

    NASA Technical Reports Server (NTRS)

    1984-01-01

    The physical characteristics of speech, the methods of speech masking measurement, and the effects of noise on speech communication are investigated. Topics include the speech signal and intelligibility, the effects of noise on intelligibility, the articulation index, and various devices for evaluating speech systems.

  5. Speech communications in noise

    NASA Astrophysics Data System (ADS)

    1984-07-01

    The physical characteristics of speech, the methods of speech masking measurement, and the effects of noise on speech communication are investigated. Topics include the speech signal and intelligibility, the effects of noise on intelligibility, the articulation index, and various devices for evaluating speech systems.

  6. Stress versus coarticulation: toward an integrated approach to explicit speech segmentation.

    PubMed

    Mattys, Sven L

    2004-04-01

    Although word stress has been hailed as a powerful speech-segmentation cue, the results of 5 cross-modal fragment priming experiments revealed limitations to stress-based segmentation. Specifically, the stress pattern of auditory primes failed to have any effect on the lexical decision latencies to related visual targets. A determining factor was whether the onset of the prime was coarticulated with the preceding speech fragment. Uncoarticulated (i.e., concatenated) primes facilitated priming. Coarticulated ones did not. However, when the primes were presented in a background of noise, the pattern of results reversed, and a strong stress effect emerged: Stress-initial primes caused more priming than non-initial-stress primes, regardless of the coarticulatory cues. The results underscore the role of coarticulation in the segmentation of clear speech and that of stress in impoverished listening conditions. More generally, they call for an integrated and signal-contingent approach to speech segmentation.

  7. The effects of receiver tracking phase error on the performance of the concatenated Reed-Solomon/Viterbi channel coding system

    NASA Technical Reports Server (NTRS)

    Liu, K. Y.

    1981-01-01

    Analytical and experimental results are presented of the effects of receiver tracking phase error, caused by weak signal conditions on the uplink, the downlink, or both, on the performance of the concatenated Reed-Solomon (RS)/Viterbi channel coding system. The test results were obtained under an emulated S-band uplink and X-band downlink, two-way space communication channel in the telecommunication development laboratory of JPL, with data rates ranging from 4 kHz to 20 kHz. It is shown that, with ideal interleaving, the concatenated RS/Viterbi coding system is capable of yielding large coding gains at very low bit error probabilities over the Viterbi-decoded convolutional-only coding system. Results on the effects of receiver tracking phase errors on the performance of the concatenated coding system with antenna array combining are included.
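
    The point of the outer RS code in such a scheme is to absorb the bursty residual errors that a Viterbi inner decoder leaves behind. A toy sketch of that division of labor, assuming the third-party reedsolo package for the outer code and replacing the inner convolutional/Viterbi stage with a simulated bursty-error channel, so this illustrates the concatenation structure only, not JPL's system.

        import numpy as np
        from reedsolo import RSCodec   # pip install reedsolo (assumed available)

        rsc = RSCodec(16)              # outer RS code: corrects up to 8 byte errors
        message = bytes(range(64))
        codeword = bytearray(rsc.encode(message))

        # Stand-in for the inner Viterbi decoder's output: a short error
        # burst rather than independent bit errors.
        rng = np.random.default_rng(1)
        burst_start = int(rng.integers(0, len(codeword) - 4))
        for i in range(burst_start, burst_start + 4):
            codeword[i] ^= 0xFF

        # In reedsolo >= 1.5, decode() returns (message, codeword, errata positions).
        decoded = rsc.decode(bytes(codeword))[0]
        print(bytes(decoded) == message)   # True: the outer code absorbed the burst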

  8. Speech Research.

    DTIC Science & Technology

    1979-12-31

    Academic Press, 1973. Kimura, D. The neural basis of language qua gesture. In H. Whitaker & H. A. Whitaker (Eds.), Studies in neurolinguistics (Vol. 3)...Lubker, J., & Gay, T. Formant frequencies of some fixed-mandible vowels and a model of speech motor programming. Journal of Phonetics, 1979, 7, 147-162...A. Interarticulator programming in stop production. To appear in Journal of Phonetics, in press. Löfqvist, A., & Yoshioka, H. Laryngeal activity in

  9. Sample-based engine noise synthesis using an enhanced pitch-synchronous overlap-and-add method.

    PubMed

    Jagla, Jan; Maillard, Julien; Martin, Nadine

    2012-11-01

    An algorithm for the real-time synthesis of internal combustion engine noise is presented. Through the analysis of a recorded engine noise signal of continuously varying engine speed, a dataset of sound samples is extracted, allowing real-time synthesis of the noise induced by arbitrary evolutions of engine speed. The sound samples are extracted from a recording spanning the entire engine speed range. Each sample is delimited so as to contain the sound emitted during one cycle of the engine plus the necessary overlap to ensure smooth transitions during the synthesis. The proposed approach, an extension of the PSOLA method introduced for speech processing, takes advantage of the specific periodicity of engine noise signals to locate the extraction instants of the sound samples. During the synthesis stage, the sound samples corresponding to the target engine speed evolution are concatenated with an overlap-and-add algorithm. It is shown that this method produces high quality audio restitution with a low computational load. It is therefore well suited for real time applications.
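
    The synthesis stage described above is essentially windowed concatenation: consecutive samples are overlapped and cross-faded so that cycle boundaries stay inaudible. A minimal overlap-and-add sketch in Python; the sample contents, window length, and fade length are placeholder choices, not the paper's.

        import numpy as np

        def overlap_add(samples, overlap):
            """Concatenate 1-D sound samples with a linear cross-fade of
            `overlap` points between consecutive samples."""
            fade_in = np.linspace(0.0, 1.0, overlap)
            fade_out = 1.0 - fade_in
            out = np.array(samples[0], dtype=float)
            for s in samples[1:]:
                s = np.asarray(s, dtype=float)
                out[-overlap:] = out[-overlap:] * fade_out + s[:overlap] * fade_in
                out = np.concatenate([out, s[overlap:]])
            return out

        # Toy "engine cycles": one waveform period per sample, plus overlap.
        t = np.linspace(0, 2 * np.pi, 512)
        cycles = [np.sin(t) * a for a in (0.8, 0.9, 1.0)]  # stand-ins for recordings
        signal = overlap_add(cycles, overlap=64)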

  10. The Effect of Three Variables on Synthetic Speech Intelligibility in Noisy Environments

    DTIC Science & Technology

    1990-03-01

    this study: the analog formant frequency synthesis technique. A second definition of "synthetic" speech is related to basic data sampling theory...Analog formant frequency synthesis is a typical synthetic speech methodology, used here as an illustration of the technique. The waveform encoding and...reconstruction technique (discussed above) is similar to a "photograph" of speech. Analog formant frequency synthesis is more like an artist’s

  11. Self-Similar Conformations and Dynamics of Non-Concatenated Entangled Ring Polymers

    NASA Astrophysics Data System (ADS)

    Ge, Ting

    A scaling model of self-similar conformations and dynamics of non-concatenated entangled ring polymers is developed. Topological constraints force these ring polymers into compact conformations with fractal dimension D = 3 that we call fractal loopy globules (FLGs). This result is based on the conjecture that the overlap parameter of loops on all length scales is equal to the Kavassalis-Noolandi number 10-20. The dynamics of entangled rings is self-similar, and proceeds as loops of increasing sizes are rearranged progressively at their respective diffusion times. The topological constraints associated with smaller rearranged loops affect the dynamics of larger loops by increasing the effective friction coefficient, but have no influence on the tubes confining larger loops. Therefore, the tube diameter defined as the average spacing between relevant topological constraints increases with time, leading to "tube dilation". Analysis of the primitive paths in molecular dynamics (MD) simulations suggests complete tube dilation with the tube diameter on the order of the time-dependent characteristic loop size. A characteristic loop at time t is defined as a ring section that has diffused a distance of its size during time t. We derive dynamic scaling exponents in terms of fractal dimensions of an entangled ring and the underlying primitive path and a parameter characterizing the extent of tube dilation. The results reproduce the predictions of different dynamic models of a single non-concatenated entangled ring. We demonstrate that traditional generalization of single-ring models to multi-ring dynamics is not self-consistent and develop a FLG model with self-consistent multi-ring dynamics and complete tube dilation. Various dynamic scaling exponents predicted by the self-consistent FLG model are consistent with recent computer simulations and experiments. We also perform MD simulations of nanoparticle (NP) diffusion in melts of non-concatenated entangled ring polymers

  12. Speech enhancement via two-stage dual tree complex wavelet packet transform with a speech presence probability estimator.

    PubMed

    Sun, Pengfei; Qin, Jun

    2017-02-01

    In this paper, a two-stage dual tree complex wavelet packet transform (DTCWPT) based speech enhancement algorithm has been proposed, in which a speech presence probability (SPP) estimator and a generalized minimum mean squared error (MMSE) estimator are developed. To overcome the drawback of signal distortions caused by downsampling in the wavelet packet transform (WPT), a two-stage analytic decomposition concatenating an undecimated wavelet packet transform (UWPT) and a decimated WPT is employed. An SPP estimator in the DTCWPT domain is derived based on a generalized Gamma distribution of speech and a Gaussian noise assumption. The validation results show that the proposed algorithm achieves improved perceptual evaluation of speech quality (PESQ) scores and segmental signal-to-noise ratios (SegSNR) in nonstationary noise at low signal-to-noise ratios (SNRs), compared with four other state-of-the-art speech enhancement algorithms, including optimally modified log-spectral amplitude (OM-LSA), soft masking using a posteriori SNR uncertainty (SMPO), a posteriori SPP based MMSE estimation (MMSE-SPP), and adaptive Bayesian wavelet thresholding (BWT).
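
    A much-simplified relative of this idea can be sketched with PyWavelets: decompose into wavelet-packet nodes, attenuate each node, and reconstruct. Note the sketch uses a plain decimated WPT with fixed soft thresholds, not the paper's dual-tree transform or SPP-weighted MMSE gains; wavelet, level, and threshold are arbitrary illustrative choices.

        import numpy as np
        import pywt  # PyWavelets

        def wpt_soft_denoise(noisy, wavelet='db8', level=3, thresh=0.1):
            """Soft-threshold every terminal wavelet-packet node; a crude
            stand-in for probability-weighted spectral gains."""
            wp = pywt.WaveletPacket(noisy, wavelet, maxlevel=level)
            for node in wp.get_level(level, order='natural'):
                node.data = pywt.threshold(node.data, thresh, mode='soft')
            return wp.reconstruct(update=False)

        rng = np.random.default_rng(0)
        clean = np.sin(2 * np.pi * 5 * np.linspace(0, 1, 2048))
        enhanced = wpt_soft_denoise(clean + 0.3 * rng.standard_normal(2048))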

  13. BoD services in layer 1 VPN with dynamic virtual concatenation group

    NASA Astrophysics Data System (ADS)

    Du, Shu; Peng, Yunfeng; Long, Keping

    2008-11-01

    Bandwidth-on-Demand (BoD) services are characterized by dynamic bandwidth provisioning based on customers' resource requirements, which will be a must for future networks. BoD services become possible with the development of make-before-break, Virtual Concatenation (VCAT), and the Link Capacity Adjustment Scheme (LCAS). In this paper, we introduce BoD services into L1VPN, so that the resource assigned to a L1VPN can be gracefully adjusted at various bandwidth granularities based on customers' requirements. We also propose a dynamic bandwidth adjustment scheme, which is a compromise between make-before-break and VCAT&LCAS, mainly based on the latter. The scheme minimizes the number of distinct paths needed to support a connection between a source-destination pair, and uses make-before-break technology for re-optimization.
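
    LCAS-style adjustment amounts to adding or removing members of a virtual concatenation group one at a time while traffic keeps flowing. A schematic sketch of that bookkeeping; the class, member granularity, and method names are invented for illustration (real LCAS runs a per-member handshake protocol).

        class VirtualConcatenationGroup:
            """Toy model of a VCG whose capacity is a whole number of
            fixed-size members (e.g., VC-4 containers of ~150 Mbit/s)."""

            MEMBER_MBPS = 150

            def __init__(self):
                self.members = 0

            @property
            def capacity_mbps(self):
                return self.members * self.MEMBER_MBPS

            def adjust_to(self, demand_mbps):
                # Hitless in LCAS: members join or leave one by one while
                # the remaining members keep carrying traffic.
                while self.capacity_mbps < demand_mbps:
                    self.members += 1          # LCAS ADD
                while self.capacity_mbps - self.MEMBER_MBPS >= demand_mbps:
                    self.members -= 1          # LCAS REMOVE

        vcg = VirtualConcatenationGroup()
        vcg.adjust_to(400)   # -> 3 members, 450 Mbit/s
        vcg.adjust_to(100)   # -> 1 member, 150 Mbit/s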

  14. Generation of an arbitrary concatenated Greenberger-Horne-Zeilinger state with single photons

    NASA Astrophysics Data System (ADS)

    Chen, Shan-Shan; Zhou, Lan; Sheng, Yu-Bo

    2017-02-01

    The concatenated Greenberger-Horne-Zeilinger (C-GHZ) state is a new kind of logic-qubit entangled state, which may have extensive applications in future quantum communication. In this letter, we propose a protocol for constructing an arbitrary C-GHZ state with single photons. We exploit the cross-Kerr nonlinearity for this purpose. This protocol has some advantages over previous protocols. First, it only requires two kinds of cross-Kerr nonlinearities to generate single phase shifts  ±θ. Second, it is not necessary to use sophisticated m-photon Toffoli gates. Third, this protocol is deterministic and can be used to generate an arbitrary C-GHZ state. This protocol may be useful in future quantum information processing based on the C-GHZ state.
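
    For reference, the C-GHZ state itself is easy to write down numerically: each logical qubit is an m-photon GHZ state, and the N-logical-qubit C-GHZ state superposes the all-|GHZ+> and all-|GHZ-> products. A small sketch building the state vector; this is pure Kronecker-product bookkeeping, not a model of the paper's cross-Kerr optics.

        import numpy as np

        def kron_all(vectors):
            out = np.array([1.0])
            for v in vectors:
                out = np.kron(out, v)
            return out

        def c_ghz(n_logical, m_physical):
            """State vector of an N-logical-qubit concatenated GHZ state,
            each logical qubit encoded as an m-photon GHZ state."""
            zero, one = np.array([1.0, 0.0]), np.array([0.0, 1.0])
            ghz_plus = (kron_all([zero] * m_physical)
                        + kron_all([one] * m_physical)) / np.sqrt(2)
            ghz_minus = (kron_all([zero] * m_physical)
                         - kron_all([one] * m_physical)) / np.sqrt(2)
            return (kron_all([ghz_plus] * n_logical)
                    + kron_all([ghz_minus] * n_logical)) / np.sqrt(2)

        psi = c_ghz(2, 2)                            # 4 physical qubits, 16 amplitudes
        print(np.flatnonzero(np.abs(psi) > 1e-12))   # basis states in the superposition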

  15. Multidimensional Trellis Coded Phase Modulation Using a Multilevel Concatenation Approach. Part 1; Code Design

    NASA Technical Reports Server (NTRS)

    Rajpal, Sandeep; Rhee, Do Jun; Lin, Shu

    1997-01-01

    The first part of this paper presents a simple and systematic technique for constructing multidimensional M-ary phase shift keying (MPSK) trellis coded modulation (TCM) codes. The construction is based on a multilevel concatenation approach in which binary convolutional codes with good free branch distances are used as the outer codes and block MPSK modulation codes are used as the inner codes (or the signal spaces). Conditions on phase invariance of these codes are derived and a multistage decoding scheme for these codes is proposed. The proposed technique can be used to construct good codes for both the additive white Gaussian noise (AWGN) and fading channels as is shown in the second part of this paper.

  16. Hyperbranched Hybridization Chain Reaction for Triggered Signal Amplification and Concatenated Logic Circuits.

    PubMed

    Bi, Sai; Chen, Min; Jia, Xiaoqiang; Dong, Ying; Wang, Zonghua

    2015-07-06

    A hyper-branched hybridization chain reaction (HB-HCR) is presented herein, which consists of only six species that can metastably coexist until the introduction of an initiator DNA to trigger a cascade of hybridization events, leading to the self-sustained assembly of hyper-branched and nicked double-stranded DNA structures. The system can readily achieve ultrasensitive detection of target DNA. Moreover, the HB-HCR principle is successfully applied to construct three-input concatenated logic circuits with excellent specificity and extended to design a security-mimicking keypad lock system. Significantly, the HB-HCR-based keypad lock can alarm immediately if the "password" is incorrect. Overall, the proposed HB-HCR with high amplification efficiency is simple, homogeneous, fast, robust, and low-cost, and holds great promise in the development of biosensing, in the programmable assembly of DNA architectures, and in molecular logic operations.

  17. Inter-Calibration and Concatenation of Climate Quality Infrared Cloudy Radiances from Multiple Instruments

    NASA Technical Reports Server (NTRS)

    Behrangi, Ali; Aumann, Hartmut H.

    2013-01-01

    A change in climate is not likely captured from any single instrument, since no single instrument can span decades of time. Therefore, to detect signals of global climate change, observations from many instruments on different platforms have to be concatenated. This requires careful and detailed consideration of instrumental differences such as footprint size, diurnal cycle of observations, and relative biases in the spectral brightness temperatures. Furthermore, a common basic assumption is that data quality is independent of the observed scene and therefore can be determined using clear-scene data. However, as will be demonstrated, this is not necessarily a valid assumption, as the globe is mostly cloudy. In this study we highlight challenges in the inter-calibration and concatenation of infrared radiances from multiple instruments by focusing on the analysis of deep convective or anvil clouds. TRMM/VIRS is a potentially useful instrument for correcting observational differences in local time and footprint size, and thus could be applied retroactively to vintage instruments such as AIRS, IASI, IRIS, AVHRR, and HIRS. As a first step, in this study we investigate and discuss to what extent AIRS and VIRS agree in capturing deep cloudy radiances at the same local time. The analysis also includes comparisons with one year of observations from CrIS. It was found that the instruments show calibration differences of about 1 K under deep cloudy scenes that can vary as a function of land type and local time of observation. Footprint size, view angle, and spectral band-pass differences cannot fully explain the observed differences. The observed discrepancies can be considered a measure of the magnitude of issues which will arise in the comparison of legacy data with current data.

  18. Speech-to-Speech Relay Service

    MedlinePlus

    ... Speech-to-Speech (STS) is one form of Telecommunications Relay Service (TRS). TRS is a service that ... to STS, go to www.fcc.gov/guides/telecommunications-relay-service-trs. Filing a Complaint If you ...

  19. Speech research

    NASA Astrophysics Data System (ADS)

    1992-06-01

    Phonology is traditionally seen as the discipline that concerns itself with the building blocks of linguistic messages. It is the study of the structure of sound inventories of languages and of the participation of sounds in rules or processes. Phonetics, in contrast, concerns speech sounds as produced and perceived. Two extreme positions on the relationship between phonological messages and phonetic realizations are represented in the literature. One holds that the primary home for linguistic symbols, including phonological ones, is the human mind, itself housed in the human brain. The second holds that their primary home is the human vocal tract.

  20. Insufficient Chunk Concatenation May Underlie Changes in Sleep-Dependent Consolidation of Motor Sequence Learning in Older Adults

    ERIC Educational Resources Information Center

    Bottary, Ryan; Sonni, Akshata; Wright, David; Spencer, Rebecca M. C.

    2016-01-01

    Sleep enhances motor sequence learning (MSL) in young adults by concatenating subsequences ("chunks") formed during skill acquisition. To examine whether this process is reduced in aging, we assessed performance changes on the MSL task following overnight sleep or daytime wake in healthy young and older adults. Young adult performance…

  1. Research on Speech Perception. Progress Report No. 12.

    ERIC Educational Resources Information Center

    Pisoni, David B.; And Others

    Summarizing research activities in 1986, this is the twelfth annual report of research on speech perception, analysis, synthesis, and recognition conducted in the Speech Research Laboratory of the Department of Psychology at Indiana University. The report contains the following 23 articles: "Comprehension of Digitally Encoded Natural Speech…

  2. Research on Speech Perception. Progress Report No. 13.

    ERIC Educational Resources Information Center

    Pisoni, David B.; And Others

    Summarizing research activities in 1987, this is the thirteenth annual report of research on speech perception, analysis, synthesis, and recognition conducted in the Speech Research Laboratory of the Department of Psychology at Indiana University. The report includes extended manuscripts, short reports, progress reports, and information on…

  3. Research on Speech Perception. Progress Report No. 15.

    ERIC Educational Resources Information Center

    Pisoni, David B.

    Summarizing research activities in 1989, this is the fifteenth annual report of research on speech perception, analysis, synthesis, and recognition conducted in the Speech Research Laboratory of the Department of Psychology at Indiana University. The report contains the following 21 articles: "Perceptual Learning of Nonnative Speech…

  4. Research on Speech Perception. Progress Report No. 14.

    ERIC Educational Resources Information Center

    Pisoni, David B.; And Others

    Summarizing research activities in 1988, this is the fourteenth annual report of research on speech perception, analysis, synthesis, and recognition conducted in the Speech Research Laboratory of the Department of Psychology at Indiana University. The report includes extended manuscripts, short reports, and progress reports. The report contains…

  5. Analysis of a Digital Technique for Frequency Transposition of Speech.

    DTIC Science & Technology

    1985-09-01

    Fragments from the table of contents (Spectral Analysis; Formant Frequencies; Speech Synthesis)...necessary to begin the generation of sound. The vocal cords, tongue, mouth, lips and nasal tract combine their different properties to shape the airflow...A convenient way to portray the frequency content of speech is through the determination of formant frequencies. Formant frequencies are the most prominent

  6. Chunk concatenation evolves with practice and sleep-related enhancement consolidation in a complex arm movement sequence

    PubMed Central

    Malangré, Andreas

    2016-01-01

    Abstract This paper addresses the notion of chunk concatenation being associated with sleep-related enhancement consolidation of motor sequence memory, thereby essentially contributing to improvements in sequence execution speed. To this end, element movement times of a multi-joint arm movement sequence incorporated in a recent study by Malangré et al. (2014) were reanalyzed. As sequence elements differed with respect to movement distance, element movement times had to be purged of differences solely due to varying trajectory lengths. This was done by dividing each element movement time per subject and trial block by the respective “reference movement time” collected from subjects who had extensively practiced each sequence element in isolation. Any differences in these “relative element movement times” were supposed to reflect element-specific “production costs” imposed solely by the sequence context. Across all subjects non-idiosyncratic, lasting sequence segmentation was shown, and four possible concatenation points (i.e. transition points between successive chunks) within the original arm movement sequence were identified. Based on theoretical suppositions derived from previous work with the discrete sequence production task and the dual processor model (Abrahamse et al., 2013), significantly larger improvements in transition speed occurring at these four concatenation points as compared to the five fastest transition positions within the sequence (associated with mere element execution) were assumed to indicate increased chunk concatenation. As a result, chunk concatenation was shown to proceed during acquisition with physical practice, and, most importantly, to progress significantly further during retention following a night of sleep, but not during a waking interval. PMID:28149363

  7. 78 FR 49717 - Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-08-15

    ... COMMISSION 47 CFR Part 64 Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services; Telecommunications Relay Services and Speech-to-Speech Services for Individuals With Hearing and Speech Disabilities AGENCY: Federal Communications Commission. ACTION: Proposed rule....

  8. Speech therapy with obturator.

    PubMed

    Shyammohan, A; Sreenivasulu, D

    2010-12-01

    Rehabilitation of speech is tantamount to closure of defect in cases with velopharyngeal insufficiency. Often the importance of speech therapy is sidelined during the fabrication of obturators. Usually the speech part is taken up only at a later stage and is relegated entirely to a speech therapist without the active involvement of the prosthodontist. The article suggests a protocol for speech therapy in such cases to be done in unison with a prosthodontist.

  9. Segmental concatenation of individual signatures and context cues in banded mongoose (Mungos mungo) close calls

    PubMed Central

    2012-01-01

    Background All animals are anatomically constrained in the number of discrete call types they can produce. Recent studies suggest that by combining existing calls into meaningful sequences, animals can increase the information content of their vocal repertoire despite these constraints. Additionally, signalers can use vocal signatures or cues correlated to other individual traits or contexts to increase the information encoded in their vocalizations. However, encoding multiple vocal signatures or cues using the same components of vocalizations usually reduces the signals' reliability. Segregation of information could effectively circumvent this trade-off. In this study we investigate how banded mongooses (Mungos mungo) encode multiple vocal signatures or cues in their frequently emitted graded single syllable close calls. Results The data for this study were collected on a wild, but habituated, population of banded mongooses. Using behavioral observations and acoustical analysis we found that close calls contain two acoustically different segments. The first being stable and individually distinct, and the second being graded and correlating with the current behavior of the individual, whether it is digging, searching or moving. This provides evidence of Marler's hypothesis on temporal segregation of information within a single syllable call type. Additionally, our work represents an example of an identity cue integrated as a discrete segment within a single call that is independent from context. This likely functions to avoid ambiguity between individuals or receivers having to keep track of several context-specific identity cues. Conclusions Our study provides the first evidence of segmental concatenation of information within a single syllable in non-human vocalizations. By reviewing descriptions of call structures in the literature, we suggest a general application of this mechanism. Our study indicates that temporal segregation and segmental concatenation of

  10. A commercial large-vocabulary discrete speech recognition system: DragonDictate.

    PubMed

    Mandel, M A

    1992-01-01

    DragonDictate is currently the only commercially available general-purpose, large-vocabulary speech recognition system. It uses discrete speech and is speaker-dependent, adapting to the speaker's voice and language model with every word. Its acoustic adaptability is based on a three-level phonology and a stochastic model of production. The phonological levels are phonemes, augmented triphones (phonemes-in-context or PICs), and steady-state spectral slices that are concatenated to approximate the spectra of these PICs (phonetic elements or PELs) and thus of words. Production is treated as a hidden Markov process, which the recognizer has to identify from its output, the spoken word. Findings of practical value to speech recognition are presented from research on six European languages.
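
    The middle layer of that phonology, phonemes-in-context, can be enumerated directly from a phonemic transcription: each phoneme is modeled jointly with its left and right neighbors. A sketch of PIC extraction; the boundary padding and display notation are illustrative conventions, not DragonDictate's internal format.

        def phonemes_in_context(phonemes):
            """Return one (left, center, right) triphone per phoneme,
            padding word edges with a silence marker."""
            padded = ['sil'] + list(phonemes) + ['sil']
            return [(padded[i - 1], padded[i], padded[i + 1])
                    for i in range(1, len(padded) - 1)]

        # "speech" ~ /s p iy ch/ in an ARPAbet-style transcription.
        for pic in phonemes_in_context(['s', 'p', 'iy', 'ch']):
            print('%s-%s+%s' % pic)   # HTK-style triphone notation for display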

  11. On the undetected error probability of a concatenated coding scheme for error control

    NASA Technical Reports Server (NTRS)

    Deng, H.; Costello, D. J., Jr.

    1984-01-01

    Consider a concatenated coding scheme for error control on a binary symmetric channel, called the inner channel. The bit error rate (BER) of the channel is correspondingly called the inner BER, and is denoted by ε_i. Two linear block codes, C_f and C_b, are used. The inner code C_f, called the frame code, is an (n,k) systematic binary block code with minimum distance d_f. The frame code is designed to correct λ or fewer errors and simultaneously detect γ (γ ≥ λ) or fewer errors, where λ + γ + 1 ≤ d_f. The outer code C_b is either an (n_b, k_b) binary block code with n_b = mk, or an (n_b, k_b) maximum distance separable (MDS) code with symbols from GF(q), where q = 2^b and the code length n_b satisfies n_b = mk. The integer m is the number of frames. The outer code is designed for error detection only.
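
    For a linear block code used purely for error detection on a binary symmetric channel, the undetected-error probability follows directly from the code's weight distribution: an error pattern goes undetected exactly when it equals a nonzero codeword, so P_ud = Σ_w A_w ε^w (1-ε)^(n-w) over nonzero weights w. A small helper, assuming the weight distribution {A_w} is known; the example numbers are for the (7,4) Hamming code, not a code from the paper.

        def undetected_error_prob(weights, n, eps):
            """P_ud = sum over nonzero w of A_w * eps**w * (1-eps)**(n-w),
            for a linear code detecting errors on a BSC with bit error rate eps."""
            return sum(a * eps**w * (1 - eps)**(n - w)
                       for w, a in weights.items() if w > 0)

        # Weight distribution of the (7,4) Hamming code: A_0=1, A_3=7, A_4=7, A_7=1.
        hamming74 = {0: 1, 3: 7, 4: 7, 7: 1}
        print(undetected_error_prob(hamming74, n=7, eps=1e-2))   # ~6.8e-6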

  12. A review and synthesis of the first 20 years of PET and fMRI studies of heard speech, spoken language and reading

    PubMed Central

    Price, Cathy J.

    2012-01-01

    The anatomy of language has been investigated with PET or fMRI for more than 20 years. Here I attempt to provide an overview of the brain areas associated with heard speech, speech production and reading. The conclusions of many hundreds of studies were considered, grouped according to the type of processing, and reported in the order that they were published. Many findings have been replicated time and time again, leading to some consistent and indisputable conclusions. These are summarised in an anatomical model that indicates the location of the language areas and the most consistent functions that have been assigned to them. The implications for cognitive models of language processing are also considered. In particular, a distinction can be made between processes that are localized to specific structures (e.g. sensory and motor processing) and processes where specialisation arises in the distributed pattern of activation over many different areas that each participate in multiple functions. For example, phonological processing of heard speech is supported by the functional integration of auditory processing and articulation; and orthographic processing is supported by the functional integration of visual processing, articulation and semantics. Future studies will undoubtedly be able to improve the spatial precision with which functional regions can be dissociated but the greatest challenge will be to understand how different brain regions interact with one another in their attempts to comprehend and produce language. PMID:22584224

  13. Children's perception of their synthetically corrected speech production.

    PubMed

    Strömbergsson, Sofia; Wengelin, Asa; House, David

    2014-06-01

    We explore children's perception of their own speech - in its online form, in its recorded form, and in synthetically modified forms. Children with phonological disorder (PD) and children with typical speech and language development (TD) performed tasks of evaluating accuracy of the different types of speech stimuli, either immediately after having produced the utterance or after a delay. In addition, they performed a task designed to assess their ability to detect synthetic modification. Both groups showed high performance in tasks involving evaluation of other children's speech, whereas in tasks of evaluating one's own speech, the children with PD were less accurate than their TD peers. The children with PD were less sensitive to misproductions in immediate conjunction with their production of an utterance, and more accurate after a delay. Within-category modification often passed undetected, indicating a satisfactory quality of the generated speech. Potential clinical benefits of using corrective re-synthesis are discussed.

  14. Expansion and concatenation of nonmuscle myosin IIA filaments drive cellular contractile system formation during interphase and mitosis

    PubMed Central

    Fenix, Aidan M.; Taneja, Nilay; Buttler, Carmen A.; Lewis, John; Van Engelenburg, Schuyler B.; Ohi, Ryoma; Burnette, Dylan T.

    2016-01-01

    Cell movement and cytokinesis are facilitated by contractile forces generated by the molecular motor, nonmuscle myosin II (NMII). NMII molecules form a filament (NMII-F) through interactions of their C-terminal rod domains, positioning groups of N-terminal motor domains on opposite sides. The NMII motors then bind and pull actin filaments toward the NMII-F, thus driving contraction. Inside of crawling cells, NMIIA-Fs form large macromolecular ensembles (i.e., NMIIA-F stacks), but how this occurs is unknown. Here we show NMIIA-F stacks are formed through two non–mutually exclusive mechanisms: expansion and concatenation. During expansion, NMIIA molecules within the NMIIA-F spread out concurrent with addition of new NMIIA molecules. Concatenation occurs when multiple NMIIA-Fs/NMIIA-F stacks move together and align. We found that NMIIA-F stack formation was regulated by both motor activity and the availability of surrounding actin filaments. Furthermore, our data showed expansion and concatenation also formed the contractile ring in dividing cells. Thus interphase and mitotic cells share similar mechanisms for creating large contractile units, and these are likely to underlie how other myosin II–based contractile systems are assembled. PMID:26960797

  15. Data concatenation, Bayesian concordance and coalescent-based analyses of the species tree for the rapid radiation of Triturus newts.

    PubMed

    Wielstra, Ben; Arntzen, Jan W; van der Gaag, Kristiaan J; Pabijan, Maciej; Babik, Wieslaw

    2014-01-01

    The phylogenetic relationships for rapid species radiations are difficult to disentangle. Here we study one such case, namely the genus Triturus, which is composed of the marbled and crested newts. We analyze data for 38 genetic markers, positioned in 3-prime untranslated regions of protein-coding genes, obtained with 454 sequencing. Our dataset includes twenty Triturus newts and represents all nine species. Bayesian analysis of population structure allocates all individuals to their respective species. The branching patterns obtained by data concatenation, Bayesian concordance analysis and coalescent-based estimations of the species tree differ from one another. The data concatenation based species tree shows high branch support but branching order is considerably affected by allele choice in the case of heterozygotes in the concatenation process. Bayesian concordance analysis expresses the conflict between individual gene trees for part of the Triturus species tree as low concordance factors. The coalescent-based species tree is relatively similar to a previously published species tree based upon morphology and full mtDNA and any conflicting internal branches are not highly supported. Our findings reflect high gene tree discordance due to incomplete lineage sorting (possibly aggravated by hybridization) in combination with low information content of the markers employed (as can be expected for relatively recent species radiations). This case study highlights the complexity of resolving rapid radiations and we acknowledge that to convincingly resolve the Triturus species tree even more genes will have to be consulted.

  16. High-speed concatenation of frequency ramps using sampled grating distributed Bragg reflector laser diode sources for OCT resolution enhancement

    NASA Astrophysics Data System (ADS)

    George, Brandon; Derickson, Dennis

    2010-02-01

    Wavelength tunable sampled grating distributed Bragg reflector (SG-DBR) lasers used for telecommunications applications have previously demonstrated the ability to produce linear frequency ramps covering the entire tuning range of the laser at 100 kHz repetition rates. An individual SG-DBR laser has a typical tuning range of 50 nm. The InGaAs/InP material system often used with SG-DBR lasers allows for design variations that cover the 1250 to 1650 nm wavelength range. This paper addresses the possibility of concatenating the outputs of tunable SG-DBR lasers covering adjacent wavelength ranges for enhancing the resolution of OCT measurements. This laser concatenation method is demonstrated by combining the 1525 nm to 1575 nm wavelength range of a "C-Band" SG-DBR laser with the 1570 nm to 1620 nm wavelength coverage of an "L-Band" SG-DBR laser. Measurements show that SG-DBR lasers can be concatenated with a transition switching time of less than 50 ns with undesired leakage signals attenuated by 50 dB.

  17. Speech and Language Impairments

    MedlinePlus

    ... is…Robbie, Pearl, and Mario. Definition There are many kinds of speech and language ... education available to school-aged children with disabilities. Definition of “Speech or Language Impairment” under IDEA The ...

  18. NINJA-OPS: Fast Accurate Marker Gene Alignment Using Concatenated Ribosomes

    PubMed Central

    Al-Ghalith, Gabriel A.; Montassier, Emmanuel; Ward, Henry N.; Knights, Dan

    2016-01-01

    The explosion of bioinformatics technologies in the form of next generation sequencing (NGS) has facilitated a massive influx of genomics data in the form of short reads. Short read mapping is therefore a fundamental component of next generation sequencing pipelines which routinely match these short reads against reference genomes for contig assembly. However, such techniques have seldom been applied to microbial marker gene sequencing studies, which have mostly relied on novel heuristic approaches. We propose NINJA Is Not Just Another OTU-Picking Solution (NINJA-OPS, or NINJA for short), a fast and highly accurate novel method enabling reference-based marker gene matching (picking Operational Taxonomic Units, or OTUs). NINJA takes advantage of the Burrows-Wheeler (BW) alignment using an artificial reference chromosome composed of concatenated reference sequences, the “concatesome,” as the BW input. Other features include automatic support for paired-end reads with arbitrary insert sizes. NINJA is also free and open source and implements several pre-filtering methods that elicit substantial speedup when coupled with existing tools. We applied NINJA to several published microbiome studies, obtaining accuracy similar to or better than previous reference-based OTU-picking methods while achieving an order of magnitude or more speedup and using a fraction of the memory footprint. NINJA is a complete pipeline that takes a FASTA-formatted input file and outputs a QIIME-formatted taxonomy-annotated BIOM file for an entire MiSeq run of human gut microbiome 16S genes in under 10 minutes on a dual-core laptop. PMID:26820746
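
    The "concatesome" trick is simple to reproduce: join all reference sequences into one artificial chromosome, remember each sequence's offset, and map any alignment coordinate back to its source. A sketch of that bookkeeping; FASTA parsing and the Burrows-Wheeler aligner invocation itself are omitted, and all names are illustrative.

        from bisect import bisect_right

        def build_concatesome(references):
            """Concatenate reference sequences; return the joined sequence
            plus offset tables for mapping hits back to their source."""
            names, starts, parts, pos = [], [], [], 0
            for name, seq in references.items():
                names.append(name)
                starts.append(pos)
                parts.append(seq)
                pos += len(seq)
            return ''.join(parts), names, starts

        def locate(hit_pos, names, starts):
            """Map a concatesome coordinate to (reference name, local position)."""
            i = bisect_right(starts, hit_pos) - 1
            return names[i], hit_pos - starts[i]

        refs = {'otu_1': 'ACGTACGT', 'otu_2': 'GGGTTTCC', 'otu_3': 'TTAACCGG'}
        genome, names, starts = build_concatesome(refs)
        print(locate(10, names, starts))   # ('otu_2', 2)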

  19. Speech imagery recalibrates speech-perception boundaries.

    PubMed

    Scott, Mark

    2016-07-01

    The perceptual boundaries between speech sounds are malleable and can shift after repeated exposure to contextual information. This shift is known as recalibration. To date, the known inducers of recalibration are lexical (including phonotactic) information, lip-read information and reading. The experiments reported here are a proof-of-effect demonstration that speech imagery can also induce recalibration.

  20. Performance Analysis of the Link-16/JTIDS Waveform with Concatenated Coding, Soft Decision Reed-Solomon Decoding, and Noise-Normalization

    DTIC Science & Technology

    2010-09-01

    Report documentation fragments: Performance Analysis of the Link-16/JTIDS Waveform with Concatenated Coding, Soft Decision Reed-Solomon Decoding, and Noise-Normalization, by Charalampos Katsaros...distribution is unlimited. Abstract: The Joint Tactical Information Distribution System (JTIDS) is a

  1. Free Speech Yearbook 1978.

    ERIC Educational Resources Information Center

    Phifer, Gregg, Ed.

    The 17 articles in this collection deal with theoretical and practical freedom of speech issues. The topics include: freedom of speech in Marquette Park, Illinois; Nazis in Skokie, Illinois; freedom of expression in the Confederate States of America; Robert M. LaFollette's arguments for free speech and the rights of Congress; the United States…

  2. Talking Speech Input.

    ERIC Educational Resources Information Center

    Berliss-Vincent, Jane; Whitford, Gigi

    2002-01-01

    This article presents both the factors involved in successful speech input use and the potential barriers that may suggest that other access technologies could be more appropriate for a given individual. Speech input options that are available are reviewed and strategies for optimizing use of speech recognition technology are discussed. (Contains…

  3. Free Speech Yearbook: 1970.

    ERIC Educational Resources Information Center

    Tedford, Thomas L., Ed.

    This book is a collection of syllabi, attitude surveys, and essays relating to free-speech issues, compiled by the Committee on Freedom of Speech of the Speech Communication Association. The collection begins with a rationale for the inclusion of a course on free speech in the college curriculum. Three syllabi with bibliographies present guides for…

  4. Speech and respiration.

    PubMed

    Conrad, B; Schönle, P

    1979-04-12

    This investigation deals with the temporal aspects of air volume changes during speech. Speech respiration differs fundamentally from resting respiration. In resting respiration the duration and velocity of inspiration (air flow or lung volume change) are in a range similar to that of expiration. In speech respiration the duration of inspiration decreases and its velocity increases; conversely, the duration of expiration increases and the volume of air flow decreases dramatically. The following questions arise: are these two respiration types different entities, or do they represent the end points of a continuum from resting to speech respiration? How does articulation without the generation of speech sound affect breathing? Does (verbalized?) thinking without articulation or speech modify the breathing pattern? The main test battery included four tasks (spontaneous speech, reading, serial speech, arithmetic) performed under three conditions (speaking aloud, articulating subvocally, quiet performance by trying to exclusively 'think' the tasks). Respiratory movements were measured with a chest pneumograph and evaluated in comparison with a phonogram and the identified spoken text. For quiet performance the resulting respiratory time ratio (relation of duration of inspiration versus expiration) showed a gradual shift in the direction of speech respiration--the least for reading, the most for arithmetic. This change was even more apparent for the subvocal tasks. It is concluded that (a) there is a gradual automatic change from resting to speech respiration and (b) the degree of internal verbalization (activation of motor speech areas) defines the degree of activation of the speech respiratory pattern.

  5. Comparison of Functional Connectivity Estimated from Concatenated Task-State Data from Block-Design Paradigm with That of Continuous Task

    PubMed Central

    Zhu, Yang; Cheng, Lin; He, Naying; Yang, Yang; Ling, Huawei; Tong, Shanbao

    2017-01-01

    Functional connectivity (FC) analysis with data collected as continuous tasks and activation analysis using data from block-design paradigms are two main methods to investigate the task-induced brain activation. If the concatenated data of task blocks extracted from the block-design paradigm could provide equivalent FC information to that derived from continuous task data, it would shorten the data collection time and simplify experimental procedures, and the already collected data of block-design paradigms could be reanalyzed from the perspective of FC. Despite being used in many studies, such a hypothesis of equivalence has not yet been tested from multiple perspectives. In this study, we collected fMRI blood-oxygen-level-dependent signals from 24 healthy subjects during a continuous task session as well as in block-design task sessions. We compared concatenated task blocks and continuous task data in terms of region of interest- (ROI-) based FC, seed-based FC, and brain network topology during a short motor task. According to our results, the concatenated data was not significantly different from the continuous data in multiple aspects, indicating the potential of using concatenated data to estimate task-state FC in short motor tasks. However, even under appropriate experimental conditions, the interpretation of FC results based on concatenated data should be cautious and take the influence due to inherent information loss during concatenation into account. PMID:28191030

  6. Comparison of Functional Connectivity Estimated from Concatenated Task-State Data from Block-Design Paradigm with That of Continuous Task.

    PubMed

    Zhu, Yang; Cheng, Lin; He, Naying; Yang, Yang; Ling, Huawei; Ayaz, Hasan; Tong, Shanbao; Sun, Junfeng; Fu, Yi

    2017-01-01

    Functional connectivity (FC) analysis with data collected as continuous tasks and activation analysis using data from block-design paradigms are two main methods to investigate the task-induced brain activation. If the concatenated data of task blocks extracted from the block-design paradigm could provide equivalent FC information to that derived from continuous task data, it would shorten the data collection time and simplify experimental procedures, and the already collected data of block-design paradigms could be reanalyzed from the perspective of FC. Despite being used in many studies, such a hypothesis of equivalence has not yet been tested from multiple perspectives. In this study, we collected fMRI blood-oxygen-level-dependent signals from 24 healthy subjects during a continuous task session as well as in block-design task sessions. We compared concatenated task blocks and continuous task data in terms of region of interest- (ROI-) based FC, seed-based FC, and brain network topology during a short motor task. According to our results, the concatenated data was not significantly different from the continuous data in multiple aspects, indicating the potential of using concatenated data to estimate task-state FC in short motor tasks. However, even under appropriate experimental conditions, the interpretation of FC results based on concatenated data should be cautious and take the influence due to inherent information loss during concatenation into account.
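
    ROI-based FC in both analyses reduces to correlating mean regional time series; the only procedural difference is whether the series are continuous or stitched together from task blocks. A sketch of the comparison, with synthetic data standing in for BOLD signals; ROI count, block length, and the block layout are illustrative.

        import numpy as np

        rng = np.random.default_rng(0)
        n_rois, n_vols = 6, 300

        # Stand-in BOLD series: ROIs share latent signals so FC is structured.
        latent = rng.standard_normal((2, n_vols))
        mixing = rng.standard_normal((n_rois, 2))
        continuous = mixing @ latent + 0.5 * rng.standard_normal((n_rois, n_vols))

        # Concatenate only the task blocks of a block design
        # (30-volume blocks alternating task/rest, task first).
        block = 30
        task_cols = np.concatenate([np.arange(s, s + block)
                                    for s in range(0, n_vols, 2 * block)])
        concatenated = continuous[:, task_cols]

        fc_continuous = np.corrcoef(continuous)
        fc_concatenated = np.corrcoef(concatenated)

        # Similarity of the two FC estimates over unique ROI pairs.
        iu = np.triu_indices(n_rois, k=1)
        print(np.corrcoef(fc_continuous[iu], fc_concatenated[iu])[0, 1])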

  7. Status Report on Speech Research: A Report on the Status and Progress of Studies on the Nature of Speech, Instrumentation for Its Investigation, and Practical Applications, April 1-June 30, 1977.

    ERIC Educational Resources Information Center

    Haskins Labs., New Haven, CT.

    This report is one of a regular series on the status and progress of studies on the nature of speech, instrumentation for its investigation, and practical applications. The ten papers treat the following topics: speech synthesis as a tool for the study of speech production; the study of articulatory organization; phonetic perception; cardiac…

  8. A Statistical Quality Model for Data-Driven Speech Animation.

    PubMed

    Ma, Xiaohan; Deng, Zhigang

    2012-11-01

    In recent years, data-driven speech animation approaches have achieved significant successes in terms of animation quality. However, how to automatically evaluate the realism of novel synthesized speech animations has been an important yet unsolved research problem. In this paper, we propose a novel statistical model (called SAQP) to automatically predict the quality of on-the-fly synthesized speech animations by various data-driven techniques. Its essential idea is to construct a phoneme-based, Speech Animation Trajectory Fitting (SATF) metric to describe speech animation synthesis errors and then build a statistical regression model to learn the association between the obtained SATF metric and the objective speech animation synthesis quality. Through delicately designed user studies, we evaluate the effectiveness and robustness of the proposed SAQP model. To the best of our knowledge, this work is the first-of-its-kind, quantitative quality model for data-driven speech animation. We believe it is the important first step to remove a critical technical barrier for applying data-driven speech animation techniques to numerous online or interactive talking avatar applications.
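
    The core of such a model is a regression from a synthesis-error metric to human quality ratings. A schematic version with scikit-learn; the SATF metric is reduced here to a generic per-utterance error feature vector, and the features and ratings below are fabricated purely to show the fitting step.

        import numpy as np
        from sklearn.linear_model import Ridge
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(0)
        # One row per synthesized utterance, e.g., mean/max/variance of the
        # trajectory-fitting error against a reference animation.
        error_features = rng.random((200, 3))
        # Stand-in subjective ratings: a noisy decreasing function of error.
        ratings = (5.0 - 3.0 * error_features @ np.array([0.5, 0.3, 0.2])
                   + 0.1 * rng.standard_normal(200))

        model = Ridge(alpha=1.0)
        print(cross_val_score(model, error_features, ratings, cv=5).mean())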

  9. Tone recognition in continuous Cantonese speech using supratone models.

    PubMed

    Qian, Yao; Lee, Tan; Soong, Frank K

    2007-05-01

    This paper studies automatic tone recognition in continuous Cantonese speech. Cantonese is a major Chinese dialect that is known for being rich in tones. Tone information serves as a useful knowledge source for automatic speech recognition of Cantonese. Cantonese tone recognition is difficult because the tones have similar shapes of pitch contours. The tones are differentiated mainly by their relative pitch heights. In natural speech, the pitch level of a tone may shift up and down and the F0 ranges of different tones overlap with each other, making them acoustically indistinguishable within the domain of a syllable. Our study shows that the relative pitch heights are largely preserved between neighboring tones. A novel method of supratone modeling is proposed for Cantonese tone recognition. Each supratone model characterizes the F0 contour of two or three tones in succession. The tone sequence of a continuous utterance is formed as an overlapped concatenation of supratone units. The most likely tone sequence is determined under phonological constraints on syllable-tone combinations. The proposed method attains an accuracy of 74.68% in speaker-independent tone recognition experiments. In particular, the confusion among the tones with similar contour shapes is greatly resolved.
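
    Overlapped concatenation of supratone units simply means each adjacent tone pair (or triple) becomes one modeling unit, with consecutive units sharing tones. A sketch of how a tone sequence decomposes into such units; the tone numbering follows the usual Cantonese convention, and the function is illustrative only.

        def supratone_units(tones, width=2):
            """Decompose a tone sequence into overlapped units of `width`
            consecutive tones (consecutive units share width - 1 tones)."""
            return [tuple(tones[i:i + width])
                    for i in range(len(tones) - width + 1)]

        # Cantonese has six contrastive tones, conventionally numbered 1-6.
        utterance_tones = [1, 3, 2, 6, 4]
        print(supratone_units(utterance_tones))           # di-tone units
        print(supratone_units(utterance_tones, width=3))  # tri-tone units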

  10. Auditory cortical deactivation during speech production and following speech perception: an EEG investigation of the temporal dynamics of the auditory alpha rhythm

    PubMed Central

    Jenson, David; Harkrider, Ashley W.; Thornton, David; Bowers, Andrew L.; Saltuklaroglu, Tim

    2015-01-01

    Sensorimotor integration (SMI) across the dorsal stream enables online monitoring of speech. Jenson et al. (2014) used independent component analysis (ICA) and event related spectral perturbation (ERSP) analysis of electroencephalography (EEG) data to describe anterior sensorimotor (e.g., premotor cortex, PMC) activity during speech perception and production. The purpose of the current study was to identify and temporally map neural activity from posterior (i.e., auditory) regions of the dorsal stream in the same tasks. Perception tasks required “active” discrimination of syllable pairs (/ba/ and /da/) in quiet and noisy conditions. Production conditions required overt production of syllable pairs and nouns. ICA performed on concatenated raw 68 channel EEG data from all tasks identified bilateral “auditory” alpha (α) components in 15 of 29 participants localized to pSTG (left) and pMTG (right). ERSP analyses were performed to reveal fluctuations in the spectral power of the α rhythm clusters across time. Production conditions were characterized by significant α event related synchronization (ERS; pFDR < 0.05) concurrent with EMG activity from speech production, consistent with speech-induced auditory inhibition. Discrimination conditions were also characterized by α ERS following stimulus offset. Auditory α ERS in all conditions temporally aligned with PMC activity reported in Jenson et al. (2014). These findings are indicative of speech-induced suppression of auditory regions, possibly via efference copy. The presence of the same pattern following stimulus offset in discrimination conditions suggests that sensorimotor contributions following speech perception reflect covert replay, and that covert replay provides one source of the motor activity previously observed in some speech perception tasks. To our knowledge, this is the first time that inhibition of auditory regions by speech has been observed in real-time with the ICA/ERSP technique.

  11. Auditory cortical deactivation during speech production and following speech perception: an EEG investigation of the temporal dynamics of the auditory alpha rhythm.

    PubMed

    Jenson, David; Harkrider, Ashley W; Thornton, David; Bowers, Andrew L; Saltuklaroglu, Tim

    2015-01-01

    Sensorimotor integration (SMI) across the dorsal stream enables online monitoring of speech. Jenson et al. (2014) used independent component analysis (ICA) and event related spectral perturbation (ERSP) analysis of electroencephalography (EEG) data to describe anterior sensorimotor (e.g., premotor cortex, PMC) activity during speech perception and production. The purpose of the current study was to identify and temporally map neural activity from posterior (i.e., auditory) regions of the dorsal stream in the same tasks. Perception tasks required "active" discrimination of syllable pairs (/ba/ and /da/) in quiet and noisy conditions. Production conditions required overt production of syllable pairs and nouns. ICA performed on concatenated raw 68 channel EEG data from all tasks identified bilateral "auditory" alpha (α) components in 15 of 29 participants localized to pSTG (left) and pMTG (right). ERSP analyses were performed to reveal fluctuations in the spectral power of the α rhythm clusters across time. Production conditions were characterized by significant α event related synchronization (ERS; pFDR < 0.05) concurrent with EMG activity from speech production, consistent with speech-induced auditory inhibition. Discrimination conditions were also characterized by α ERS following stimulus offset. Auditory α ERS in all conditions temporally aligned with PMC activity reported in Jenson et al. (2014). These findings are indicative of speech-induced suppression of auditory regions, possibly via efference copy. The presence of the same pattern following stimulus offset in discrimination conditions suggests that sensorimotor contributions following speech perception reflect covert replay, and that covert replay provides one source of the motor activity previously observed in some speech perception tasks. To our knowledge, this is the first time that inhibition of auditory regions by speech has been observed in real-time with the ICA/ERSP technique.

  12. RibAlign: a software tool and database for eubacterial phylogeny based on concatenated ribosomal protein subunits

    PubMed Central

    Teeling, Hanno; Gloeckner, Frank Oliver

    2006-01-01

    Background Until today, analysis of 16S ribosomal RNA (rRNA) sequences has been the de-facto gold standard for the assessment of phylogenetic relationships among prokaryotes. However, the branching order of the individual phyla is not well-resolved in 16S rRNA-based trees. In search of an improvement, new phylogenetic methods have been developed alongside the growing availability of complete genome sequences. Unfortunately, only a few genes in prokaryotic genomes qualify as universal phylogenetic markers and almost all of them have a lower information content than the 16S rRNA gene. Therefore, emphasis has been placed on methods that are based on multiple genes or even entire genomes. The concatenation of ribosomal protein sequences is one method which has been ascribed an improved resolution. Since there is neither a comprehensive database for ribosomal protein sequences nor a tool that assists in sequence retrieval and generation of respective input files for phylogenetic reconstruction programs, RibAlign has been developed to fill this gap. Results RibAlign serves two purposes: First, it provides a fast and scalable database that has been specifically adapted to eubacterial ribosomal protein sequences and second, it provides sophisticated import and export capabilities. This includes semi-automatic extraction of ribosomal protein sequences from whole-genome GenBank and FASTA files as well as exporting aligned, concatenated and filtered sequence files that can directly be used in conjunction with the PHYLIP and MrBayes phylogenetic reconstruction programs. Conclusion Up to now, phylogeny based on concatenated ribosomal protein sequences is hampered by the limited set of sequenced genomes and high computational requirements. However, hundreds of full and draft genome sequencing projects are on the way, and advances in cluster-computing and algorithms make phylogenetic reconstructions feasible even with large alignments of concatenated marker genes. Rib
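
    Concatenating per-gene alignments for phylogeny is mostly bookkeeping: every taxon contributes one aligned block per marker, and taxa missing a marker are padded with gaps so columns stay homologous. A minimal sketch using plain dictionaries rather than RibAlign's database; gene names and sequences are invented for illustration.

        def concatenate_alignments(gene_alignments, taxa):
            """gene_alignments: {gene: {taxon: aligned_seq}}, sequences within
            a gene having equal length. Missing taxa are gap-padded."""
            supermatrix = {t: [] for t in taxa}
            for gene, aln in gene_alignments.items():
                length = len(next(iter(aln.values())))
                for t in taxa:
                    supermatrix[t].append(aln.get(t, '-' * length))
            return {t: ''.join(blocks) for t, blocks in supermatrix.items()}

        genes = {
            'rplA': {'E_coli': 'MKHA-', 'B_subtilis': 'MKHGG'},
            'rpsB': {'E_coli': 'MAT', 'B_subtilis': 'MVT'},
        }
        print(concatenate_alignments(genes, ['E_coli', 'B_subtilis']))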

  13. Speech Recognition: A General Overview.

    ERIC Educational Resources Information Center

    de Sopena, Luis

    Speech recognition is one of five main areas in the field of speech processing. Difficulties in speech recognition include variability in sound within and across speakers, in channel, in background noise, and of speech production. Speech recognition can be used in a variety of situations: to perform query operations and phone call transfers; for…

  14. Early recognition of speech

    PubMed Central

    Remez, Robert E; Thomas, Emily F

    2013-01-01

    Classic research on the perception of speech sought to identify minimal acoustic correlates of each consonant and vowel. In explaining perception, this view designated momentary components of an acoustic spectrum as cues to the recognition of elementary phonemes. This conceptualization of speech perception is untenable given the findings of phonetic sensitivity to modulation independent of the acoustic and auditory form of the carrier. The empirical key is provided by studies of the perceptual organization of speech, a low-level integrative function that finds and follows the sensory effects of speech amid concurrent events. These projects have shown that the perceptual organization of speech is keyed to modulation; fast; unlearned; nonsymbolic; indifferent to short-term auditory properties; and organization requires attention. The ineluctably multisensory nature of speech perception also imposes conditions that distinguish language among cognitive systems. WIREs Cogn Sci 2013, 4:213–223. doi: 10.1002/wcs.1213 PMID:23926454

  15. Military applications of automatic speech recognition and future requirements

    NASA Technical Reports Server (NTRS)

    Beek, Bruno; Cupples, Edward J.

    1977-01-01

    An updated summary of the state-of-the-art of automatic speech recognition and its relevance to military applications is provided. A number of potential systems for military applications are under development. These include: (1) digital narrowband communication systems; (2) automatic speech verification; (3) on-line cartographic processing unit; (4) word recognition for militarized tactical data system; and (5) voice recognition and synthesis for aircraft cockpit.

  16. Neural network based speech synthesizer: A preliminary report

    NASA Technical Reports Server (NTRS)

    Villarreal, James A.; Mcintire, Gary

    1987-01-01

    A neural net based speech synthesis project is discussed. The novelty is that the reproduced speech was extracted from actual voice recordings. In essence, the neural network learns the timing, pitch fluctuations, connectivity between individual sounds, and speaking habits unique to that individual person. The parallel distributed processing network used for this project is the generalized backward propagation network which has been modified to also learn sequences of actions or states given in a particular plan.

  17. Speech Alarms Pilot Study

    NASA Technical Reports Server (NTRS)

    Sandor, Aniko; Moses, Haifa

    2016-01-01

    Speech alarms have been used extensively in aviation and included in International Building Codes (IBC) and National Fire Protection Association's (NFPA) Life Safety Code. However, they have not been implemented on space vehicles. Previous studies conducted at NASA JSC showed that speech alarms lead to faster identification and higher accuracy. This research evaluated updated speech and tone alerts in a laboratory environment and in the Human Exploration Research Analog (HERA) in a realistic setup.

  18. Speech Sound Disorders: Articulation and Phonological Processes

    MedlinePlus

    What are speech sound disorders? Most children make some mistakes as they ...

  19. Chief Seattle's Speech Revisited

    ERIC Educational Resources Information Center

    Krupat, Arnold

    2011-01-01

    Indian orators have been saying good-bye for more than three hundred years. John Eliot's "Dying Speeches of Several Indians" (1685), as David Murray notes, inaugurates a long textual history in which "Indians... are most useful dying," or, as in a number of speeches, bidding the world farewell as they embrace an undesired but…

  20. Improving Alaryngeal Speech Intelligibility.

    ERIC Educational Resources Information Center

    Christensen, John M.; Dwyer, Patricia E.

    1990-01-01

    Laryngectomized patients using esophageal speech or an electronic artificial larynx have difficulty producing correct voicing contrasts between homorganic consonants. This paper describes a therapy technique that emphasizes "pushing harder" on voiceless consonants to improve alaryngeal speech intelligibility and proposes focusing on the…

  1. Illustrated Speech Anatomy.

    ERIC Educational Resources Information Center

    Shearer, William M.

    Written for students in the fields of speech correction and audiology, the text deals with the following: structures involved in respiration; the skeleton and the processes of inhalation and exhalation; phonation and pitch, the larynx, and esophageal speech; muscles involved in articulation; muscles involved in resonance; and the anatomy of the…

  2. Private Speech in Ballet

    ERIC Educational Resources Information Center

    Johnston, Dale

    2006-01-01

    Authoritarian teaching practices in ballet inhibit the use of private speech. This paper highlights the critical importance of private speech in the cognitive development of young ballet students, within what is largely a non-verbal art form. It draws upon research by Russian psychologist Lev Vygotsky and contemporary socioculturalists, to…

  3. Advertising and Free Speech.

    ERIC Educational Resources Information Center

    Hyman, Allen, Ed.; Johnson, M. Bruce, Ed.

    The articles collected in this book originated at a conference at which legal and economic scholars discussed the issue of First Amendment protection for commercial speech. The first article, in arguing for freedom for commercial speech, finds inconsistent and untenable the arguments of those who advocate freedom from regulation for political…

  4. Tracking Speech Sound Acquisition

    ERIC Educational Resources Information Center

    Powell, Thomas W.

    2011-01-01

    This article describes a procedure to aid in the clinical appraisal of child speech. The approach, based on the work by Dinnsen, Chin, Elbert, and Powell (1990; Some constraints on functionally disordered phonologies: Phonetic inventories and phonotactics. "Journal of Speech and Hearing Research", 33, 28-37), uses a railway idiom to track gains in…

  5. Free Speech Yearbook 1981.

    ERIC Educational Resources Information Center

    Kane, Peter E., Ed.

    1982-01-01

    The nine articles in this collection deal with theoretical and practical freedom of speech issues. Topics discussed include the following: (1) freedom of expression in Thailand and India; (2) metaphors and analogues in several landmark free speech cases; (3) Supreme Court Justice William O. Douglas's views of the First Amendment; (4) the San…

  6. Free Speech Yearbook 1975.

    ERIC Educational Resources Information Center

    Barbour, Alton, Ed.

    This issue of the "Free Speech Yearbook" contains the following: "Between Rhetoric and Disloyalty: Free Speech Standards for the Sunshine Soldier" by Richard A. Parker; "William A. Rehnquist: Ideologist on the Bench" by Peter E. Kane; "The First Amendment's Weakest Link: Government Regulation of Controversial…

  7. Egocentric Speech Reconsidered.

    ERIC Educational Resources Information Center

    Braunwald, Susan R.

    A range of language use model is proposed as an alternative conceptual framework to a stage model of egocentric speech. The range of language use model is proposed to clarify the meaning of the term egocentric speech, to examine the validity of stage assumptions, and to explain the existence of contextual variation in the form of children's…

  8. Free Speech Yearbook 1976.

    ERIC Educational Resources Information Center

    Phifer, Gregg, Ed.

    The articles collected in this annual address several aspects of First Amendment Law. The following titles are included: "Freedom of Speech As an Academic Discipline" (Franklyn S. Haiman), "Free Speech and Foreign-Policy Decision Making" (Douglas N. Freeman), "The Supreme Court and the First Amendment: 1975-1976"…

  9. SPEECH COMMUNICATION RESEARCH.

    DTIC Science & Technology

    studies of the dynamics of speech production through cineradiographic techniques and through acoustic analysis of formant motions in vowels in various...particular, the activity of the vocal cords and the dynamics of tongue motion. Research on speech perception has included experiments on vowel

  10. Automatic speech recognition

    NASA Astrophysics Data System (ADS)

    Espy-Wilson, Carol

    2005-04-01

    Great strides have been made in the development of automatic speech recognition (ASR) technology over the past thirty years. Most of this effort has been centered around the extension and improvement of Hidden Markov Model (HMM) approaches to ASR. Current commercially available and industry systems based on HMMs can perform well for certain situational tasks that restrict variability, such as phone dialing or limited voice commands. However, the holy grail of ASR systems is performance comparable to that of humans: in other words, the ability to automatically transcribe unrestricted conversational speech spoken by an infinite number of speakers under varying acoustic environments. This goal is far from being reached. Key to the success of ASR is effective modeling of variability in the speech signal. This tutorial will review the basics of ASR and the various ways in which our current knowledge of speech production, speech perception and prosody can be exploited to improve robustness at every level of the system.
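
    As a concrete reminder of what the HMM approaches mentioned above compute at their core, the sketch below runs the forward algorithm on a toy two-state model: it accumulates the probability of an observation sequence given transition, emission, and initial-state probabilities. The model and the numbers are made up for illustration; real ASR systems use continuous acoustic features and far larger models.

      # Toy forward algorithm: P(observation sequence | HMM).
      # The 2-state model and discrete observations are illustrative only.
      import numpy as np

      A = np.array([[0.7, 0.3],      # state-transition probabilities
                    [0.2, 0.8]])
      B = np.array([[0.9, 0.1],      # P(symbol | state), 2 discrete symbols
                    [0.3, 0.7]])
      pi = np.array([0.6, 0.4])      # initial state distribution
      obs = [0, 1, 1, 0]             # observed symbol sequence

      alpha = pi * B[:, obs[0]]      # initialization
      for o in obs[1:]:
          alpha = (alpha @ A) * B[:, o]   # induction step
      likelihood = alpha.sum()            # P(obs | model)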

  11. Commercial applications of speech interface technology: an industry at the threshold.

    PubMed

    Oberteuffer, J A

    1995-10-24

    Speech interface technology, which includes automatic speech recognition, synthetic speech, and natural language processing, is beginning to have a significant impact on business and personal computer use. Today, powerful and inexpensive microprocessors and improved algorithms are driving commercial applications in computer command, consumer, data entry, speech-to-text, telephone, and voice verification. Robust speaker-independent recognition systems for command and navigation in personal computers are now available; telephone-based transaction and database inquiry systems using both speech synthesis and recognition are coming into use. Large-vocabulary speech interface systems for document creation and read-aloud proofing are expanding beyond niche markets. Today's applications represent a small preview of a rich future for speech interface technology that will eventually replace keyboards with microphones and loud-speakers to give easy accessibility to increasingly intelligent machines.

  12. Commercial applications of speech interface technology: an industry at the threshold.

    PubMed Central

    Oberteuffer, J A

    1995-01-01

    Speech interface technology, which includes automatic speech recognition, synthetic speech, and natural language processing, is beginning to have a significant impact on business and personal computer use. Today, powerful and inexpensive microprocessors and improved algorithms are driving commercial applications in computer command, consumer, data entry, speech-to-text, telephone, and voice verification. Robust speaker-independent recognition systems for command and navigation in personal computers are now available; telephone-based transaction and database inquiry systems using both speech synthesis and recognition are coming into use. Large-vocabulary speech interface systems for document creation and read-aloud proofing are expanding beyond niche markets. Today's applications represent a small preview of a rich future for speech interface technology that will eventually replace keyboards with microphones and loud-speakers to give easy accessibility to increasingly intelligent machines. PMID:7479717

  13. Research on Speech Perception. Progress Report No. 9, January 1983-December 1983.

    ERIC Educational Resources Information Center

    Pisoni, David B.; And Others

    Summarizing research activities from January 1983 to December 1983, this is the ninth annual report of research on speech perception, analysis and synthesis conducted in the Speech Research Laboratory of the Department of Psychology at Indiana University. The report includes extended manuscripts, short reports, and progress reports. The report…

  14. Research on Speech Perception. Progress Report No. 8, January 1982-December 1982.

    ERIC Educational Resources Information Center

    Pisoni, David B.; And Others

    Summarizing research activities from January 1982 to December 1982, this is the eighth annual report of research on speech perception, analysis and synthesis conducted in the Speech Research Laboratory of the Department of Psychology at Indiana University. The report includes extended manuscripts, short reports, progress reports, and information…

  15. Annealed lattice animal model and Flory theory for the melt of non-concatenated rings: towards the physics of crumpling.

    PubMed

    Grosberg, Alexander Y

    2014-01-28

    A Flory theory is constructed for a long polymer ring in a melt of unknotted and non-concatenated rings. The theory assumes that the ring forms an effective annealed branched object and computes its primitive path. It is shown that the primitive path follows self-avoiding statistics and is characterized by the corresponding Flory exponent of a polymer with excluded volume. Based on that, it is shown that rings in the melt are compact objects with overall size proportional to their length raised to the 1/3 power. Furthermore, the contact probability exponent γ_contact is estimated, albeit by a poorly controlled approximation, with a result close to 1.1 that is consistent with both numerical and experimental data.
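
    Written out, the two scaling claims of this abstract are as follows (notation chosen here for illustration: N is the ring length, b a monomer-scale prefactor, and s the contour separation between two monomers):

      % Compact (crumpled) ring size and the estimated contact exponent.
      \begin{align}
        R(N) &\sim b\,N^{1/3}
          && \text{rings in the melt are compact, space-filling objects} \\
        P_{\mathrm{contact}}(s) &\sim s^{-\gamma_{\mathrm{contact}}},
          \quad \gamma_{\mathrm{contact}} \approx 1.1
          && \text{contact probability at contour separation } s
      \end{align}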

  16. Sperry Univac speech communications technology

    NASA Technical Reports Server (NTRS)

    Medress, Mark F.

    1977-01-01

    Technology and systems for effective verbal communication with computers were developed. A continuous speech recognition system for verbal input, a word spotting system to locate key words in conversational speech, prosodic tools to aid speech analysis, and a prerecorded voice response system for speech output are described.

  17. Automatic Recognition of Deaf Speech.

    ERIC Educational Resources Information Center

    Abdelhamied, Kadry; And Others

    1990-01-01

    This paper describes a speech perception system for automatic recognition of deaf speech. Using a 2-step segmentation approach for 468 utterances by 2 hearing-impaired men and 2 normal-hearing men, recognition rates as high as 93.01 percent for isolated words and 81.81 percent for connected speech were obtained from deaf speech,…

  18. Voice and Speech after Laryngectomy

    ERIC Educational Resources Information Center

    Stajner-Katusic, Smiljka; Horga, Damir; Musura, Maja; Globlek, Dubravka

    2006-01-01

    The aim of the investigation is to compare voice and speech quality in alaryngeal patients using esophageal speech (ESOP, eight subjects), electroacoustical speech aid (EACA, six subjects) and tracheoesophageal voice prosthesis (TEVP, three subjects). The subjects reading a short story were recorded in the sound-proof booth and the speech samples…

  19. 78 FR 49693 - Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-08-15

    ... Relay Services; Telecommunications Relay Services and Speech-to-Speech Services for Individuals With ... this document, the Commission amends telecommunications relay services (TRS) mandatory minimum standards applicable to Speech-to-Speech (STS) relay service. This action is necessary to ensure ...

  20. Fluid Dynamics of Human Phonation and Speech

    NASA Astrophysics Data System (ADS)

    Mittal, Rajat; Erath, Byron D.; Plesniak, Michael W.

    2013-01-01

    This article presents a review of the fluid dynamics, flow-structure interactions, and acoustics associated with human phonation and speech. Our voice is produced through the process of phonation in the larynx, and an improved understanding of the underlying physics of this process is essential to advancing the treatment of voice disorders. Insights into the physics of phonation and speech can also contribute to improved vocal training and the development of new speech compression and synthesis schemes. This article introduces the key biomechanical features of the laryngeal physiology, reviews the basic principles of voice production, and summarizes the progress made over the past half-century in understanding the flow physics of phonation and speech. Laryngeal pathologies, which significantly enhance the complexity of phonatory dynamics, are discussed. After a thorough examination of the state of the art in computational modeling and experimental investigations of phonatory biomechanics, we present a synopsis of the pacing issues in this arena and an outlook for research in this fascinating subject.

  1. Speech impairment (adult)

    MedlinePlus

    MedlinePlus encyclopedia entry: //medlineplus.gov/ency/article/003204.htm

  2. Speech disorders - children

    MedlinePlus

    MedlinePlus encyclopedia entry: //medlineplus.gov/ency/article/001430.htm

  3. Speech perception as categorization

    PubMed Central

    Holt, Lori L.; Lotto, Andrew J.

    2010-01-01

    Speech perception (SP) most commonly refers to the perceptual mapping from the highly variable acoustic speech signal to a linguistic representation, whether it be phonemes, diphones, syllables, or words. This is an example of categorization, in that potentially discriminable speech sounds are assigned to functionally equivalent classes. In this tutorial, we present some of the main challenges to our understanding of the categorization of speech sounds and the conceptualization of SP that has resulted from these challenges. We focus here on issues and experiments that define open research questions relevant to phoneme categorization, arguing that SP is best understood as perceptual categorization, a position that places SP in direct contact with research from other areas of perception and cognition. PMID:20601702

  4. Anxiety and ritualized speech

    ERIC Educational Resources Information Center

    Lalljee, Mansur; Cook, Mark

    1975-01-01

    The experiment examines the effects of anxiety on the use of a number of words that seem irrelevant to semantic communication. The Units of Ritualized Speech (URSs) considered are: 'I mean', 'in fact', 'really', 'sort of', 'well' and 'you know'. (Editor)

  5. Speech and Communication Disorders

    MedlinePlus

    ... or understand speech. Causes include hearing disorders and deafness, and voice problems such as dysphonia or those caused ... language therapy can help. NIH: National Institute on Deafness and Other Communication Disorders

  6. Thai Automatic Speech Recognition

    DTIC Science & Technology

    2005-01-01

    This research was performed as part of the DARPA-Babylon program aimed at rapidly developing multilingual speech-to- ... used in an external DARPA evaluation involving medical scenarios between an American doctor and a naïve monolingual Thai patient. ... To create more general acoustic models, we collected read speech data from native speakers based on the concepts of our multilingual data collection.

  7. Human speech articulator measurements using low power, 2GHz Homodyne sensors

    SciTech Connect

    Barnes, T; Burnett, G C; Holzrichter, J F

    1999-06-29

    Very low power, short-range microwave "radar-like" sensors can measure the motions and vibrations of internal human speech articulators as speech is produced. In these animate systems (and also in inanimate acoustic systems), microwave sensors can measure vibration information associated with excitation sources and other interfaces. These data, together with the corresponding acoustic data, enable the calculation of system transfer functions. This information appears to be useful for a surprisingly wide range of applications, such as speech coding and recognition, speaker or object identification, speech and musical instrument synthesis, noise cancellation, and other applications.
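
    The transfer-function calculation mentioned here can be sketched with standard spectral estimation: given a simultaneously recorded excitation signal (the EM-sensor channel) and the acoustic output, the H1 estimator divides their cross-spectrum by the input auto-spectrum. The sketch below uses synthetic stand-in signals and assumed parameters; it is not the authors' processing chain.

      # H1 transfer-function estimate from input/output recordings,
      # assuming SciPy; signals and parameters are illustrative stand-ins.
      import numpy as np
      from scipy import signal

      fs = 16000                                    # sampling rate, Hz
      rng = np.random.default_rng(0)
      excitation = rng.standard_normal(fs)          # stand-in for EM-sensor data
      acoustic = signal.lfilter([1.0], [1.0, -0.95], excitation)  # stand-in output

      # Cross-spectrum of input/output over the input auto-spectrum.
      f, Pxy = signal.csd(excitation, acoustic, fs=fs, nperseg=1024)
      _, Pxx = signal.welch(excitation, fs=fs, nperseg=1024)
      H = Pxy / Pxx                                 # complex transfer function
      magnitude_db = 20 * np.log10(np.abs(H) + 1e-12)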

  8. Musician advantage for speech-on-speech perception.

    PubMed

    Başkent, Deniz; Gaudrain, Etienne

    2016-03-01

    Evidence for transfer of musical training to better perception of speech in noise has been mixed. Unlike speech-in-noise perception, speech-on-speech perception utilizes many of the skills that musical training improves, such as better pitch perception and stream segregation, as well as higher-level auditory cognitive functions such as attention. Indeed, although a few non-musicians performed as well as musicians, at the group level there was a strong musician benefit for speech perception in a speech masker. This benefit does not seem to result from better voice processing and could instead be related to better stream segregation or enhanced cognitive functions.

  9. Online Searching Using Speech as a Man/Machine Interface.

    ERIC Educational Resources Information Center

    Peters, B. F.; And Others

    1989-01-01

    Describes the development, implementation, and evaluation of a voice interface for the British Library Blaise Online Information Retrieval System. Results of the evaluation show that the use of currently available speech recognition and synthesis hardware, along with intelligent software, can provide an interface well suited to the needs of online…

  10. Robust Speech Rate Estimation for Spontaneous Speech

    PubMed Central

    Wang, Dagen; Narayanan, Shrikanth S.

    2010-01-01

    In this paper, we propose a direct method for speech rate estimation from acoustic features without requiring any automatic speech transcription. We compare various spectral and temporal signal analysis and smoothing strategies to better characterize the underlying syllable structure to derive speech rate. The proposed algorithm extends the methods of spectral subband correlation by including temporal correlation and the use of prominent spectral subbands for improving the signal correlation essential for syllable detection. Furthermore, to address some of the practical robustness issues in previously proposed methods, we introduce some novel components into the algorithm such as the use of pitch confidence for filtering spurious syllable envelope peaks, magnifying window for tackling neighboring syllable smearing, and relative peak measure thresholds for pseudo peak rejection. We also describe an automated approach for learning algorithm parameters from data, and find the optimal settings through Monte Carlo simulations and parameter sensitivity analysis. Final experimental evaluations are conducted based on a portion of the Switchboard corpus for which manual phonetic segmentation information, and published results for direct comparison are available. The results show a correlation coefficient of 0.745 with respect to the ground truth based on manual segmentation. This result is about a 17% improvement compared to the current best single estimator and a 11% improvement over the multiestimator evaluated on the same Switchboard database. PMID:20428476
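
    A drastically simplified version of the envelope-peak idea behind such syllable-based rate estimators is sketched below; it omits the paper's subband correlation, pitch-confidence filtering, magnifying window, and learned thresholds, and every parameter shown is an assumption.

      # Crude syllable-rate estimator: band-pass, envelope, peak picking.
      # All cutoffs and thresholds are illustrative assumptions.
      import numpy as np
      from scipy import signal

      def speech_rate(x, fs):
          """Estimate syllables/second from a mono speech waveform x."""
          # Wide band-pass covering the main syllabic energy region.
          sos = signal.butter(4, [300, 3000], btype="bandpass", fs=fs, output="sos")
          band = signal.sosfilt(sos, x)
          # Amplitude envelope: rectify, then low-pass well below Nyquist.
          env_sos = signal.butter(2, 8, btype="lowpass", fs=fs, output="sos")
          env = signal.sosfilt(env_sos, np.abs(band))
          # Peaks at least ~120 ms apart, above a relative height threshold.
          peaks, _ = signal.find_peaks(env, distance=int(0.12 * fs),
                                       height=0.3 * env.max())
          return len(peaks) / (len(x) / fs)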

  11. Speech Alarms Pilot Study

    NASA Technical Reports Server (NTRS)

    Sandor, A.; Moses, H. R.

    2016-01-01

    Currently on the International Space Station (ISS) and other space vehicles, Caution & Warning (C&W) alerts are represented with various auditory tones that correspond to the type of event. This system relies on the crew's ability to remember what each tone represents in a high-stress, high-workload environment when responding to the alert. Furthermore, crews receive training a year or more in advance of the mission, which makes remembering the semantic meaning of the alerts more difficult. The current system works for missions conducted close to Earth, where ground operators can assist as needed. On long-duration missions, however, crews will need to handle off-nominal events autonomously. There is evidence that speech alarms may be easier and faster to recognize, especially during an off-nominal event. The Information Presentation Directed Research Project (FY07-FY09), funded by the Human Research Program, included several studies investigating C&W alerts. The studies evaluated tone alerts currently in use with NASA flight deck displays along with candidate speech alerts. A follow-on study used four types of speech alerts to investigate how quickly various types of auditory alerts with and without a speech component - either at the beginning or at the end of the tone - can be identified. Even though crews were familiar with the tone alert from training or direct mission experience, alerts starting with a speech component were identified faster than alerts starting with a tone. The current study replicated the results from the previous study in a more rigorous experimental design to determine whether the candidate speech alarms are ready for transition to operations or whether more research is needed. Four types of alarms (caution, warning, fire, and depressurization) were presented to participants in both tone and speech formats in laboratory settings and later in the Human Exploration Research Analog (HERA). In the laboratory study, the alerts were presented by software and participants were...

  12. Differential Diagnosis of Severe Speech Disorders Using Speech Gestures

    ERIC Educational Resources Information Center

    Bahr, Ruth Huntley

    2005-01-01

    The differentiation of childhood apraxia of speech from severe phonological disorder is a common clinical problem. This article reports on an attempt to describe speech errors in children with childhood apraxia of speech on the basis of gesture use and acoustic analyses of articulatory gestures. The focus was on the movement of articulators and…

  13. Why Go to Speech Therapy?

    MedlinePlus

    Why go to speech therapy? ... types of therapy work best when you can go on an intensive schedule (i.e., every day ...

  14. Hearing or speech impairment - resources

    MedlinePlus

    Resources - hearing or speech impairment ... The following organizations are good resources for information on hearing impairment or speech impairment: Alexander Graham Bell Association for the Deaf and Hard of Hearing -- www.agbell. ...

  15. Development of a speech autocuer

    NASA Technical Reports Server (NTRS)

    Bedles, R. L.; Kizakvich, P. N.; Lawson, D. T.; Mccartney, M. L.

    1980-01-01

    A wearable, visually based prosthesis for the deaf based upon the proven method for removing lipreading ambiguity known as cued speech was fabricated and tested. Both software and hardware developments are described, including a microcomputer, display, and speech preprocessor.

  16. Multilingual Speech and Language Processing

    DTIC Science & Technology

    2003-04-01

    RTO Meeting Proceedings 66: Multilingual Speech and Language Processing (Le traitement multilingue de la parole et du langage). Papers presented at the Information Systems Technology Panel ... ISBN 92-837-1102-5 ... (RTO MP-066 / IST-025). Executive Summary: Multilingual speech and language ...

  17. Abortion and compelled physician speech.

    PubMed

    Orentlicher, David

    2015-01-01

    Informed consent mandates for abortion providers may infringe the First Amendment's freedom of speech. On the other hand, they may reinforce the physician's duty to obtain informed consent. Courts can promote both doctrines by ensuring that compelled physician speech pertains to medical facts about abortion rather than abortion ideology and that compelled speech is truthful and not misleading.

  18. "Zero Tolerance" for Free Speech.

    ERIC Educational Resources Information Center

    Hils, Lynda

    2001-01-01

    Argues that school policies of "zero tolerance" of threatening speech may violate a student's First Amendment right to freedom of expression if speech is less than a "true threat." Suggests a two-step analysis to determine if student speech is a "true threat." (PKP)

  19. Speech Cues and Sign Stimuli.

    ERIC Educational Resources Information Center

    Mattingly, Ignatius G.

    Parallels between sign stimuli and speech cues suggest some interesting speculations about the origins of language. Speech cues may belong to the class of human sign stimuli which, as in animal behavior, may be the product of an innate releasing mechanism. Prelinguistic speech for man may have functioned as a social-releaser system. Human language…

  20. Signed Soliloquy: Visible Private Speech

    ERIC Educational Resources Information Center

    Zimmermann, Kathrin; Brugger, Peter

    2013-01-01

    Talking to oneself can be silent (inner speech) or vocalized for others to hear (private speech, or soliloquy). We investigated these two types of self-communication in 28 deaf signers and 28 hearing adults. With a questionnaire specifically developed for this study, we established the visible analog of vocalized private speech in deaf signers.…

  1. Study of acoustic correlates associate with emotional speech

    NASA Astrophysics Data System (ADS)

    Yildirim, Serdar; Lee, Sungbok; Lee, Chul Min; Bulut, Murtaza; Busso, Carlos; Kazemzadeh, Ebrahim; Narayanan, Shrikanth

    2004-10-01

    This study investigates the acoustic characteristics of four different emotions expressed in speech. The aim is to obtain detailed acoustic knowledge on how a speech signal is modulated by changes from neutral to a certain emotional state. Such knowledge is necessary for automatic emotion recognition and classification and for emotional speech synthesis. Speech data obtained from two semi-professional actresses are analyzed and compared. Each subject produces 211 sentences with four different emotions: neutral, sad, angry, and happy. We analyze changes in temporal and acoustic parameters such as magnitude and variability of segmental duration, fundamental frequency, and the first three formant frequencies as a function of emotion. Acoustic differences among the emotions are also explored with mutual information computation, multidimensional scaling, and acoustic likelihood comparison with normal speech. Results indicate that speech associated with anger and happiness is characterized by longer duration, shorter interword silence, and higher pitch and rms energy with wider ranges. Sadness is distinguished from the other emotions by lower rms energy and longer interword silence. Interestingly, the differences in formant pattern between [happiness/anger] and [neutral/sadness] are better reflected in back vowels such as /a/ (as in "father") than in front vowels. Detailed results on intra- and interspeaker variability will be reported.
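
    The simplest of the temporal parameters analyzed here (utterance duration, rms energy, inter-word silence) reduce to a few lines once word boundary times are known. The sketch below is a hypothetical illustration of those measurements, not the authors' analysis code; the word spans are assumed inputs, e.g. from a forced alignment.

      # Hypothetical per-utterance measurements: duration, RMS energy,
      # and mean inter-word silence from a waveform plus word boundaries.
      import numpy as np

      def utterance_stats(x, fs, word_spans):
          """word_spans: list of (start_s, end_s) times for each word."""
          duration = len(x) / fs
          rms = float(np.sqrt(np.mean(x.astype(float) ** 2)))
          silences = [b_start - a_end
                      for (_, a_end), (b_start, _) in zip(word_spans, word_spans[1:])]
          return {"duration_s": duration,
                  "rms_energy": rms,
                  "mean_interword_silence_s":
                      float(np.mean(silences)) if silences else 0.0}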

  2. Performance Analysis of a JTIDS/Link-16 Type Waveform using 32-ary Orthogonal Signaling with 32 Chip Baseband Waveforms and a Concatenated Code

    DTIC Science & Technology

    2009-12-01

    Master's thesis by Theodoros ..., December 2009. The Joint Tactical Information Distribution System (JTIDS) is a hybrid frequency-hopped, direct ...

  3. Speech transmission index from running speech: A neural network approach

    NASA Astrophysics Data System (ADS)

    Li, F. F.; Cox, T. J.

    2003-04-01

    Speech transmission index (STI) is an important objective parameter concerning speech intelligibility for sound transmission channels. It is normally measured with specific test signals to ensure high accuracy and good repeatability. Measurement with running speech was previously proposed, but accuracy is compromised and hence applications limited. A new approach that uses artificial neural networks to accurately extract the STI from received running speech is developed in this paper. Neural networks are trained on a large set of transmitted speech examples with prior knowledge of the transmission channels' STIs. The networks perform complicated nonlinear function mappings and spectral feature memorization to enable accurate objective parameter extraction from transmitted speech. Validations via simulations demonstrate the feasibility of this new method on a one-net-one-speech extract basis. In this case, accuracy is comparable with normal measurement methods. This provides an alternative to standard measurement techniques, and it is intended that the neural network method can facilitate occupied room acoustic measurements.
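
    The core idea, learning a regression from features of received running speech to the known STI of the transmission channel, can be sketched with an off-the-shelf network. The feature set, network size, and synthetic data below are placeholders, not the authors' configuration.

      # Placeholder sketch: regress channel STI from features of received
      # speech, assuming scikit-learn. Features and labels are synthetic.
      import numpy as np
      from sklearn.neural_network import MLPRegressor
      from sklearn.model_selection import train_test_split

      rng = np.random.default_rng(0)
      n_examples, n_features = 500, 64               # e.g., band-energy statistics
      X = rng.standard_normal((n_examples, n_features))
      true_sti = rng.uniform(0.2, 0.9, n_examples)   # known channel STIs (labels)

      X_train, X_test, y_train, y_test = train_test_split(X, true_sti,
                                                          random_state=0)
      net = MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000,
                         random_state=0)
      net.fit(X_train, y_train)
      predicted_sti = net.predict(X_test)            # STI from received speech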

  4. Black History Speech

    ERIC Educational Resources Information Center

    Noldon, Carl

    2007-01-01

    The author argues in this speech that one cannot expect students in the school system to know and understand the genius of Black history if the curriculum is Eurocentric, which is a residue of racism. He states that his comments are designed for the enlightenment of those who suffer from a school system that "hypocritically manipulates Black…

  5. Forensics and Speech Communication

    ERIC Educational Resources Information Center

    McBath, James H.

    1975-01-01

    Focuses on the importance of integrating forensics programs into the speech communication curriculum. Maintains that debating and argumentation skills increase the probability of academic success. Published by the Association for Communication Administration Bulletin, Staff Coordinator, ACA 5205 Leesburg Pike, Falls Church, VA 22041, $25.00 annual…

  6. Mandarin Visual Speech Information

    ERIC Educational Resources Information Center

    Chen, Trevor H.

    2010-01-01

    While the auditory-only aspects of Mandarin speech are heavily-researched and well-known in the field, this dissertation addresses its lesser-known aspects: The visual and audio-visual perception of Mandarin segmental information and lexical-tone information. Chapter II of this dissertation focuses on the audiovisual perception of Mandarin…

  7. Speech intelligibility in hospitals.

    PubMed

    Ryherd, Erica E; Moeller, Michael; Hsu, Timothy

    2013-07-01

    Effective communication between staff members is key to patient safety in hospitals. A variety of patient care activities including admittance, evaluation, and treatment rely on oral communication. Surprisingly, published information on speech intelligibility in hospitals is extremely limited. In this study, speech intelligibility measurements and occupant evaluations were conducted in 20 units of five different U.S. hospitals. A variety of unit types and locations were studied. Results show that, overall, no unit had "good" intelligibility based on the speech intelligibility index (SII > 0.75), and several locations were found to have "poor" intelligibility (SII < 0.45). Further, occupied spaces were found to have 10%-15% lower SII than unoccupied spaces on average. Additionally, staff perception of communication problems at nurse stations was significantly correlated with SII ratings. In a targeted second phase, a unit treated with sound absorption had higher SII ratings for a larger percentage of time as compared to an identical untreated unit. Taken as a whole, the study provides an extensive baseline evaluation of speech intelligibility across a variety of hospitals and unit types, offers some evidence of the positive impact of absorption on intelligibility, and identifies areas for future research.
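
    The SII bands quoted in this study reduce to a small helper; note that the "fair" label for intermediate scores below is an assumption, since the abstract names only the "good" (SII > 0.75) and "poor" (SII < 0.45) bands.

      # SII bands as quoted above; the "fair" label is an assumption.
      def rate_sii(sii: float) -> str:
          if not 0.0 <= sii <= 1.0:
              raise ValueError("SII is defined on [0, 1]")
          if sii > 0.75:
              return "good"
          if sii < 0.45:
              return "poor"
          return "fair"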

  8. Free Speech Yearbook 1973.

    ERIC Educational Resources Information Center

    Barbour, Alton, Ed.

    The first article in this collection examines civil disobedience and the protections offered by the First Amendment. The second article discusses a study on antagonistic expressions in a free society. The third essay deals with attitudes toward free speech and treatment of the United States flag. There are two articles on media; the first examines…

  9. The Commercial Speech Doctrine.

    ERIC Educational Resources Information Center

    Luebke, Barbara F.

    In its 1942 ruling in the "Valentine vs. Christensen" case, the Supreme Court established the doctrine that commercial speech is not protected by the First Amendment. In 1975, in the "Bigelow vs. Virginia" case, the Supreme Court took a decisive step toward abrogating that doctrine, by ruling that advertising is not stripped of…

  10. Recognition of speech spectrograms.

    PubMed

    Greene, B G; Pisoni, D B; Carrell, T D

    1984-07-01

    The performance of eight naive observers in learning to identify speech spectrograms was studied over a 2-month period. Single tokens from a 50-word phonetically balanced (PB) list were recorded by several talkers and displayed on a Spectraphonics Speech Spectrographic Display system. Identification testing occurred immediately after daily training sessions. After approximately 20 h of training, naive subjects correctly identified the 50 PB words from a single talker over 95% of the time. Generalization tests with the same words were then carried out with different tokens from the original talker, new tokens from another male talker, a female talker, and finally, a synthetic talker. The generalization results for these talkers showed recognition performance at 91%, 76%, 76%, and 48%, respectively. Finally, generalization tests with a novel set of PB words produced by the original talker were also carried out to examine in detail the perceptual strategies and visual features that subjects abstracted from the training set. Our results demonstrate that even without formal training in phonetics or acoustics naive observers can learn to identify visual displays of speech at very high levels of accuracy. Analysis of subjects' performance in a verbal protocol task demonstrated that they rely on salient visual correlates of many phonetic features in speech.

  11. On Curbing Racial Speech.

    ERIC Educational Resources Information Center

    Gale, Mary Ellen

    1991-01-01

    An alternative interpretation of the First Amendment guarantee of free speech suggests that universities may prohibit and punish direct verbal assaults on specific individuals if the speaker intends to do harm and if a reasonable person would recognize the potential for serious interference with the victim's educational rights. (MSE)

  12. Speech Understanding Systems

    DTIC Science & Technology

    1975-03-01

    ... insensitive to random occurrences of noise. 3) It is capable of being extended to handle large vocabularies. 4) It permits alternate ... baseforms, phonological rules, and marking of syllable boundaries and stress levels from the Speech Communications Research Laboratory. We also ...

  13. Hearing speech in music.

    PubMed

    Ekström, Seth-Reino; Borg, Erik

    2011-01-01

    The masking effect of a piano composition, played at different speeds and in different octaves, on speech-perception thresholds was investigated in 15 normal-hearing and 14 moderately-hearing-impaired subjects. Running speech (just follow conversation, JFC) testing and use of hearing aids increased the everyday validity of the findings. A comparison was made with standard audiometric noises [International Collegium of Rehabilitative Audiology (ICRA) noise and speech spectrum-filtered noise (SPN)]. All masking sounds, music or noise, were presented at the same equivalent sound level (50 dBA). The results showed a significant effect of piano performance speed and octave (P<.01). Low octave and fast tempo had the largest effect; and high octave and slow tempo, the smallest. Music had a lower masking effect than did ICRA noise with two or six speakers at normal vocal effort (P<.01) and SPN (P<.05). Subjects with hearing loss had higher masked thresholds than the normal-hearing subjects (P<.01), but there were smaller differences between masking conditions (P<.01). It is pointed out that music offers an interesting opportunity for studying masking under realistic conditions, where spectral and temporal features can be varied independently. The results have implications for composing music with vocal parts, designing acoustic environments and creating a balance between speech perception and privacy in social settings.

  14. Speech and Hearing Therapy.

    ERIC Educational Resources Information Center

    Sakata, Reiko; Sakata, Robert

    1978-01-01

    In the public school, the speech and hearing therapist attempts to foster child growth and development through the provision of services basic to awareness of self and others, management of personal and social interactions, and development of strategies for coping with the handicap. (MM)

  15. Perceptual Learning in Speech

    ERIC Educational Resources Information Center

    Norris, Dennis; McQueen, James M.; Cutler, Anne

    2003-01-01

    This study demonstrates that listeners use lexical knowledge in perceptual learning of speech sounds. Dutch listeners first made lexical decisions on Dutch words and nonwords. The final fricative of 20 critical words had been replaced by an ambiguous sound, between [f] and [s]. One group of listeners heard ambiguous [f]-final words (e.g.,…

  16. Speech to schoolchildren

    NASA Astrophysics Data System (ADS)

    Angell, C. Austen

    2013-02-01

    Prof. C. A. Angell from Arizona State University read the following short and simple speech, saying the sentences in italics in the best Japanese he could manage (after earnest coaching from a Japanese colleague). The rest was translated on the bus ride and then spoken, as the author spoke, by Ms. Yukako Endo, to whom the author is very grateful.

  17. Expectations and speech intelligibility.

    PubMed

    Babel, Molly; Russell, Jamie

    2015-05-01

    Socio-indexical cues and paralinguistic information are often beneficial to speech processing as this information assists listeners in parsing the speech stream. Associations that particular populations speak in a certain speech style can, however, make it such that socio-indexical cues have a cost. In this study, native speakers of Canadian English who identify as Chinese Canadian and White Canadian read sentences that were presented to listeners in noise. Half of the sentences were presented with a visual-prime in the form of a photo of the speaker and half were presented in control trials with fixation crosses. Sentences produced by Chinese Canadians showed an intelligibility cost in the face-prime condition, whereas sentences produced by White Canadians did not. In an accentedness rating task, listeners rated White Canadians as less accented in the face-prime trials, but Chinese Canadians showed no such change in perceived accentedness. These results suggest a misalignment between an expected and an observed speech signal for the face-prime trials, which indicates that social information about a speaker can trigger linguistic associations that come with processing benefits and costs.

  18. Media Criticism Group Speech

    ERIC Educational Resources Information Center

    Ramsey, E. Michele

    2004-01-01

    Objective: To integrate speaking practice with rhetorical theory. Type of speech: Persuasive. Point value: 100 points (i.e., 30 points based on peer evaluations, 30 points based on individual performance, 40 points based on the group presentation), which is 25% of course grade. Requirements: (a) References: 7-10; (b) Length: 20-30 minutes; (c)…

  19. Free Speech Yearbook, 1974.

    ERIC Educational Resources Information Center

    Barbour, Alton, Ed.

    A collection of essays on free speech and communication is contained in this book. The essays include "From Fairness to Access and Back Again: Some Dimensions of Free Expression in Broadcasting"; "Local Option on the First Amendment?"; "A Look at the Fire Symbol Before and After May 4, 1970"; "Freedom to Teach,…

  20. Free Speech Yearbook 1979.

    ERIC Educational Resources Information Center

    Kane, Peter E., Ed.

    The seven articles in this collection deal with theoretical and practical freedom of speech issues. Topics covered are: the United States Supreme Court, motion picture censorship, and the color line; judicial decision making; the established scientific community's suppression of the ideas of Immanuel Velikovsky; the problems of avant-garde jazz,…

  1. Speech Synthesis Using Perceptually Motivated Features

    DTIC Science & Technology

    2012-01-23

    ... rhythm as one of the basic sampling frequencies of consciousness that reflect data fetching and transmission of information throughout the brain, but ... "everything" in the same way that quantum mechanics (and more recently string theory) try to unify many different levels of the physical world. Within this ...

  2. System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech

    DOEpatents

    Burnett, Greg C.; Holzrichter, John F.; Ng, Lawrence C.

    2002-01-01

    Low power EM waves are used to detect motions of vocal tract tissues of the human speech system before, during, and after voiced speech. A voiced excitation function is derived. The excitation function provides speech production information to enhance speech characterization and to enable noise removal from human speech.

  3. Speech Motor Control in Fluent and Dysfluent Speech Production of an Individual with Apraxia of Speech and Broca's Aphasia

    ERIC Educational Resources Information Center

    van Lieshout, Pascal H. H. M.; Bose, Arpita; Square, Paula A.; Steele, Catriona M.

    2007-01-01

    Apraxia of speech (AOS) is typically described as a motor-speech disorder with clinically well-defined symptoms, but without a clear understanding of the underlying problems in motor control. A number of studies have compared the speech of subjects with AOS to the fluent speech of controls, but only a few have included speech movement data and if…

  4. Seeking a reading machine for the blind and discovering the speech code.

    PubMed

    Shankweiler, Donald; Fowler, Carol A

    2015-02-01

    A machine that can read printed material to the blind became a priority at the end of World War II with the appointment of a U.S. Government committee to instigate research on sensory aids to improve the lot of blinded veterans. The committee chose Haskins Laboratories to lead a multisite research program. Initially, Haskins researchers overestimated the capacities of users to learn an acoustic code based on the letters of a text, resulting in unsuitable designs. Progress was slow because the researchers clung to a mistaken view that speech is a sound alphabet and because of persisting gaps in man-machine technology. The tortuous route to a practical reading machine transformed the scientific understanding of speech perception and reading at Haskins Labs and elsewhere, leading to novel lines of basic research and new technologies. Research at Haskins Laboratories made valuable contributions in clarifying the physical basis of speech. Researchers recognized that coarticulatory overlap eliminated the possibility of alphabet-like discrete acoustic segments in speech. This work advanced the study of speech perception and contributed to our understanding of the relation of speech perception to production. Basic findings on speech enabled the development of speech synthesis, part science and part technology, essential for development of a reading machine, which has found many applications. Findings on the nature of speech further stimulated a new understanding of word recognition in reading across languages and scripts and contributed to our understanding of reading development and reading disabilities.

  5. Toward the ultimate synthesis/recognition system.

    PubMed

    Furui, S

    1995-10-24

    This paper predicts speech synthesis, speech recognition, and speaker recognition technology for the year 2001, and it describes the most important research problems to be solved in order to arrive at these ultimate synthesis and recognition systems. The problems for speech synthesis include natural and intelligible voice production, prosody control based on meaning, capability of controlling synthesized voice quality and choosing individual speaking style, multilingual and multidialectal synthesis, choice of application-oriented speaking styles, capability of adding emotion, and synthesis from concepts. The problems for speech recognition include robust recognition against speech variations, adaptation/normalization to variations due to environmental conditions and speakers, automatic knowledge acquisition for acoustic and linguistic modeling, spontaneous speech recognition, naturalness and ease of human-machine interaction, and recognition of emotion. The problems for speaker recognition are similar to those for speech recognition. The research topics related to all these techniques include the use of articulatory and perceptual constraints and evaluation methods for measuring the quality of technology and systems.

  6. Elicitation of the Acoustic Change Complex to Long-Duration Speech Stimuli in Four-Month-Old Infants

    PubMed Central

    Chen, Ke Heng; Small, Susan A.

    2015-01-01

    The acoustic change complex (ACC) is an auditory-evoked potential elicited to changes within an ongoing stimulus that indicates discrimination at the level of the auditory cortex. Only a few studies to date have attempted to record ACCs in young infants. The purpose of the present study was to investigate the elicitation of ACCs to long-duration speech stimuli in English-learning 4-month-old infants. ACCs were elicited to consonant contrasts made up of two concatenated speech tokens. The stimuli included native dental-dental /dada/ and dental-labial /daba/ contrasts and a nonnative Hindi dental-retroflex /daDa/ contrast. Each consonant-vowel speech token was 410 ms in duration. Slow cortical responses were recorded to the onset of the stimulus and to the acoustic change from /da/ to either /ba/ or /Da/ within the stimulus with significantly prolonged latencies compared with adults. ACCs were reliably elicited for all stimulus conditions with more robust morphology compared with our previous findings using stimuli that were shorter in duration. The P1 amplitudes elicited to the acoustic change in /daba/ and /daDa/ were significantly larger compared to /dada/ supporting that the brain discriminated between the speech tokens. These findings provide further evidence for the use of ACCs as an index of discrimination ability. PMID:26798343
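
    Stimulus construction of the kind described, two 410-ms consonant-vowel tokens joined into one continuous stimulus so that the response to the acoustic change can be measured, can be sketched as below. The file names are hypothetical and the soundfile package is an assumption; real stimuli also need matched levels and artifact-free joins.

      # Hypothetical construction of a two-token ACC stimulus (/daba/):
      # two 410-ms CV tokens concatenated into one continuous waveform.
      import numpy as np
      import soundfile as sf   # assumed available (pip install soundfile)

      da, fs = sf.read("da_410ms.wav")
      ba, fs2 = sf.read("ba_410ms.wav")
      assert fs == fs2, "tokens must share a sampling rate"

      stimulus = np.concatenate([da, ba])       # acoustic change at t = 410 ms
      sf.write("daba_820ms.wav", stimulus, fs)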

  7. Elicitation of the Acoustic Change Complex to Long-Duration Speech Stimuli in Four-Month-Old Infants.

    PubMed

    Chen, Ke Heng; Small, Susan A

    2015-01-01

    The acoustic change complex (ACC) is an auditory-evoked potential elicited to changes within an ongoing stimulus that indicates discrimination at the level of the auditory cortex. Only a few studies to date have attempted to record ACCs in young infants. The purpose of the present study was to investigate the elicitation of ACCs to long-duration speech stimuli in English-learning 4-month-old infants. ACCs were elicited to consonant contrasts made up of two concatenated speech tokens. The stimuli included native dental-dental /dada/ and dental-labial /daba/ contrasts and a nonnative Hindi dental-retroflex /daDa/ contrast. Each consonant-vowel speech token was 410 ms in duration. Slow cortical responses were recorded to the onset of the stimulus and to the acoustic change from /da/ to either /ba/ or /Da/ within the stimulus with significantly prolonged latencies compared with adults. ACCs were reliably elicited for all stimulus conditions with more robust morphology compared with our previous findings using stimuli that were shorter in duration. The P1 amplitudes elicited to the acoustic change in /daba/ and /daDa/ were significantly larger compared to /dada/ supporting that the brain discriminated between the speech tokens. These findings provide further evidence for the use of ACCs as an index of discrimination ability.

  8. A Java speech implementation of the Mini Mental Status Exam.

    PubMed Central

    Wang, S. S.; Starren, J.

    1999-01-01

    The Folstein Mini Mental Status Exam (MMSE) is a simple, widely used, verbally administered test to assess cognitive function. The Java Speech Application Programming Interface (JSAPI) is a new, cross-platform interface for both speech recognition and speech synthesis in the Java environment. To evaluate the suitability of the JSAPI for interactive, patient-interview applications, a JSAPI implementation of the MMSE was developed. The MMSE contains questions that vary in structure in order to assess different cognitive functions. This question variability provided an excellent test-bed to evaluate the strengths and weaknesses of JSAPI. The application is based on Java platform 2 and a JSAPI interface to the IBM ViaVoice recognition engine. Design and implementation issues are discussed. Preliminary usability studies demonstrate that an automated MMSE may be a useful screening tool for cognitive disorders and changes. PMID:10566396

  9. Influence of mothers' slower speech on their children's speech rate.

    PubMed

    Guitar, B; Marchinkoski, L

    2001-08-01

    This study investigated the effects on children's speech rate when their mothers talked more slowly. Six mothers and their normally speaking 3-year-olds (3 girls and 3 boys) were studied using single-subject A-B-A-B designs. Conversational speech rates of mothers were reduced by approximately half in the experimental (B) conditions. Five of the six children appeared to reduce their speech rates when their mothers spoke more slowly. This was confirmed by paired t tests (p < .05) that showed significant decreases in the 5 children's speech rate over the two B conditions. These findings suggest that when mothers substantially decrease their speech rates in a controlled situation, their children also decrease their speech rates. Clinical implications are discussed.

  10. TEACHER'S GUIDE TO HIGH SCHOOL SPEECH.

    ERIC Educational Resources Information Center

    JENKINSON, EDWARD B., ED.

    This guide to high school speech focuses on speech as oral composition, stressing the importance of clear thinking and communication. The proposed one-semester basic course in speech attempts to improve the student's ability to compose and deliver speeches, to think and listen critically, and to understand the social function of speech. In addition…

  11. The feasibility of miniaturizing the versatile portable speech prosthesis: A market survey of commercial products

    NASA Technical Reports Server (NTRS)

    Walklet, T.

    1981-01-01

    The feasibility of a miniature versatile portable speech prosthesis (VPSP) was analyzed, and information on its potential users and on other similar devices was collected. The VPSP is a device that incorporates speech synthesis technology. The objective is to provide sufficient information to decide whether there is valuable technology to contribute to the miniaturization of the VPSP. The needs of potential users are identified, and the development status of technologies similar or related to those used in the VPSP is evaluated. The VPSP, a computer-based speech synthesis system, fits on a wheelchair. The purpose was to produce a device that provides communication assistance in educational, vocational, and social situations to speech-impaired individuals. It is expected that the VPSP can be a valuable aid for persons who are also motor impaired, which explains the placement of the system on a wheelchair.

  12. Headphone localization of speech

    NASA Technical Reports Server (NTRS)

    Begault, Durand R.; Wenzel, Elizabeth M.

    1993-01-01

    Three-dimensional acoustic display systems have recently been developed that synthesize virtual sound sources over headphones based on filtering by head-related transfer functions (HRTFs), the direction-dependent spectral changes caused primarily by the pinnae. In this study, 11 inexperienced subjects judged the apparent spatial location of headphone-presented speech stimuli filtered with nonindividualized HRTFs. About half of the subjects 'pulled' their judgments toward either the median or the lateral-vertical planes, and estimates were almost always elevated. Individual differences were pronounced for the distance judgments; 15 to 46 percent of stimuli were heard inside the head, with the shortest estimates near the median plane. The results suggest that most listeners can obtain useful azimuth information from speech stimuli filtered by nonindividualized HRTFs. Measurements of localization error and reversal rates are comparable with a previous study that used broadband noise stimuli.
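
    The rendering step these displays perform, filtering a mono signal with a left/right pair of head-related impulse responses (HRIRs, the time-domain form of HRTFs), is sketched below. The HRIR arrays are assumed inputs measured for the desired direction; this is an illustration, not the study's apparatus.

      # Binaural rendering sketch: convolve mono speech with a left/right
      # HRIR pair (assumed equal-length arrays for the target direction).
      import numpy as np
      from scipy.signal import fftconvolve

      def render_binaural(speech, hrir_left, hrir_right):
          """Return an (N, 2) stereo array for headphone presentation."""
          left = fftconvolve(speech, hrir_left, mode="full")
          right = fftconvolve(speech, hrir_right, mode="full")
          out = np.stack([left, right], axis=1)
          return out / (np.max(np.abs(out)) + 1e-12)   # avoid clipping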

  13. Musical intervals in speech.

    PubMed

    Ross, Deborah; Choi, Jonathan; Purves, Dale

    2007-06-05

    Throughout history and across cultures, humans have created music using pitch intervals that divide octaves into the 12 tones of the chromatic scale. Why these specific intervals in music are preferred, however, is not known. In the present study, we analyzed a database of individually spoken English vowel phones to examine the hypothesis that musical intervals arise from the relationships of the formants in speech spectra that determine the perceptions of distinct vowels. Expressed as ratios, the frequency relationships of the first two formants in vowel phones represent all 12 intervals of the chromatic scale. Were the formants to fall outside the ranges found in the human voice, their relationships would generate either a less complete or a more dilute representation of these specific intervals. These results imply that human preference for the intervals of the chromatic scale arises from experience with the way speech formants modulate laryngeal harmonics to create different phonemes.
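
    The paper's central measurement, expressing the ratio of the first two formants as a musical interval, amounts to converting the ratio to semitones and snapping to the nearest chromatic step. The formant values in the sketch below are illustrative, not data from the study.

      # Map an F2/F1 formant ratio to the nearest chromatic interval
      # (octave-equivalent). Example formant values are illustrative.
      import math

      INTERVALS = ["unison", "minor 2nd", "major 2nd", "minor 3rd",
                   "major 3rd", "perfect 4th", "tritone", "perfect 5th",
                   "minor 6th", "major 6th", "minor 7th", "major 7th"]

      def nearest_interval(f1_hz, f2_hz):
          semitones = 12 * math.log2(f2_hz / f1_hz)  # equal-tempered steps
          k = round(semitones)
          return INTERVALS[k % 12], semitones - k    # name, deviation

      # Roughly /a/-like formants: F1 ~ 730 Hz, F2 ~ 1090 Hz -> near a 5th.
      name, deviation = nearest_interval(730.0, 1090.0)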

  14. Headphone localization of speech.

    PubMed

    Begault, D R; Wenzel, E M

    1993-06-01

    Three-dimensional acoustic display systems have recently been developed that synthesize virtual sound sources over headphones based on filtering by head-related transfer functions (HRTFs), the direction-dependent spectral changes caused primarily by the pinnae. In this study 11 inexperienced subjects judged the apparent spatial location of headphone-presented speech stimuli filtered with non-individualized HRTFs. About half of the subjects "pulled" their judgments toward either the median or the lateral-vertical planes, and estimates were almost always elevated. Individual differences were pronounced for the distance judgments; 15% to 46% of stimuli were heard inside the head, with the shortest estimates near the median plane. The results suggest that most listeners can obtain useful azimuth information from speech stimuli filtered by nonindividualized HRTFs. Measurements of localization error and reversal rates are comparable with a previous study that used broadband noise stimuli.

  15. Real-Time Control of an Articulatory-Based Speech Synthesizer for Brain Computer Interfaces

    PubMed Central

    Bocquelet, Florent; Hueber, Thomas; Girin, Laurent; Savariaux, Christophe; Yvert, Blaise

    2016-01-01

    Restoring natural speech in paralyzed and aphasic people could be achieved using a Brain-Computer Interface (BCI) controlling a speech synthesizer in real-time. To reach this goal, a prerequisite is to develop a speech synthesizer producing intelligible speech in real-time with a reasonable number of control parameters. We present here an articulatory-based speech synthesizer that can be controlled in real-time for future BCI applications. This synthesizer converts movements of the main speech articulators (tongue, jaw, velum, and lips) into intelligible speech. The articulatory-to-acoustic mapping is performed using a deep neural network (DNN) trained on electromagnetic articulography (EMA) data recorded on a reference speaker synchronously with the produced speech signal. This DNN is then used in both offline and online modes to map the position of sensors glued on different speech articulators into acoustic parameters that are further converted into an audio signal using a vocoder. In offline mode, highly intelligible speech could be obtained as assessed by perceptual evaluation performed by 12 listeners. Then, to anticipate future BCI applications, we further assessed the real-time control of the synthesizer by both the reference speaker and new speakers, in a closed-loop paradigm using EMA data recorded in real time. A short calibration period was used to compensate for differences in sensor positions and articulatory differences between new speakers and the reference speaker. We found that real-time synthesis of vowels and consonants was possible with good intelligibility. In conclusion, these results open to future speech BCI applications using such articulatory-based speech synthesizer. PMID:27880768
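
    To make the mapping stage concrete, here is a minimal, hypothetical sketch of the articulatory-to-acoustic regression, with invented array shapes and a generic multilayer perceptron standing in for the paper's DNN; the vocoder step is not shown.

      # Minimal sketch, assuming made-up data shapes; a generic regressor stands
      # in for the DNN trained on EMA recordings described in the abstract.
      import numpy as np
      from sklearn.neural_network import MLPRegressor

      rng = np.random.default_rng(0)

      # Fake training data: 1000 frames of articulator positions
      # (e.g., x/y coordinates of tongue, jaw, velum, lip sensors -> 12 dims)
      # paired with acoustic parameters (e.g., 25 spectral coefficients).
      X_articulatory = rng.normal(size=(1000, 12))
      Y_acoustic = rng.normal(size=(1000, 25))

      model = MLPRegressor(hidden_layer_sizes=(256, 256), max_iter=200)
      model.fit(X_articulatory, Y_acoustic)

      # At run time, each incoming sensor frame is mapped to acoustic parameters,
      # which a vocoder (not shown) would convert into an audio waveform.
      new_frame = rng.normal(size=(1, 12))
      acoustic_params = model.predict(new_frame)
      print(acoustic_params.shape)  # (1, 25)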

  16. Neurophysiology of Speech Differences in Childhood Apraxia of Speech

    PubMed Central

    Preston, Jonathan L.; Molfese, Peter J.; Gumkowski, Nina; Sorcinelli, Andrea; Harwood, Vanessa; Irwin, Julia; Landi, Nicole

    2014-01-01

    Event-related potentials (ERPs) were recorded during a picture naming task of simple and complex words in children with typical speech and with childhood apraxia of speech (CAS). Results reveal reduced amplitude prior to speaking complex (multisyllabic) words relative to simple (monosyllabic) words for the CAS group over the right hemisphere during a time window thought to reflect phonological encoding of word forms. Group differences were also observed prior to production of spoken tokens regardless of word complexity during a time window just prior to speech onset (thought to reflect motor planning/programming). Results suggest differences in pre-speech neurolinguistic processes. PMID:25090016

  17. Neurophysiology of speech differences in childhood apraxia of speech.

    PubMed

    Preston, Jonathan L; Molfese, Peter J; Gumkowski, Nina; Sorcinelli, Andrea; Harwood, Vanessa; Irwin, Julia R; Landi, Nicole

    2014-01-01

    Event-related potentials (ERPs) were recorded during a picture naming task of simple and complex words in children with typical speech and with childhood apraxia of speech (CAS). Results reveal reduced amplitude prior to speaking complex (multisyllabic) words relative to simple (monosyllabic) words for the CAS group over the right hemisphere during a time window thought to reflect phonological encoding of word forms. Group differences were also observed prior to production of spoken tokens regardless of word complexity during a time window just prior to speech onset (thought to reflect motor planning/programming). Results suggest differences in pre-speech neurolinguistic processes.

  18. Speech Understanding Research

    DTIC Science & Technology

    1976-10-01

    [Table-of-contents fragment: Element Parity; The Environment Tree; The Executive for the Deductive Component; Generating Candidate Bindings for a Selected QVISTA Element; Ramifications of a Proposed Binding; The Binder; Deriving Element-of and Subset Relations.] ... developed to resolve simple anaphoric reference and to correlate information from a primitive world model. Using programs for speech analysis and

  19. Speech Quality Measurement

    DTIC Science & Technology

    1977-06-10

    [Garbled OCR excerpt; approximate recoverable content: "... noise test, t=2 for the low-pass filter test, and t=3 for the ADPCM coding test; s is the subject number ..." and "... a separate speech quality laboratory controlled by the NOVA 830 computer. Each of the stations has a CRT, 15 response buttons, a ..."]

  20. Speech Database Development

    DTIC Science & Technology

    1988-11-21

    ... included basic phonetic coverage and varying phonetic environments. Examining pairs of phonemes, we augmented these sentences, attempting to have at... and the sampling rate. Speech data was collected and recorded utilizing the Vocabulary Master Library file (VML). 630 VML files were created and run on... (A copy of these descriptions is attached with this report as Appendix B.) Basically, phonetic alignment is accomplished in three steps. First, each 5 ms frame

  1. Trainable Videorealistic Speech Animation

    DTIC Science & Technology

    2006-01-01

    ... 1993] [LeGoff and Benoit 1996]. In physics-based methods, the animator relies on the laws of physics to determine the mouth movement, given some initial... work in the memory of Christian Benoit [LeGoff and Benoit 1996], who was a pioneer in audiovisual speech research. The authors would like to thank... Polymorph: an algorithm for morphing among multiple images. IEEE Computer Graphics Applications 18, 58-71. LeGoff, B., and Benoit, C. 1996. A text-to

  2. Hiding Information under Speech

    DTIC Science & Technology

    2005-12-12

    ... as it arrives in real time, and it disappears as fast as it arrives. Furthermore, our cognitive process for translating audio sounds to the meaning... steganography, whose goal is to make the embedded data completely undetectable. In addition, we must dismiss the idea of hiding data by using any... therefore, an image has more room to hide data; and (2) speech steganography has not led to many money-making commercial businesses. For these two

  3. Two Methods of Mechanical Noise Reduction of Recorded Speech During Phonation in an MRI device

    NASA Astrophysics Data System (ADS)

    Přibil, J.; Horáček, J.; Horák, P.

    2011-01-01

    The paper presents two methods for reducing noise in speech signals recorded in an MRI device during phonation, for use in modelling the human vocal tract. The approach to cleaning the noisy speech signal is based on cepstral speech analysis and synthesis, because the noise, produced mainly by the gradient coils, has a mechanical character and can be processed in the spectral domain. The first noise reduction method limits the real cepstrum, clipping the "peaks" that correspond to the harmonic frequencies of the mechanical noise. The second method is based on subtracting the short-time spectra of two simultaneously recorded signals: the first contains both speech and noise, the second noise only. The resulting speech quality was compared using spectrogram and mean periodogram methods.
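
    The second method amounts to short-time spectral subtraction. A minimal sketch, assuming two synchronously recorded channels (speech-plus-noise and noise-only) and illustrative parameters:

      # Illustrative spectral-subtraction sketch; signal content and parameters
      # are hypothetical, not taken from the paper.
      import numpy as np
      from scipy.signal import stft, istft

      def spectral_subtraction(noisy, noise_ref, fs, nperseg=512):
          _, _, S_noisy = stft(noisy, fs=fs, nperseg=nperseg)
          _, _, S_noise = stft(noise_ref, fs=fs, nperseg=nperseg)
          # Average noise magnitude per frequency bin, subtracted from each frame;
          # negative values are floored at zero (half-wave rectification).
          noise_mag = np.abs(S_noise).mean(axis=1, keepdims=True)
          cleaned_mag = np.maximum(np.abs(S_noisy) - noise_mag, 0.0)
          # Reuse the noisy phase for resynthesis.
          S_clean = cleaned_mag * np.exp(1j * np.angle(S_noisy))
          _, cleaned = istft(S_clean, fs=fs, nperseg=nperseg)
          return cleaned

      fs = 16000
      t = np.arange(fs) / fs
      speech_like = np.sin(2 * np.pi * 150 * t) * (1 + 0.5 * np.sin(2 * np.pi * 3 * t))
      noise = 0.3 * np.sin(2 * np.pi * 1000 * t)  # stand-in for gradient-coil noise
      print(spectral_subtraction(speech_like + noise, noise, fs).shape)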

  4. Applications for Subvocal Speech

    NASA Technical Reports Server (NTRS)

    Jorgensen, Charles; Betts, Bradley

    2007-01-01

    A research and development effort now underway is directed toward the use of subvocal speech for communication in settings in which (1) acoustic noise could interfere excessively with ordinary vocal communication and/or (2) acoustic silence or secrecy of communication is required. By "subvocal speech" is meant sub-audible electromyographic (EMG) signals, associated with speech, that are acquired from the surface of the larynx and lingual areas of the throat. Topics addressed in this effort include recognition of the sub-vocal EMG signals that represent specific original words or phrases; transformation (including encoding and/or enciphering) of the signals into forms that are less vulnerable to distortion, degradation, and/or interception; and reconstruction of the original words or phrases at the receiving end of a communication link. Potential applications include ordinary verbal communications among hazardous- material-cleanup workers in protective suits, workers in noisy environments, divers, and firefighters, and secret communications among law-enforcement officers and military personnel in combat and other confrontational situations.

  5. Speech rhythm: a metaphor?

    PubMed

    Nolan, Francis; Jeon, Hae-Sung

    2014-12-19

    Is speech rhythmic? In the absence of evidence for a traditional view that languages strive to coordinate either syllables or stress-feet with regular time intervals, we consider the alternative that languages exhibit contrastive rhythm subsisting merely in the alternation of stronger and weaker elements. This is initially plausible, particularly for languages with a steep 'prominence gradient', i.e. a large disparity between stronger and weaker elements; but we point out that alternation is poorly achieved even by a 'stress-timed' language such as English, and, historically, languages have conspicuously failed to adopt simple phonological remedies that would ensure alternation. Languages seem more concerned to allow 'syntagmatic contrast' between successive units and to use durational effects to support linguistic functions than to facilitate rhythm. Furthermore, some languages (e.g. Tamil, Korean) lack the lexical prominence which would most straightforwardly underpin prominence of alternation. We conclude that speech is not incontestably rhythmic, and may even be antirhythmic. However, its linguistic structure and patterning allow the metaphorical extension of rhythm in varying degrees and in different ways depending on the language, and it is this analogical process which allows speech to be matched to external rhythms.

  6. Speech rhythm: a metaphor?

    PubMed Central

    Nolan, Francis; Jeon, Hae-Sung

    2014-01-01

    Is speech rhythmic? In the absence of evidence for a traditional view that languages strive to coordinate either syllables or stress-feet with regular time intervals, we consider the alternative that languages exhibit contrastive rhythm subsisting merely in the alternation of stronger and weaker elements. This is initially plausible, particularly for languages with a steep ‘prominence gradient’, i.e. a large disparity between stronger and weaker elements; but we point out that alternation is poorly achieved even by a ‘stress-timed’ language such as English, and, historically, languages have conspicuously failed to adopt simple phonological remedies that would ensure alternation. Languages seem more concerned to allow ‘syntagmatic contrast’ between successive units and to use durational effects to support linguistic functions than to facilitate rhythm. Furthermore, some languages (e.g. Tamil, Korean) lack the lexical prominence which would most straightforwardly underpin prominence of alternation. We conclude that speech is not incontestably rhythmic, and may even be antirhythmic. However, its linguistic structure and patterning allow the metaphorical extension of rhythm in varying degrees and in different ways depending on the language, and it is this analogical process which allows speech to be matched to external rhythms. PMID:25385774

  7. Speech endpoint detection with non-language speech sounds for generic speech processing applications

    NASA Astrophysics Data System (ADS)

    McClain, Matthew; Romanowski, Brian

    2009-05-01

    Non-language speech sounds (NLSS) are sounds produced by humans that do not carry linguistic information. Examples of these sounds are coughs, clicks, breaths, and filled pauses such as "uh" and "um" in English. NLSS are prominent in conversational speech, but can be a significant source of errors in speech processing applications. Traditionally, these sounds are ignored by speech endpoint detection algorithms, where speech regions are identified in the audio signal prior to processing. The ability to filter NLSS as a pre-processing step can significantly enhance the performance of many speech processing applications, such as speaker identification, language identification, and automatic speech recognition. In order to be used in all such applications, NLSS detection must be performed without the use of language models that provide knowledge of the phonology and lexical structure of speech. This is especially relevant to situations where the languages used in the audio are not known a priori. We present the results of preliminary experiments using data from American and British English speakers, in which segments of audio are classified as language speech sounds (LSS) or NLSS using a set of acoustic features designed for language-agnostic NLSS detection and a hidden Markov model (HMM) to model speech generation. The results of these experiments indicate that the features and model used are capable of detecting certain types of NLSS, such as breaths and clicks, while detection of other types of NLSS, such as filled pauses, will require future research.

  8. Resolving ambiguity of species limits and concatenation in multilocus sequence data for the construction of phylogenetic supermatrices.

    PubMed

    Chesters, Douglas; Vogler, Alfried P

    2013-05-01

    Public DNA databases are becoming too large and too complex for manual methods to generate phylogenetic supermatrices from multiple gene sequences. Delineating the terminals based on taxonomic labels is no longer practical because species identifications are frequently incomplete and gene trees are incongruent with Linnaean binomials, which results in uncertainty about how to combine species units among unlinked loci. We developed a procedure that minimizes the problem of forming multilocus species units in a large phylogenetic data set using algorithms from graph theory. An initial step established sequence clusters for each locus that broadly correspond to the species level. These clusters frequently include sequences labeled with various binomials and specimen identifiers that create multiple alternatives for concatenation. To choose among these possibilities, we minimize taxonomic conflict among the species units globally in the data set using a multipartite heuristic algorithm. The procedure was applied to all available GenBank data for Coleoptera (beetles) including > 10 500 taxon labels and > 23 500 sequences of 4 loci, which were grouped into 11 241 clusters or divergent singletons by the BlastClust software. Within each cluster, unidentified sequences could be assigned to a species name through the association with fully identified sequences, resulting in 510 new identifications (13.9% of total unidentified sequences) of which nearly half were "trans-locus" identifications by clustering of sequences at a secondary locus. The limits of DNA-based clusters were inconsistent with the Linnaean binomials for 1518 clusters (13.5%) that contained more than one binomial or split a single binomial among multiple clusters. By applying a scoring scheme for full and partial name matches in pairs of clusters, a maximum weight set of 7366 global species units was produced. Varying the match weights for partial matches had little effect on the number of units, although if
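
    The pairing step can be pictured, in a deliberately toy form, as a maximum-weight matching over name-match scores between clusters at two loci; the cluster contents and scoring below are invented, and the study's actual heuristic is multipartite and operates globally over the data set.

      # Toy sketch: choose among alternative cluster pairings across two loci by
      # maximum-weight matching on invented name-match scores.
      import networkx as nx

      # Each cluster is the set of taxon labels attached to its sequences.
      locus1_clusters = {"c1": {"Carabus granulatus", "Carabus sp."},
                         "c2": {"Leistus ferrugineus"}}
      locus2_clusters = {"d1": {"Carabus granulatus"},
                         "d2": {"Leistus ferrugineus", "Leistus sp."}}

      def match_score(a, b):
          # Full binomial matches count double; shared genus names count once.
          full = len(a & b)
          genus = len({n.split()[0] for n in a} & {n.split()[0] for n in b})
          return 2 * full + genus

      G = nx.Graph()
      for ca, names_a in locus1_clusters.items():
          for cb, names_b in locus2_clusters.items():
              s = match_score(names_a, names_b)
              if s > 0:
                  G.add_edge(("L1", ca), ("L2", cb), weight=s)

      # Maximum-weight matching picks one concatenation partner per cluster.
      print(nx.max_weight_matching(G))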

  9. Evaluation of NASA speech encoder

    NASA Technical Reports Server (NTRS)

    1976-01-01

    Techniques developed by NASA for spaceflight instrumentation were used in the design of a quantizer for speech decoding. Computer simulation of the actions of the quantizer was tested with synthesized and real speech signals, and the results were evaluated by a phonetician. Topics discussed include the relationship between the number of quantizer levels and the required sampling rate; reconstruction of signals; digital filtering; speech recording, sampling, and storage; and processing results.

  10. Speech perception at the interface of neurobiology and linguistics.

    PubMed

    Poeppel, David; Idsardi, William J; van Wassenhove, Virginie

    2008-03-12

    Speech perception consists of a set of computations that take continuously varying acoustic waveforms as input and generate discrete representations that make contact with the lexical representations stored in long-term memory as output. Because the perceptual objects that are recognized by the speech perception system enter into subsequent linguistic computation, the format that is used for lexical representation and processing fundamentally constrains the speech perceptual processes. Consequently, theories of speech perception must, at some level, be tightly linked to theories of lexical representation. Minimally, speech perception must yield representations that smoothly and rapidly interface with stored lexical items. Adopting the perspective of Marr, we argue and provide neurobiological and psychophysical evidence for the following research programme. First, at the implementational level, speech perception is a multi-time resolution process, with perceptual analyses occurring concurrently on at least two time scales (approx. 20-80 ms, approx. 150-300 ms), commensurate with (sub)segmental and syllabic analyses, respectively. Second, at the algorithmic level, we suggest that perception proceeds on the basis of internal forward models, or uses an 'analysis-by-synthesis' approach. Third, at the computational level (in the sense of Marr), the theory of lexical representation that we adopt is principally informed by phonological research and assumes that words are represented in the mental lexicon in terms of sequences of discrete segments composed of distinctive features. One important goal of the research programme is to develop linking hypotheses between putative neurobiological primitives (e.g. temporal primitives) and those primitives derived from linguistic inquiry, to arrive ultimately at a biologically sensible and theoretically satisfying model of representation and computation in speech.

  11. Speech perception in noise with a harmonic complex excited vocoder.

    PubMed

    Churchill, Tyler H; Kan, Alan; Goupell, Matthew J; Ihlefeld, Antje; Litovsky, Ruth Y

    2014-04-01

    A cochlear implant (CI) presents band-pass-filtered acoustic envelope information by modulating current pulse train levels. Similarly, a vocoder presents envelope information by modulating an acoustic carrier. By studying how normal hearing (NH) listeners are able to understand degraded speech signals with a vocoder, the parameters that best simulate electric hearing and factors that might contribute to the NH-CI performance difference may be better understood. A vocoder with harmonic complex carriers (fundamental frequency, f0 = 100 Hz) was used to study the effect of carrier phase dispersion on speech envelopes and intelligibility. The starting phases of the harmonic components were randomly dispersed to varying degrees prior to carrier filtering and modulation. NH listeners were tested on recognition of a closed set of vocoded words in background noise. Two sets of synthesis filters simulated different amounts of current spread in CIs. Results showed that the speech vocoded with carriers whose starting phases were maximally dispersed was the most intelligible. Superior speech understanding may have been a result of the flattening of the dispersed-phase carrier's intrinsic temporal envelopes produced by the large number of interacting components in the high-frequency channels. Cross-correlogram analyses of auditory nerve model simulations confirmed that randomly dispersing the carrier's component starting phases resulted in better neural envelope representation. However, neural metrics extracted from these analyses were not found to accurately predict speech recognition scores for all vocoded speech conditions. It is possible that central speech understanding mechanisms are insensitive to the envelope-fine structure dichotomy exploited by vocoders.
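
    A single vocoder channel of this kind can be sketched as a 100-Hz harmonic complex whose component starting phases are randomly dispersed, modulated by a band envelope; the parameters below are illustrative, not those of the study.

      # Simplified single-channel sketch of a harmonic-complex-carrier vocoder.
      # The band envelope is a stand-in; in the experiment it comes from
      # band-pass-filtered speech.
      import numpy as np

      fs, f0, dur = 16000, 100, 1.0
      t = np.arange(int(fs * dur)) / fs
      rng = np.random.default_rng(1)

      n_harmonics = fs // 2 // f0                      # harmonics up to Nyquist
      phases = rng.uniform(0, 2 * np.pi, n_harmonics)  # maximal phase dispersion
      carrier = sum(np.cos(2 * np.pi * f0 * (k + 1) * t + phases[k])
                    for k in range(n_harmonics))
      carrier /= np.max(np.abs(carrier))

      envelope = 0.5 * (1 + np.sin(2 * np.pi * 4 * t))  # stand-in band envelope
      channel_out = envelope * carrier
      print(channel_out.shape)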

  12. Feature extraction and models for speech: An overview

    NASA Astrophysics Data System (ADS)

    Schroeder, Manfred

    2002-11-01

    Modeling of speech has a long history, beginning with Count von Kempelen's 1770 mechanical speaking machine. Even then, human vowel production was seen as resulting from a source (the vocal cords) driving a physically separate resonator (the vocal tract). Homer Dudley's 1928 frequency-channel vocoder and many of its descendants are based on the same successful source-filter paradigm. For linguistic studies as well as practical applications in speech recognition, compression, and synthesis (see M. R. Schroeder, Computer Speech), the extant models require the (often difficult) extraction of numerous parameters such as the fundamental and formant frequencies and various linguistic distinctive features. Some of these difficulties were obviated by the introduction of linear predictive coding (LPC) in 1967, in which the filter part is an all-pole filter, reflecting the fact that for non-nasalized vowels the vocal tract is well approximated by an all-pole transfer function. In the now ubiquitous code-excited linear prediction (CELP), the source part is replaced by a code book which (together with a perceptual error criterion) permits speech compression to very low bit rates at high speech quality for the Internet and cell phones.
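
    The LPC analysis mentioned above fits an all-pole filter to a speech frame; a compact, self-contained Levinson-Durbin recursion (a standard formulation, not code tied to this overview) looks like this:

      # Levinson-Durbin recursion for LPC coefficients; frame content below is
      # a synthetic two-formant-like example.
      import numpy as np

      def lpc(signal, order):
          """All-pole (LPC) coefficients a[0..order] via Levinson-Durbin."""
          n = len(signal)
          # Autocorrelation of the (windowed) frame.
          r = np.array([signal[:n - k] @ signal[k:] for k in range(order + 1)])
          a = np.zeros(order + 1)
          a[0] = 1.0
          err = r[0]
          for i in range(1, order + 1):
              k = -(r[i] + a[1:i] @ r[i - 1:0:-1]) / err  # reflection coefficient
              a[1:i] = a[1:i] + k * a[i - 1:0:-1]
              a[i] = k
              err *= (1 - k * k)
          return a, err

      fs = 8000
      t = np.arange(400) / fs
      frame = np.sin(2 * np.pi * 700 * t) + 0.5 * np.sin(2 * np.pi * 1100 * t)
      coeffs, residual_energy = lpc(frame * np.hamming(len(frame)), order=10)
      print(coeffs)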

  13. Speech and Language Problems in Children

    MedlinePlus

    Children vary in their development of speech and language skills. Health care professionals have lists of milestones ... it may be due to a speech or language disorder. Children who have speech disorders may have ...

  14. Speech systems research at Texas Instruments

    NASA Technical Reports Server (NTRS)

    Doddington, George R.

    1977-01-01

    An assessment of automatic speech processing technology is presented. Fundamental problems in the development and the deployment of automatic speech processing systems are defined and a technology forecast for speech systems is presented.

  15. Multilocus phylogeny of the lichen-forming fungal genus Melanohalea (Parmeliaceae, Ascomycota): insights on diversity, distributions, and a comparison of species tree and concatenated topologies.

    PubMed

    Leavitt, Steven D; Esslinger, Theodore L; Spribille, Toby; Divakar, Pradeep K; Thorsten Lumbsch, H

    2013-01-01

    Accurate species circumscriptions are central for many biological disciplines and have critical implications for ecological and conservation studies. An increasing body of evidence suggests that in some cases traditional morphology-based taxonomy has underestimated diversity in lichen-forming fungi. Therefore, genetic data play an increasing role in recognizing distinct lineages of lichenized fungi that would otherwise be improbable to recognize using classical phenotypic characters. Melanohalea (Parmeliaceae, Ascomycota) is one of the most widespread and common lichen-forming genera in the northern Hemisphere. In this study, we assess traditional phenotype-based species boundaries, identify previously unrecognized species-level lineages, and discuss biogeographic patterns in Melanohalea. We sampled 487 individuals worldwide, representing 18 of the 22 described Melanohalea species, and generated DNA sequence data from mitochondrial, nuclear ribosomal, and protein-coding markers. Diversity previously hidden within traditional species was identified using a genealogical concordance approach. We inferred relationships among sampled species-level lineages within Melanohalea using both concatenated phylogenetic methods and a coalescent-based multilocus species tree approach. Although lineages identified from genetic data are largely congruent with traditional taxonomy, we found strong evidence supporting the presence of previously unrecognized species in six of the 18 sampled taxa. Strong nodal support and overall congruence among independent loci suggest long-term reproductive isolation among most species-level lineages. While some Melanohalea taxa are truly widespread, a limited number of clades appear to have much more restricted distributional ranges. In most instances the concatenated gene tree and multilocus species tree approaches provided similar estimates of relationships. However, nodal support was generally higher in the phylogeny estimated from

  16. A Technique for Estimating Intensity of Emotional Expressions and Speaking Styles in Speech Based on Multiple-Regression HSMM

    NASA Astrophysics Data System (ADS)

    Nose, Takashi; Kobayashi, Takao

    In this paper, we propose a technique for estimating the degree or intensity of emotional expressions and speaking styles appearing in speech. The key idea is based on a style control technique for speech synthesis using a multiple regression hidden semi-Markov model (MRHSMM), and the proposed technique can be viewed as the inverse of the style control. In the proposed technique, the acoustic features of spectrum, power, fundamental frequency, and duration are simultaneously modeled using the MRHSMM. We derive an algorithm for estimating explanatory variables of the MRHSMM, each of which represents the degree or intensity of emotional expressions and speaking styles appearing in acoustic features of speech, based on a maximum likelihood criterion. We show experimental results to demonstrate the ability of the proposed technique using two types of speech data, simulated emotional speech and spontaneous speech with different speaking styles. It is found that the estimated values have correlation with human perception.

  17. Enhancing Peer Feedback and Speech Preparation: The Speech Video Activity

    ERIC Educational Resources Information Center

    Opt, Susan

    2012-01-01

    In the typical public speaking course, instructors or assistants videotape or digitally record at least one of the term's speeches in class or lab to offer students additional presentation feedback. Students often watch and self-critique their speeches on their own. Peers often give only written feedback on classroom presentations or completed…

  18. Recognizing articulatory gestures from speech for robust speech recognition.

    PubMed

    Mitra, Vikramjit; Nam, Hosung; Espy-Wilson, Carol; Saltzman, Elliot; Goldstein, Louis

    2012-03-01

    Studies have shown that supplementary articulatory information can help to improve the recognition rate of automatic speech recognition systems. Unfortunately, articulatory information is not directly observable, necessitating its estimation from the speech signal. This study describes a system that recognizes articulatory gestures from speech, and uses the recognized gestures in a speech recognition system. Recognizing gestures for a given utterance involves recovering the set of underlying gestural activations and their associated dynamic parameters. This paper proposes a neural network architecture for recognizing articulatory gestures from speech and presents ways to incorporate articulatory gestures for a digit recognition task. The lack of a natural speech database containing gestural information prompted us to use three stages of evaluation. First, the proposed gestural annotation architecture was tested on a synthetic speech dataset, which showed that the use of estimated tract-variable time functions improved gesture recognition performance. In the second stage, gesture-recognition models were applied to natural speech waveforms, and word recognition experiments revealed that the recognized gestures can improve the noise-robustness of a word recognition system. In the final stage, a gesture-based Dynamic Bayesian Network was trained, and the results indicate that incorporating gestural information can improve word recognition performance compared to acoustic-only systems.

  19. IBM MASTOR SYSTEM: Multilingual Automatic Speech-to-speech Translator

    DTIC Science & Technology

    2006-01-01

    IBM MASTOR SYSTEM: Multilingual Automatic Speech-to-speech Translator. Yuqing Gao, Liang Gu, Bowen Zhou, Ruhi Sarikaya, Mohamed Afify, Hong-Kwang... [Reference fragments: "... for Improved Discriminative Training," in Proc. ICASSP, Orlando, 2002; [14] M. Afify et al., "On the Use of Morphological Analysis for Dialectal...]

  20. Statistical assessment of speech system performance

    NASA Technical Reports Server (NTRS)

    Moshier, Stephen L.

    1977-01-01

    Methods for the normalization of performance tests results of speech recognition systems are presented. Technological accomplishments in speech recognition systems, as well as planned research activities are described.

  1. Automatic Speech Recognition

    NASA Astrophysics Data System (ADS)

    Potamianos, Gerasimos; Lamel, Lori; Wölfel, Matthias; Huang, Jing; Marcheret, Etienne; Barras, Claude; Zhu, Xuan; McDonough, John; Hernando, Javier; Macho, Dusan; Nadeu, Climent

    Automatic speech recognition (ASR) is a critical component for CHIL services. For example, it provides the input to higher-level technologies, such as summarization and question answering, as discussed in Chapter 8. In the spirit of ubiquitous computing, the goal of ASR in CHIL is to achieve a high performance using far-field sensors (networks of microphone arrays and distributed far-field microphones). However, close-talking microphones are also of interest, as they are used to benchmark ASR system development by providing a best-case acoustic channel scenario to compare against.

  2. Flat-spectrum speech.

    PubMed

    Schroeder, M R; Strube, H W

    1986-05-01

    Flat-spectrum stimuli, consisting of many equal-amplitude harmonics, produce timbre sensations that can depend strongly on the phase angles of the individual harmonics. For fundamental frequencies in the human pitch range, many realizable timbres have vowel-like perceptual qualities. This observation suggests the possibility of constructing intelligible voiced speech signals that have flat-amplitude spectra. This paper describes a successful experiment of creating several different diphthongs by judicious choice of the phase angles of a flat-spectrum waveform. A possible explanation of the observed vowel timbres lies in the dependence of the short-time amplitude spectra on phase changes.
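
    The construction can be sketched directly: equal-amplitude harmonics whose phase angles alone are chosen. The sketch below uses Schroeder's low-peak-factor phase formula as one example phase choice; the diphthong phase settings from the paper are not reproduced.

      # Flat-amplitude-spectrum waveform: every harmonic has equal amplitude and
      # only the phases differ. Schroeder phases are used as an example choice.
      import numpy as np

      fs, f0, n_harm = 16000, 110, 60
      t = np.arange(fs) / fs  # one second

      # Schroeder phases: phi_k = -pi * k * (k - 1) / n_harm
      phases = -np.pi * np.arange(1, n_harm + 1) * np.arange(0, n_harm) / n_harm
      signal = sum(np.cos(2 * np.pi * f0 * k * t + phases[k - 1])
                   for k in range(1, n_harm + 1))

      # The amplitude spectrum is flat by construction; timbre depends on phases.
      spectrum = np.abs(np.fft.rfft(signal))
      print(spectrum[np.argsort(spectrum)[-5:]])  # top peaks: equal-height harmonics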

  3. Interpersonal Orientation and Speech Behavior.

    ERIC Educational Resources Information Center

    Street, Richard L., Jr.; Murphy, Thomas L.

    1987-01-01

    Indicates that (1) males with low interpersonal orientation (IO) were least vocally active and expressive and least consistent in their speech performances, and (2) high IO males and low IO females tended to demonstrate greater speech convergence than either low IO males or high IO females. (JD)

  4. Speech Communication: A Radical Doctrine?

    ERIC Educational Resources Information Center

    Haiman, Franklyn S.

    1983-01-01

    Reviews connections between speech communication as a discipline and active commitments to First Amendment principles. Reflects on the influence of Professor James O'Neil, principal founder of the Speech Communication Association and offers his example as a role model. (PD)

  5. SPEECH--MAN'S NATURAL COMMUNICATION.

    ERIC Educational Resources Information Center

    DUDLEY, HOMER; AND OTHERS

    SESSION 63 OF THE 1967 INSTITUTE OF ELECTRICAL AND ELECTRONIC ENGINEERS INTERNATIONAL CONVENTION BROUGHT TOGETHER SEVEN DISTINGUISHED MEN WORKING IN FIELDS RELEVANT TO LANGUAGE. THEIR TOPICS INCLUDED ORIGIN AND EVOLUTION OF SPEECH AND LANGUAGE, LANGUAGE AND CULTURE, MAN'S PHYSIOLOGICAL MECHANISMS FOR SPEECH, LINGUISTICS, AND TECHNOLOGY AND…

  6. Speech Prosody in Cerebellar Ataxia

    ERIC Educational Resources Information Center

    Casper, Maureen A.; Raphael, Lawrence J.; Harris, Katherine S.; Geibel, Jennifer M.

    2007-01-01

    Persons with cerebellar ataxia exhibit changes in physical coordination and speech and voice production. Previously, these alterations of speech and voice production were described primarily via perceptual coordinates. In this study, the spatial-temporal properties of syllable production were examined in 12 speakers, six of whom were healthy…

  7. Audiovisual Speech Recalibration in Children

    ERIC Educational Resources Information Center

    van Linden, Sabine; Vroomen, Jean

    2008-01-01

    In order to examine whether children adjust their phonetic speech categories, children of two age groups, five-year-olds and eight-year-olds, were exposed to a video of a face saying /aba/ or /ada/ accompanied by an auditory ambiguous speech sound halfway between /b/ and /d/. The effect of exposure to these audiovisual stimuli was measured on…

  8. Speech Analysis Systems: An Evaluation.

    ERIC Educational Resources Information Center

    Read, Charles; And Others

    1992-01-01

    Performance characteristics are reviewed for seven computerized systems marketed for acoustic speech analysis: CSpeech, CSRE, ILS-PC, Kay Elemetrics model 550 Sona-Graph, MacSpeech Lab II, MSL, and Signalyze. Characteristics reviewed include system components, basic capabilities, documentation, user interface, data formats and journaling, and…

  9. Speech acoustics: How much science?

    PubMed

    Tiwari, Manjul

    2012-01-01

    Human vocalizations are sounds made exclusively by a human vocal tract. Among other vocalizations, for example, laughs or screams, speech is the most important. Speech is the primary medium of that supremely human symbolic communication system called language. One of the functions of a voice, perhaps the main one, is to realize language, by conveying some of the speaker's thoughts in linguistic form. Speech is language made audible. Moreover, when phoneticians compare and describe voices, they usually do so with respect to linguistic units, especially speech sounds, like vowels or consonants. It is therefore necessary to understand the structure as well as the nature of speech sounds and how they are described. In order to understand and evaluate speech, it is important to have at least a basic understanding of the science of speech acoustics: how the acoustics of speech are produced, how they are described, and how differences, both between speakers and within speakers, arise in the acoustic output. One of the aims of this article is to try to facilitate this understanding.

  10. Monaural speech segregation

    NASA Astrophysics Data System (ADS)

    Wang, Deliang; Hu, Guoning

    2003-04-01

    Speech segregation from a monaural recording is a primary task of auditory scene analysis, and has proven to be very challenging. We present a multistage model for the task. The model starts with simulated auditory periphery. A subsequent stage computes midlevel auditory representations, including correlograms and cross-channel correlations. The core of the system performs segmentation and grouping in a two-dimensional time-frequency representation that encodes proximity in frequency and time, periodicity, and amplitude modulation (AM). Motivated by psychoacoustic observations, our system employs different mechanisms for handling resolved and unresolved harmonics. For resolved harmonics, our system generates segments (basic components of an auditory scene) based on temporal continuity and cross-channel correlation, and groups them according to periodicity. For unresolved harmonics, the system generates segments based on AM in addition to temporal continuity and groups them according to AM repetition rates. We derive AM repetition rates using sinusoidal modeling and gradient descent. Underlying the segregation process is a pitch contour that is first estimated from speech segregated according to global pitch and then adjusted according to psychoacoustic constraints. The model has been systematically evaluated, and it yields substantially better performance than previous systems.
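
    The correlogram stage can be sketched as a normalized autocorrelation per frequency band; the simple Butterworth bank below stands in for the gammatone/auditory periphery model actually used.

      # Rough correlogram sketch over a toy filterbank; bands and signal content
      # are illustrative assumptions.
      import numpy as np
      from scipy.signal import butter, sosfiltfilt

      def correlogram(x, fs, bands, max_lag_ms=12.5):
          max_lag = int(fs * max_lag_ms / 1000)
          rows = []
          for lo, hi in bands:
              sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
              y = sosfiltfilt(sos, x)
              ac = np.correlate(y, y, mode="full")[len(y) - 1:len(y) - 1 + max_lag]
              rows.append(ac / (ac[0] + 1e-12))  # normalize by zero-lag energy
          return np.array(rows)                  # shape: (n_bands, n_lags)

      fs = 16000
      t = np.arange(fs // 2) / fs
      x = np.sin(2 * np.pi * 200 * t) + np.sin(2 * np.pi * 400 * t)
      bands = [(100, 300), (300, 600), (600, 1200)]
      C = correlogram(x, fs, bands)
      # Channels dominated by harmonics of a common f0 show autocorrelation
      # peaks at multiples of the f0 period (fs/200 = 80 samples here).
      print(np.argmax(C[:, 1:], axis=1) + 1)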

  11. Dichotic speech tests.

    PubMed

    Hällgren, M; Johansson, M; Larsby, B; Arlinger, S

    1998-01-01

    When central auditory dysfunction is present, the ability to understand speech in difficult listening situations can be affected. To study this phenomenon, dichotic speech tests were performed with test material in the Swedish language. Digits, spondees, sentences, and consonant-vowel syllables were used as stimuli, and the reporting was free or directed. The test material was recorded on CD. The study includes a normal group of 30 people in three age categories: 11 years, 23-27 years, and 67-70 years. It also includes two groups of subjects with suspected central auditory lesions: 11 children with reading and writing difficulties and 4 adults earlier exposed to organic solvents. The results from the normal group do not show any differences in performance due to age. The children with reading and writing difficulties show a significant deviation for one test with digits and one test with syllables. Three of the four adults exposed to solvents show a significant deviation from the normal group.

  12. Interactions between distal speech rate, linguistic knowledge, and speech environment.

    PubMed

    Morrill, Tuuli; Baese-Berk, Melissa; Heffner, Christopher; Dilley, Laura

    2015-10-01

    During lexical access, listeners use both signal-based and knowledge-based cues, and information from the linguistic context can affect the perception of acoustic speech information. Recent findings suggest that the various cues used in lexical access are implemented with flexibility and may be affected by information from the larger speech context. We conducted 2 experiments to examine effects of a signal-based cue (distal speech rate) and a knowledge-based cue (linguistic structure) on lexical perception. In Experiment 1, we manipulated distal speech rate in utterances where an acoustically ambiguous critical word was either obligatory for the utterance to be syntactically well formed (e.g., Conner knew that bread and butter (are) both in the pantry) or optional (e.g., Don must see the harbor (or) boats). In Experiment 2, we examined identical target utterances as in Experiment 1 but changed the distribution of linguistic structures in the fillers. The results of the 2 experiments demonstrate that speech rate and linguistic knowledge about critical word obligatoriness can both influence speech perception. In addition, it is possible to alter the strength of a signal-based cue by changing information in the speech environment. These results provide support for models of word segmentation that include flexible weighting of signal-based and knowledge-based cues.

  13. Suppressing aliasing noise in the speech feature domain for automatic speech recognition.

    PubMed

    Deng, Huiqun; O'Shaughnessy, Douglas

    2008-07-01

    This letter points out that, although in the audio signal domain low-pass filtering has been used to prevent aliasing noise from entering the baseband of speech signals, an antialias process in the speech feature domain is still needed to prevent high modulation frequency components from entering the baseband of speech features. The existence of aliasing noise in speech features is revealed via spectral analysis of speech feature streams. A method for suppressing such aliasing noise is proposed. Experiments on large vocabulary speech recognition show that antialias processing of speech features can improve speech recognition, especially for noisy speech.
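
    The proposed remedy can be pictured as filtering each feature dimension as a time series sampled at the frame rate; the cutoff and frame rate below are illustrative assumptions, not the letter's settings.

      # Low-pass filtering of feature trajectories before any downsampling, to
      # keep high modulation frequencies out of the feature baseband.
      import numpy as np
      from scipy.signal import butter, filtfilt

      frame_rate = 100.0  # feature frames per second
      cutoff_hz = 25.0    # keep modulation frequencies below this

      rng = np.random.default_rng(0)
      features = rng.normal(size=(300, 13))  # 3 s of 13-dim cepstral features

      b, a = butter(4, cutoff_hz / (frame_rate / 2))  # normalized cutoff
      smoothed = filtfilt(b, a, features, axis=0)     # zero-phase, per dimension
      print(smoothed.shape)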

  14. Speech Anxiety: The Importance of Identification in the Basic Speech Course.

    ERIC Educational Resources Information Center

    Mandeville, Mary Y.

    A study investigated speech anxiety in the basic speech course by means of pre and post essays. Subjects, 73 students in 3 classes in the basic speech course at a southwestern multiuniversity, wrote a two-page essay on their perceptions of their speech anxiety before the first speaking project. Students discussed speech anxiety in class and were…

  15. Freedom of Speech Newsletter, February 1976.

    ERIC Educational Resources Information Center

    Allen, Winfred G., Jr., Ed.

    The "Freedom of Speech Newsletter" is the communication medium, published four times each academic year, of the Freedom of Speech Interest Group, Western Speech Communication Association. Articles included in this issue are "What Is Academic Freedom For?" by Ralph Ross, "A Sociology of Free Speech" by Ray Heidt,…

  16. Preschool Children's Awareness of Private Speech

    ERIC Educational Resources Information Center

    Manfra, Louis; Winsler, Adam

    2006-01-01

    The present study explored: (a) preschool children's awareness of their own talking and private speech (speech directed to the self); (b) differences in age, speech use, language ability, and mentalizing abilities between children with awareness and those without; and (c) children's beliefs and attitudes about private speech. Fifty-one children…

  17. Infant Perception of Atypical Speech Signals

    ERIC Educational Resources Information Center

    Vouloumanos, Athena; Gelfand, Hanna M.

    2013-01-01

    The ability to decode atypical and degraded speech signals as intelligible is a hallmark of speech perception. Human adults can perceive sounds as speech even when they are generated by a variety of nonhuman sources including computers and parrots. We examined how infants perceive the speech-like vocalizations of a parrot. Further, we examined how…

  18. Emerging Technologies Speech Tools and Technologies

    ERIC Educational Resources Information Center

    Godwin-Jones, Robert

    2009-01-01

    Using computers to recognize and analyze human speech goes back at least to the 1970's. Developed initially to help the hearing or speech impaired, speech recognition was also used early on experimentally in language learning. Since the 1990's, advances in the scientific understanding of speech as well as significant enhancements in software and…

  19. ON THE NATURE OF SPEECH SCIENCE.

    ERIC Educational Resources Information Center

    PETERSON, GORDON E.

    IN THIS ARTICLE THE NATURE OF THE DISCIPLINE OF SPEECH SCIENCE IS CONSIDERED AND THE VARIOUS BASIC AND APPLIED AREAS OF THE DISCIPLINE ARE DISCUSSED. THE BASIC AREAS ENCOMPASS THE VARIOUS PROCESSES OF THE PHYSIOLOGY OF SPEECH PRODUCTION, THE ACOUSTICAL CHARACTERISTICS OF SPEECH, INCLUDING THE SPEECH WAVE TYPES AND THE INFORMATION-BEARING ACOUSTIC…

  20. Automated Speech Rate Measurement in Dysarthria

    ERIC Educational Resources Information Center

    Martens, Heidi; Dekens, Tomas; Van Nuffelen, Gwen; Latacz, Lukas; Verhelst, Werner; De Bodt, Marc

    2015-01-01

    Purpose: In this study, a new algorithm for automated determination of speech rate (SR) in dysarthric speech is evaluated. We investigated how reliably the algorithm calculates the SR of dysarthric speech samples when compared with calculation performed by speech-language pathologists. Method: The new algorithm was trained and tested using Dutch…

  1. Analysis of False Starts in Spontaneous Speech.

    ERIC Educational Resources Information Center

    O'Shaughnessy, Douglas

    A primary difference between spontaneous speech and read speech concerns the use of false starts, where a speaker interrupts the flow of speech to restart his or her utterance. A study examined the acoustic aspects of such restarts in a widely-used speech database, examining approximately 1000 utterances, about 10% of which contained a restart.…

  2. The "Checkers" Speech and Televised Political Communication.

    ERIC Educational Resources Information Center

    Flaningam, Carl

    Richard Nixon's 1952 "Checkers" speech was an innovative use of television for political communication. Like television news itself, the campaign fund crisis behind the speech can be thought of in the same terms as other television melodrama, with the speech serving as its climactic episode. The speech adapted well to television because…

  3. Brain-inspired speech segmentation for automatic speech recognition using the speech envelope as a temporal reference

    NASA Astrophysics Data System (ADS)

    Lee, Byeongwook; Cho, Kwang-Hyun

    2016-11-01

    Speech segmentation is a crucial step in automatic speech recognition because additional speech analyses are performed for each framed speech segment. Conventional segmentation techniques primarily segment speech using a fixed frame size for computational simplicity. However, this approach is insufficient for capturing the quasi-regular structure of speech, which causes substantial recognition failure in noisy environments. How does the brain handle quasi-regular structured speech and maintain high recognition performance under any circumstance? Recent neurophysiological studies have suggested that the phase of neuronal oscillations in the auditory cortex contributes to accurate speech recognition by guiding speech segmentation into smaller units at different timescales. A phase-locked relationship between neuronal oscillation and the speech envelope has recently been obtained, which suggests that the speech envelope provides a foundation for multi-timescale speech segmental information. In this study, we quantitatively investigated the role of the speech envelope as a potential temporal reference to segment speech using its instantaneous phase information. We evaluated the proposed approach by the achieved information gain and recognition performance in various noisy environments. The results indicate that the proposed segmentation scheme not only extracts more information from speech but also provides greater robustness in a recognition test.
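
    The envelope-phase reference can be sketched with the Hilbert transform: extract and smooth the amplitude envelope, take its instantaneous phase, and place boundaries at a fixed phase landmark. The landmark choice and parameters below are assumptions for illustration.

      # Envelope extraction and instantaneous-phase boundary placement; the test
      # signal has a known 4-Hz envelope so the output is easy to verify.
      import numpy as np
      from scipy.signal import hilbert, butter, filtfilt

      def envelope_phase(x, fs, env_cutoff_hz=10.0):
          envelope = np.abs(hilbert(x))            # amplitude envelope
          b, a = butter(2, env_cutoff_hz / (fs / 2))
          envelope = filtfilt(b, a, envelope)      # smooth to slow modulation rates
          phase = np.angle(hilbert(envelope - envelope.mean()))
          return envelope, phase

      fs = 16000
      t = np.arange(fs) / fs
      x = np.sin(2 * np.pi * 150 * t) * (1 + np.sin(2 * np.pi * 4 * t))

      env, phase = envelope_phase(x, fs)
      # Segment boundaries where the envelope phase crosses zero upward.
      boundaries = np.nonzero((phase[:-1] < 0) & (phase[1:] >= 0))[0]
      print(boundaries / fs)  # boundary times in seconds, ~4 per second here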

  4. Brain-inspired speech segmentation for automatic speech recognition using the speech envelope as a temporal reference.

    PubMed

    Lee, Byeongwook; Cho, Kwang-Hyun

    2016-11-23

    Speech segmentation is a crucial step in automatic speech recognition because additional speech analyses are performed for each framed speech segment. Conventional segmentation techniques primarily segment speech using a fixed frame size for computational simplicity. However, this approach is insufficient for capturing the quasi-regular structure of speech, which causes substantial recognition failure in noisy environments. How does the brain handle quasi-regular structured speech and maintain high recognition performance under any circumstance? Recent neurophysiological studies have suggested that the phase of neuronal oscillations in the auditory cortex contributes to accurate speech recognition by guiding speech segmentation into smaller units at different timescales. A phase-locked relationship between neuronal oscillation and the speech envelope has recently been obtained, which suggests that the speech envelope provides a foundation for multi-timescale speech segmental information. In this study, we quantitatively investigated the role of the speech envelope as a potential temporal reference to segment speech using its instantaneous phase information. We evaluated the proposed approach by the achieved information gain and recognition performance in various noisy environments. The results indicate that the proposed segmentation scheme not only extracts more information from speech but also provides greater robustness in a recognition test.

  5. Brain-inspired speech segmentation for automatic speech recognition using the speech envelope as a temporal reference

    PubMed Central

    Lee, Byeongwook; Cho, Kwang-Hyun

    2016-01-01

    Speech segmentation is a crucial step in automatic speech recognition because additional speech analyses are performed for each framed speech segment. Conventional segmentation techniques primarily segment speech using a fixed frame size for computational simplicity. However, this approach is insufficient for capturing the quasi-regular structure of speech, which causes substantial recognition failure in noisy environments. How does the brain handle quasi-regular structured speech and maintain high recognition performance under any circumstance? Recent neurophysiological studies have suggested that the phase of neuronal oscillations in the auditory cortex contributes to accurate speech recognition by guiding speech segmentation into smaller units at different timescales. A phase-locked relationship between neuronal oscillation and the speech envelope has recently been obtained, which suggests that the speech envelope provides a foundation for multi-timescale speech segmental information. In this study, we quantitatively investigated the role of the speech envelope as a potential temporal reference to segment speech using its instantaneous phase information. We evaluated the proposed approach by the achieved information gain and recognition performance in various noisy environments. The results indicate that the proposed segmentation scheme not only extracts more information from speech but also provides greater robustness in a recognition test. PMID:27876875

  6. Mapping acoustics to kinematics in speech

    NASA Astrophysics Data System (ADS)

    Bali, Rohan

    An accurate mapping from speech acoustics to speech articulator movements has many practical applications, as well as theoretical implications for the science of speech planning and perception. This work can be divided into two parts. In the first part, we show that a simple codebook can be used to map acoustics to speech articulator movements in natural, conversational speech. In the second part, we incorporate cost optimization principles that have been shown to be relevant in motor control tasks into the codebook approach. These cost optimizations are defined as minimization of the integrals of the magnitudes of velocity, acceleration, and jerk of the speech articulators, and are implemented using a dynamic programming technique. Results show that incorporating cost minimization of speech articulator movements can significantly improve the mapping from acoustics to speech articulator movements. This suggests underlying physiological or neural planning principles used by speech articulators during speech production.
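
    The second part's idea can be illustrated with a toy dynamic program: choose one codebook entry per frame so that acoustic mismatch plus a movement penalty is minimized. The sketch penalizes only squared articulator velocity, where the work described above also penalizes acceleration and jerk; all data here are synthetic.

      # Viterbi-style DP over codebook candidates with a smoothness penalty.
      import numpy as np

      rng = np.random.default_rng(0)
      K, T, A_DIM, ART_DIM = 32, 50, 10, 4

      codebook_acoustic = rng.normal(size=(K, A_DIM))  # acoustic key per entry
      codebook_artic = rng.normal(size=(K, ART_DIM))   # paired articulator pose
      frames = rng.normal(size=(T, A_DIM))             # observed acoustics
      LAMBDA = 0.5                                     # smoothness weight

      # acoustic_cost[t, k]: distance from frame t to codebook entry k
      acoustic_cost = ((frames[:, None, :] - codebook_acoustic[None]) ** 2).sum(-1)
      # move_cost[j, k]: squared articulator jump from entry j to entry k
      move_cost = ((codebook_artic[:, None, :] - codebook_artic[None]) ** 2).sum(-1)

      D = np.zeros((T, K))                 # best cumulative cost ending in entry k
      back = np.zeros((T, K), dtype=int)   # backpointers for path recovery
      D[0] = acoustic_cost[0]
      for t in range(1, T):
          total = D[t - 1][:, None] + LAMBDA * move_cost  # (prev k, next k)
          back[t] = np.argmin(total, axis=0)
          D[t] = acoustic_cost[t] + np.min(total, axis=0)

      path = [int(np.argmin(D[-1]))]
      for t in range(T - 1, 0, -1):
          path.append(int(back[t][path[-1]]))
      trajectory = codebook_artic[path[::-1]]  # estimated articulator trajectory
      print(trajectory.shape)  # (50, 4)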

  7. Pronunciation models for conversational speech

    NASA Astrophysics Data System (ADS)

    Johnson, Keith

    2005-09-01

    Using a pronunciation dictionary of clear-speech citation forms, a segment deletion rate of nearly 12% is found in a corpus of conversational speech. The number of apparent segment deletions can be reduced by constructing a pronunciation dictionary that records one or more of the actual pronunciations found in conversational speech; however, the resulting empirical pronunciation dictionary often fails to include the citation pronunciation form. Issues involved in selecting pronunciations for a dictionary for linguistic, psycholinguistic, and ASR research will be discussed. One conclusion is that Ladefoged may have been the wiser for avoiding the business of producing pronunciation dictionaries. [Supported by NIDCD Grant No. R01 DC04330-03.]

  8. Steganalysis of recorded speech

    NASA Astrophysics Data System (ADS)

    Johnson, Micah K.; Lyu, Siwei; Farid, Hany

    2005-03-01

    Digital audio provides a suitable cover for high-throughput steganography. At 16 bits per sample and sampled at a rate of 44,100 Hz, digital audio has the bit-rate to support large messages. In addition, audio is often transient and unpredictable, facilitating the hiding of messages. Using an approach similar to our universal image steganalysis, we show that hidden messages alter the underlying statistics of audio signals. Our statistical model begins by building a linear basis that captures certain statistical properties of audio signals. A low-dimensional statistical feature vector is extracted from this basis representation and used by a non-linear support vector machine for classification. We show the efficacy of this approach on LSB embedding and Hide4PGP. While no explicit assumptions about the content of the audio are made, our technique has been developed and tested on high-quality recorded speech.
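
    The detection pipeline can be outlined schematically: a short statistical feature vector per clip, classified by a non-linear SVM. The features below are simple placeholder statistics rather than the paper's linear-basis features, and the data and embedding are synthetic.

      # Schematic steganalysis pipeline: per-clip features -> RBF-kernel SVM.
      import numpy as np
      from sklearn.svm import SVC

      rng = np.random.default_rng(0)

      def features(clip):
          d = np.diff(clip)  # crude "residual" signal
          return np.array([d.mean(), d.std(), np.abs(d).mean(),
                           ((d - d.mean()) ** 3).mean() / (d.std() ** 3 + 1e-12)])

      def make_clip(hide_message=False):
          clip = rng.normal(size=4096)
          if hide_message:  # toy stand-in for an LSB-style perturbation
              clip += 0.01 * rng.integers(0, 2, size=4096)
          return clip

      X = np.array([features(make_clip(i % 2 == 1)) for i in range(400)])
      y = np.array([i % 2 for i in range(400)])

      clf = SVC(kernel="rbf").fit(X[:300], y[:300])
      print("held-out accuracy:", clf.score(X[300:], y[300:]))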

  9. Speech recovery device

    SciTech Connect

    Frankle, Christen M.

    2004-04-20

    There is provided an apparatus and method for assisting speech recovery in people with inability to speak due to aphasia, apraxia or another condition with similar effect. A hollow, rigid, thin-walled tube with semi-circular or semi-elliptical cut out shapes at each open end is positioned such that one end mates with the throat/voice box area of the neck of the assistor and the other end mates with the throat/voice box area of the assisted. The speaking person (assistor) makes sounds that produce standing wave vibrations at the same frequency in the vocal cords of the assisted person. Driving the assisted person's vocal cords with the assisted person being able to hear the correct tone enables the assisted person to speak by simply amplifying the vibration of membranes in their throat.

  10. Speech recovery device

    SciTech Connect

    Frankle, Christen M.

    2000-10-19

    There is provided an apparatus and method for assisting speech recovery in people with inability to speak due to aphasia, apraxia or another condition with similar effect. A hollow, rigid, thin-walled tube with semi-circular or semi-elliptical cut out shapes at each open end is positioned such that one end mates with the throat/voice box area of the neck of the assistor and the other end mates with the throat/voice box area of the assisted. The speaking person (assistor) makes sounds that produce standing wave vibrations at the same frequency in the vocal cords of the assisted person. Driving the assisted person's vocal cords with the assisted person being able to hear the correct tone enables the assisted person to speak by simply amplifying the vibration of membranes in their throat.

  11. Anxiety and ritualized speech.

    PubMed

    Lalljee, M; Cook, M

    1975-08-01

    The experiment examines the effects of anxiety on the use of a number of words that seem irrelevant to semantic communication. The Units of Ritualized Speech (URSs) considered are: 'I mean', 'in fact', 'really', 'sort of', 'well', and 'you know'. Two hypotheses are tested: (i) that URS rate will increase with anxiety; and (ii) that the speaker's preferred URS will increase with anxiety. Subjects were interviewed on topics they had previously rated as anxiety-provoking and non-anxiety-provoking. Hypothesis (i) was supported, but hypothesis (ii) was not. More specifically, the use of 'I mean' and 'well' increases when the speaker is anxious. An explanation for this is sought in the grammatical location of these two units. Sex differences in the use of URSs, correlations between URSs, and their relationship to other forms of disfluency are also considered.

  12. Deep Ensemble Learning for Monaural Speech Separation

    DTIC Science & Technology

    2015-02-01

    Deep Ensemble Learning for Monaural Speech Separation. Xiao-Lei Zhang, Department of Computer Science and Engineering, The Ohio State University, Columbus, OH 43210, USA; DeLiang Wang (dwang@cse.ohio-state.edu). Abstract: Monaural speech separation is a fundamental problem in robust speech processing. Recently, deep neural network (DNN) based speech separation methods, which predict either clean speech or an ideal time-frequency mask, have

  13. Applications of Hilbert Spectral Analysis for Speech and Sound Signals

    NASA Technical Reports Server (NTRS)

    Huang, Norden E.

    2003-01-01

A new method for analyzing nonlinear and nonstationary data has been developed, and its natural applications are to speech and sound signals. The key part of the method is the Empirical Mode Decomposition, with which any complicated data set can be decomposed into a finite and often small number of Intrinsic Mode Functions (IMFs). An IMF is defined as any function having the same number of zero crossings and extrema, and also having symmetric envelopes defined by the local maxima and minima, respectively. An IMF also admits a well-behaved Hilbert transform. This decomposition method is adaptive and, therefore, highly efficient. Since the decomposition is based on the local characteristic time scale of the data, it is applicable to nonlinear and nonstationary processes. With the Hilbert transform, the Intrinsic Mode Functions yield instantaneous frequencies as functions of time, which give sharp identification of embedded structures. This method can be used to process all acoustic signals; specifically, it can process speech signals for speech synthesis, speaker identification and verification, speech recognition, and sound signal enhancement and filtering. Additionally, the acoustical signals from machinery are essentially the way machines talk to us, so those signals, whether carried through the air or as vibration on the machines themselves, can tell us the operating conditions of the machines. Thus, we can use acoustic signals to diagnose machine problems.
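
    The following toy sketch, under simplified assumptions (a fixed number of sifting passes, no boundary handling), illustrates one round of EMD sifting and the Hilbert-transform instantaneous frequency; production EMD implementations use proper stopping criteria and extract several IMFs in turn.

        # Toy sketch of EMD sifting plus Hilbert instantaneous frequency.
        import numpy as np
        from scipy.interpolate import CubicSpline
        from scipy.signal import argrelextrema, hilbert

        def sift_once(x, t):
            """One sifting pass: subtract the mean of the extrema envelopes."""
            maxi = argrelextrema(x, np.greater)[0]
            mini = argrelextrema(x, np.less)[0]
            upper = CubicSpline(t[maxi], x[maxi])(t)
            lower = CubicSpline(t[mini], x[mini])(t)
            return x - (upper + lower) / 2.0

        fs = 8000.0
        t = np.arange(0, 1, 1 / fs)
        x = np.sin(2 * np.pi * 100 * t) + 0.3 * np.sin(2 * np.pi * 7 * t)

        imf = x.copy()
        for _ in range(10):          # fixed pass count for the toy example
            imf = sift_once(imf, t)

        # Instantaneous frequency of the candidate IMF via the Hilbert transform
        phase = np.unwrap(np.angle(hilbert(imf)))
        inst_freq = np.diff(phase) * fs / (2 * np.pi)
        print(inst_freq[100:105])    # ~100 Hz for the fast component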

  14. Writing, Inner Speech, and Meditation.

    ERIC Educational Resources Information Center

    Moffett, James

    1982-01-01

    Examines the interrelationships among meditation, inner speech (stream of consciousness), and writing. Considers the possibilities and implications of using the techniques of meditation in educational settings, especially in the writing classroom. (RL)

  15. Delayed Speech or Language Development

    MedlinePlus

… If you're concerned about your child's speech and language development, there are some things to watch for. An …

  16. Acute stress reduces speech fluency.

    PubMed

    Buchanan, Tony W; Laures-Gore, Jacqueline S; Duff, Melissa C

    2014-03-01

    People often report word-finding difficulties and other language disturbances when put in a stressful situation. There is, however, scant empirical evidence to support the claim that stress affects speech productivity. To address this issue, we measured speech and language variables during a stressful Trier Social Stress Test (TSST) as well as during a less stressful "placebo" TSST (Het et al., 2009). Compared to the non-stressful speech, participants showed higher word productivity during the TSST. By contrast, participants paused more during the stressful TSST, an effect that was especially pronounced in participants who produced a larger cortisol and heart rate response to the stressor. Findings support anecdotal evidence of stress-impaired speech production abilities.

  17. Speech measures indicating workload demand.

    PubMed

    Brenner, M; Doherty, E T; Shipp, T

    1994-01-01

    Heart rate and six speech measures were evaluated using a manual tracking task under different workload demands. Following training, 17 male subjects performed three task trials: a difficult trial, with a $50 incentive for successful performance at a very demanding level; an easy trial, with a $2 incentive for successful performance at a simple level; and a baseline trial, in which there was physiological monitoring but no tracking performance. Subjects counted aloud during the trials. It was found that heart rate, speaking fundamental frequency (pitch), and vocal intensity (loudness) increased significantly with workload demands. Speaking rate showed a marginal increase, while vocal jitter and vocal shimmer did not show reliable changes. A derived speech measure, which statistically combined information from all other speech measures except shimmer, was also evaluated. It increased significantly with workload demands and was surprisingly robust in showing differences for individual subjects. It appears that speech analysis can provide practical workload information.

  18. Delayed Speech or Language Development

    MedlinePlus

    ... often around 9 months), they begin to string sounds together, incorporate the different tones of speech, and ... of age, babies also should be attentive to sound and begin to recognize names of common objects ( ...

  19. Statistical Analysis of Spectral Properties and Prosodic Parameters of Emotional Speech

    NASA Astrophysics Data System (ADS)

    Přibil, J.; Přibilová, A.

    2009-01-01

The paper addresses how microintonation and spectral properties are reflected in male and female acted emotional speech. The microintonation component of speech melody is analyzed with regard to its spectral and statistical parameters. According to psychological research on emotional speech, different emotions are accompanied by different spectral noise. We control its amount by spectral flatness, according to which high-frequency noise is mixed into voiced frames during cepstral speech synthesis. Our experiments are aimed at statistical analysis of cepstral coefficient values and ranges of spectral flatness in three emotions (joy, sadness, anger), and a neutral state for comparison. Calculated histograms of spectral flatness distribution are visually compared and modelled by a Gamma probability distribution. Histograms of cepstral coefficient distribution are evaluated and compared using skewness and kurtosis. The statistical results show good agreement between male and female voices for all emotional states portrayed by several Czech and Slovak professional actors.
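
    As an illustrative sketch (not the authors' code), the snippet below computes per-frame spectral flatness, fits a Gamma distribution to its values with SciPy, and evaluates skewness and kurtosis of a cepstral coefficient; the white-noise input is a stand-in for real emotional speech.

        # Sketch: spectral flatness, Gamma fit, and cepstral skewness/kurtosis.
        import numpy as np
        from scipy.stats import gamma, skew, kurtosis

        def spectral_flatness(frame):
            """Geometric mean / arithmetic mean of the power spectrum (0..1]."""
            p = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2 + 1e-12
            return np.exp(np.mean(np.log(p))) / np.mean(p)

        rng = np.random.default_rng(1)
        signal = rng.standard_normal(16000)       # stand-in for recorded speech
        frames = signal[: 16000 // 512 * 512].reshape(-1, 512)
        sf = np.array([spectral_flatness(f) for f in frames])

        a, loc, scale = gamma.fit(sf, floc=0)     # model the SF distribution
        print("Gamma shape=%.2f scale=%.4f" % (a, scale))

        # Real cepstrum per frame; inspect the distribution of one coefficient
        cep = np.fft.irfft(np.log(np.abs(np.fft.rfft(frames, axis=1)) + 1e-12),
                           axis=1)
        c1 = cep[:, 1]
        print("skewness=%.2f kurtosis=%.2f" % (skew(c1), kurtosis(c1)))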

  20. A maximum likelihood approach to estimating articulator positions from speech acoustics

    SciTech Connect

    Hogden, J.

    1996-09-23

    This proposal presents an algorithm called maximum likelihood continuity mapping (MALCOM) which recovers the positions of the tongue, jaw, lips, and other speech articulators from measurements of the sound-pressure waveform of speech. MALCOM differs from other techniques for recovering articulator positions from speech in three critical respects: it does not require training on measured or modeled articulator positions, it does not rely on any particular model of sound propagation through the vocal tract, and it recovers a mapping from acoustics to articulator positions that is linearly, not topographically, related to the actual mapping from acoustics to articulation. The approach categorizes short-time windows of speech into a finite number of sound types, and assumes the probability of using any articulator position to produce a given sound type can be described by a parameterized probability density function. MALCOM then uses maximum likelihood estimation techniques to: (1) find the most likely smooth articulator path given a speech sample and a set of distribution functions (one distribution function for each sound type), and (2) change the parameters of the distribution functions to better account for the data. Using this technique improves the accuracy of articulator position estimates compared to continuity mapping -- the only other technique that learns the relationship between acoustics and articulation solely from acoustics. The technique has potential application to computer speech recognition, speech synthesis and coding, teaching the hearing impaired to speak, improving foreign language instruction, and teaching dyslexics to read. 34 refs., 7 figs.
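
    A loose sketch of step (1) under strong simplifications: one-dimensional articulator positions, one Gaussian per sound type, and a quadratic smoothness penalty in place of MALCOM's actual path constraint. Step (2), re-estimating the distributions, would alternate with this. All values are illustrative.

        # Sketch: most likely smooth articulator path given sound-type labels.
        import numpy as np
        from scipy.optimize import minimize

        labels = np.array([0, 0, 1, 1, 2, 2, 1, 0])   # sound type per window
        mu = np.array([0.0, 1.0, -0.5])               # per-type Gaussian means
        var = 0.1                                     # shared variance
        smooth = 5.0                                  # smoothness weight

        def neg_log_posterior(path):
            data_term = np.sum((path - mu[labels]) ** 2) / (2 * var)
            smooth_term = smooth * np.sum(np.diff(path) ** 2)
            return data_term + smooth_term

        res = minimize(neg_log_posterior, x0=np.zeros(len(labels)))
        print(np.round(res.x, 2))   # a smoothed version of the per-type means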

  1. Transient noise reduction in speech signal with a modified long-term predictor

    NASA Astrophysics Data System (ADS)

    Choi, Min-Seok; Kang, Hong-Goo

    2011-12-01

This article proposes an efficient median-filter-based algorithm to remove transient noise in a speech signal. The proposed algorithm adopts a modified long-term predictor (LTP) as the pre-processor of the noise reduction process to reduce speech distortion caused by the nonlinear nature of the median filter. This article shows that the LTP analysis does not modify the characteristics of transient noise during the speech modeling process. By contrast, if a short-term linear prediction (STP) filter is employed as a pre-processor, the enhanced output includes residual noise because the STP analysis and synthesis process preserves and restores transient noise components. To minimize residual noise and speech distortion after the transient noise reduction, a modified LTP method is proposed which estimates the characteristics of speech more accurately. By ignoring regions where transient noise is present in the pitch lag detection step, the modified LTP successfully avoids being affected by transient noise. A backward pitch prediction algorithm is also adopted to reduce speech distortion in the onset regions. Experimental results verify that the proposed system efficiently eliminates transient noise while preserving the desired speech signal.
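
    The sketch below illustrates the general idea, not the paper's exact system: a one-tap long-term predictor removes the periodic (pitch) component, the residual is median filtered to suppress the click, and the signal is resynthesized. The pitch search and all parameters are simplified stand-ins.

        # Sketch: LTP pre-processing + median filtering of the residual.
        import numpy as np
        from scipy.signal import medfilt

        def ltp_denoise(x, fs, kernel=9, lag_range=(40, 320)):
            # crude pitch-lag search by autocorrelation maximum
            ac = np.correlate(x, x, mode="full")[len(x) - 1:]
            lo, hi = lag_range
            T = lo + int(np.argmax(ac[lo:hi]))
            b = ac[T] / (ac[0] + 1e-12)      # one-tap LTP gain

            pred = np.zeros_like(x)
            pred[T:] = b * x[:-T]            # long-term prediction
            resid = x - pred                 # transient clicks live here
            resid = medfilt(resid, kernel)   # nonlinear click removal
            return pred + resid              # resynthesis

        fs = 8000
        t = np.arange(fs) / fs
        x = np.sin(2 * np.pi * 120 * t)
        x[3000] += 2.0                       # synthetic transient click
        print(np.abs(ltp_denoise(x, fs))[2995:3005].round(2))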

  2. New Ideas for Speech Recognition and Related Technologies

    SciTech Connect

    Holzrichter, J F

    2002-06-17

The ideas relating to the use of organ motion sensors for the purposes of speech recognition were first described by the author in spring 1994. During the past year, a series of productive collaborations between the author, Tom McEwan, and Larry Ng ensued and have led to demonstrations, new sensor ideas, and algorithmic descriptions of a large number of speech recognition concepts. This document summarizes the basic concepts of recognizing speech once organ motions have been obtained. Micro-power radars and their uses for the measurement of body organ motions, such as those of the heart and lungs, have been demonstrated by Tom McEwan over the past two years. McEwan and I conducted a series of experiments, using these instruments, on vocal organ motions beginning in late spring, during which we observed motions of vocal folds (i.e., cords), tongue, jaw, and related organs that are very useful for speech recognition and other purposes. These will be reviewed in a separate paper. Since late summer 1994, Lawrence Ng and I have worked to make many of the initial recognition ideas more rigorous and to investigate the applications of these new ideas to new speech recognition algorithms, to speech coding, and to speech synthesis. I introduce some of those ideas in section IV of this document, and we describe them more completely in the document following this one, UCRL-UR-120311. For the design and operation of micro-power radars and their application to body organ motions, the reader may contact Tom McEwan directly. The capability for using EM sensors (i.e., radar units) to measure body organ motions and positions has been available for decades. Impediments to their use appear to have been size, excessive power, lack of resolution, and lack of understanding of the value of organ motion measurements, especially as applied to speech related technologies. However, with the invention of very low power, portable systems, as demonstrated by McEwan at LLNL, researchers have begun

  3. The interlanguage speech intelligibility benefit

    NASA Astrophysics Data System (ADS)

    Bent, Tessa; Bradlow, Ann R.

    2003-09-01

    This study investigated how native language background influences the intelligibility of speech by non-native talkers for non-native listeners from either the same or a different native language background as the talker. Native talkers of Chinese (n=2), Korean (n=2), and English (n=1) were recorded reading simple English sentences. Native listeners of English (n=21), Chinese (n=21), Korean (n=10), and a mixed group from various native language backgrounds (n=12) then performed a sentence recognition task with the recordings from the five talkers. Results showed that for native English listeners, the native English talker was most intelligible. However, for non-native listeners, speech from a relatively high proficiency non-native talker from the same native language background was as intelligible as speech from a native talker, giving rise to the ``matched interlanguage speech intelligibility benefit.'' Furthermore, this interlanguage intelligibility benefit extended to the situation where the non-native talker and listeners came from different language backgrounds, giving rise to the ``mismatched interlanguage speech intelligibility benefit.'' These findings shed light on the nature of the talker-listener interaction during speech communication.

  4. Automatic testing of speech recognition.

    PubMed

    Francart, Tom; Moonen, Marc; Wouters, Jan

    2009-02-01

    Speech reception tests are commonly administered by manually scoring the oral response of the subject. This requires a test supervisor to be continuously present. To avoid this, a subject can type the response, after which it can be scored automatically. However, spelling errors may then be counted as recognition errors, influencing the test results. We demonstrate an autocorrection approach based on two scoring algorithms to cope with spelling errors. The first algorithm deals with sentences and is based on word scores. The second algorithm deals with single words and is based on phoneme scores. Both algorithms were evaluated with a corpus of typed answers based on three different Dutch speech materials. The percentage of differences between automatic and manual scoring was determined, in addition to the mean difference in speech recognition threshold. The sentence correction algorithm performed at a higher accuracy than commonly obtained with these speech materials. The word correction algorithm performed better than the human operator. Both algorithms can be used in practice and allow speech reception tests with open set speech materials over the internet.
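
    A minimal sketch of the sentence-level idea (the actual word- and phoneme-score algorithms are not reproduced here): match each target word to the typed response within a small edit-distance tolerance, so spelling errors are not counted as recognition errors.

        # Sketch: edit-distance-tolerant word scoring for typed responses.
        def edit_distance(a: str, b: str) -> int:
            """Classic Levenshtein distance."""
            prev = list(range(len(b) + 1))
            for i, ca in enumerate(a, 1):
                cur = [i]
                for j, cb in enumerate(b, 1):
                    cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                                   prev[j - 1] + (ca != cb)))
                prev = cur
            return prev[-1]

        def word_score(response: str, target: str, tol: int = 1) -> float:
            """Fraction of target words matched within `tol` edits."""
            resp = response.lower().split()
            hits = 0
            for w in target.lower().split():
                if any(edit_distance(w, r) <= tol for r in resp):
                    hits += 1
            return hits / len(target.split())

        print(word_score("the boot sailed acros the lake",
                         "the boat sailed across the lake"))  # 1.0 with tol=1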

  5. Concatenated logic circuits based on a three-way DNA junction: a keypad-lock security system with visible readout and an automatic reset function.

    PubMed

    Chen, Junhua; Zhou, Shungui; Wen, Junlin

    2015-01-07

    Concatenated logic circuits operating as a biocomputing keypad-lock security system with an automatic reset function have been successfully constructed on the basis of toehold-mediated strand displacement and three-way-DNA-junction architecture. In comparison with previously reported keypad locks, the distinctive advantage of the proposed security system is that it can be reset and cycled spontaneously a large number of times without an external stimulus, thus making practical applications possible. By the use of a split-G-quadruplex DNAzyme as the signal reporter, the output of the keypad lock can be recognized readily by the naked eye. The "lock" is opened only when the inputs are introduced in an exact order. This requirement provides defense against illegal invasion to protect information at the molecular scale.
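
    As a purely software analogy of the behaviour described (the paper's implementation is chemical, not computational), the sketch below shows a keypad lock that opens only for one exact input order and resets itself after every complete attempt.

        # Analogy: keypad-lock logic with an automatic reset after each attempt.
        class KeypadLock:
            def __init__(self, secret):
                self.secret = list(secret)
                self.entered = []

            def press(self, key):
                self.entered.append(key)
                if len(self.entered) == len(self.secret):
                    opened = self.entered == self.secret
                    self.entered = []   # automatic reset, no external stimulus
                    return "OPEN" if opened else "LOCKED"
                return "..."

        lock = KeypadLock(["A", "B", "C"])
        for k in ["A", "C", "B", "A", "B", "C"]:
            print(k, lock.press(k))     # only the exact order A, B, C opens it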

  6. Phylogeny of the cycads based on multiple single-copy nuclear genes: congruence of concatenated parsimony, likelihood and species tree inference methods

    PubMed Central

    Salas-Leiva, Dayana E.; Meerow, Alan W.; Calonje, Michael; Griffith, M. Patrick; Francisco-Ortega, Javier; Nakamura, Kyoko; Stevenson, Dennis W.; Lewis, Carl E.; Namoff, Sandra

    2013-01-01

    Background and aims Despite a recent new classification, a stable phylogeny for the cycads has been elusive, particularly regarding resolution of Bowenia, Stangeria and Dioon. In this study, five single-copy nuclear genes (SCNGs) are applied to the phylogeny of the order Cycadales. The specific aim is to evaluate several gene tree–species tree reconciliation approaches for developing an accurate phylogeny of the order, to contrast them with concatenated parsimony analysis and to resolve the erstwhile problematic phylogenetic position of these three genera. Methods DNA sequences of five SCNGs were obtained for 20 cycad species representing all ten genera of Cycadales. These were analysed with parsimony, maximum likelihood (ML) and three Bayesian methods of gene tree–species tree reconciliation, using Cycas as the outgroup. A calibrated date estimation was developed with Bayesian methods, and biogeographic analysis was also conducted. Key Results Concatenated parsimony, ML and three species tree inference methods resolve exactly the same tree topology with high support at most nodes. Dioon and Bowenia are the first and second branches of Cycadales after Cycas, respectively, followed by an encephalartoid clade (Macrozamia–Lepidozamia–Encephalartos), which is sister to a zamioid clade, of which Ceratozamia is the first branch, and in which Stangeria is sister to Microcycas and Zamia. Conclusions A single, well-supported phylogenetic hypothesis of the generic relationships of the Cycadales is presented. However, massive extinction events inferred from the fossil record that eliminated broader ancestral distributions within Zamiaceae compromise accurate optimization of ancestral biogeographical areas for that hypothesis. While major lineages of Cycadales are ancient, crown ages of all modern genera are no older than 12 million years, supporting a recent hypothesis of mostly Miocene radiations. This phylogeny can contribute to an accurate infrafamilial

  7. Elderly perception of speech from a computer

    NASA Astrophysics Data System (ADS)

    Black, Alan; Eskenazi, Maxine; Simmons, Reid

    2002-05-01

An aging population still needs to access information, such as bus schedules. It is evident that they will be doing so using computers, and especially interfaces using speech input and output. This is a preliminary study on the use of synthetic speech for the elderly. In it, twenty persons between the ages of 60 and 80 were asked to listen to speech emitted by a robot (CMU's VIKIA) and to write down what they heard. All of the speech was natural prerecorded speech (not synthetic) read by one female speaker. There were four listening conditions: (a) only speech emitted, (b) robot moves before emitting speech, (c) face has lip movement during speech, (d) both (b) and (c). There were very few errors for conditions (b), (c), and (d), but errors existed for condition (a). The presentation will discuss experimental conditions, show actual figures, and try to draw conclusions for speech communication between computers and the elderly.

  8. Speech recognition by computer. 1964-September 1981 (a bibliography with abstracts)

    SciTech Connect

    Not Available

    1983-02-01

    The cited reports present investigations on the recognition, synthesis, and processing of speech by computer. The research includes the acoustical, phonological, and linguistic processes necessary in the conversion of the various waveforms by computers. (This updated bibliography contains 294 citations, none of which are new entries to the previous edition.)

  9. Speech recognition by computer. October 1981-1982 (a bibliography with abstracts)

    SciTech Connect

    Not Available

    1983-02-01

    The cited reports present investigations on the recognition, synthesis, and processing of speech by computer. The research includes the acoustical, phonological, and linguistic processes necessary in the conversion of the various waveforms by computers. (This updated bibliography contains 33 citations, all of which are new entries to the previous edition.)

  10. Effect of Speaking Rate on Recognition of Synthetic and Natural Speech by Normal-Hearing and Cochlear Implant Listeners

    PubMed Central

    Ji, Caili; Galvin, John J.; Xu, Anting; Fu, Qian-Jie

    2012-01-01

Objective Most studies have evaluated cochlear implant (CI) performance using “clear” speech materials, which are highly intelligible and well-articulated. CI users may encounter much greater variability in speech patterns in the “real world,” including synthetic speech. In this study, we measured normal-hearing (NH) and CI listeners’ sentence recognition with multiple talkers and speaking rates, and with naturally produced and synthetic speech. Design NH and CI subjects were asked to recognize naturally produced or synthetic sentences, presented at a slow, normal, or fast speaking rate. Natural speech was produced by one male and one female talker; synthetic speech was generated to simulate a male and female talker. For natural speech, the speaking rate was time-scaled while preserving voice pitch and formant frequency information. For synthetic speech, the speaking rate was adjusted within the speech synthesis engine. NH subjects were tested while listening to unprocessed speech or to an 8-channel acoustic CI simulation. CI subjects were tested while listening with their clinical processors and the recommended microphone sensitivity and volume settings. Results The NH group performed significantly better than the CI simulation group, and the CI simulation group performed significantly better than the CI group. For all subject groups, sentence recognition was significantly better with natural than with synthetic speech. The performance deficit with synthetic speech was relatively small for NH subjects listening to unprocessed speech. However, the performance deficit with synthetic speech was much greater for CI subjects and for CI simulation subjects. There was a significant effect of talker gender, with slightly better performance with the female talker for CI subjects and slightly better performance with the male talker for the CI simulations. For all subject groups, sentence recognition was significantly poorer only at the fast rate. CI performance was
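
    One plausible way to perform such time-scaling, offered as an assumption rather than the study's actual tool, is phase-vocoder time stretching, which changes duration while preserving pitch and the spectral envelope, e.g. with librosa:

        # Assumed approach: phase-vocoder time stretching via librosa.
        import librosa

        y, sr = librosa.load(librosa.ex("trumpet"))       # placeholder audio
        slow = librosa.effects.time_stretch(y, rate=0.7)  # 70% speed, same pitch
        fast = librosa.effects.time_stretch(y, rate=1.3)  # 130% speed, same pitch
        print(len(y), len(slow), len(fast))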

  11. Neural pathways for visual speech perception

    PubMed Central

    Bernstein, Lynne E.; Liebenthal, Einat

    2014-01-01

    This paper examines the questions, what levels of speech can be perceived visually, and how is visual speech represented by the brain? Review of the literature leads to the conclusions that every level of psycholinguistic speech structure (i.e., phonetic features, phonemes, syllables, words, and prosody) can be perceived visually, although individuals differ in their abilities to do so; and that there are visual modality-specific representations of speech qua speech in higher-level vision brain areas. That is, the visual system represents the modal patterns of visual speech. The suggestion that the auditory speech pathway receives and represents visual speech is examined in light of neuroimaging evidence on the auditory speech pathways. We outline the generally agreed-upon organization of the visual ventral and dorsal pathways and examine several types of visual processing that might be related to speech through those pathways, specifically, face and body, orthography, and sign language processing. In this context, we examine the visual speech processing literature, which reveals widespread diverse patterns of activity in posterior temporal cortices in response to visual speech stimuli. We outline a model of the visual and auditory speech pathways and make several suggestions: (1) The visual perception of speech relies on visual pathway representations of speech qua speech. (2) A proposed site of these representations, the temporal visual speech area (TVSA) has been demonstrated in posterior temporal cortex, ventral and posterior to multisensory posterior superior temporal sulcus (pSTS). (3) Given that visual speech has dynamic and configural features, its representations in feedforward visual pathways are expected to integrate these features, possibly in TVSA. PMID:25520611

  12. Child directed speech, speech in noise and hyperarticulated speech in the Pacific Northwest

    NASA Astrophysics Data System (ADS)

    Wright, Richard; Carmichael, Lesley; Beckford Wassink, Alicia; Galvin, Lisa

    2004-05-01

Three types of exaggerated speech are thought to be systematic responses to accommodate the needs of the listener: child-directed speech (CDS), hyperspeech, and the Lombard response. CDS (e.g., Kuhl et al., 1997) occurs in interactions with young children and infants. Hyperspeech (Johnson et al., 1993) is a modification in response to listeners' difficulties in recovering the intended message. The Lombard response (e.g., Lane et al., 1970) is a compensation for increased noise in the signal. While all three result from adaptations to accommodate the needs of the listener, and therefore should share some features, their triggering conditions are quite different, and the styles should therefore exhibit differences in their phonetic outcomes. While CDS has been the subject of a variety of acoustic studies, it has never been studied in the broader context of the other ``exaggerated'' speech styles. A large crosslinguistic study was undertaken that compares speech produced under four conditions: spontaneous conversations, CDS aimed at 6-9-month-old infants, hyperarticulated speech, and speech in noise. This talk will present some findings for North American English as spoken in the Pacific Northwest. The measures include f0, vowel duration, F1 and F2 at vowel midpoint, and intensity.

  13. Experimental comparison between speech transmission index, rapid speech transmission index, and speech intelligibility index.

    PubMed

    Larm, Petra; Hongisto, Valtteri

    2006-02-01

    During the acoustical design of, e.g., auditoria or open-plan offices, it is important to know how speech can be perceived in various parts of the room. Different objective methods have been developed to measure and predict speech intelligibility, and these have been extensively used in various spaces. In this study, two such methods were compared, the speech transmission index (STI) and the speech intelligibility index (SII). Also the simplification of the STI, the room acoustics speech transmission index (RASTI), was considered. These quantities are all based on determining an apparent speech-to-noise ratio on selected frequency bands and summing them using a specific weighting. For comparison, some data were needed on the possible differences of these methods resulting from the calculation scheme and also measuring equipment. Their prediction accuracy was also of interest. Measurements were made in a laboratory having adjustable noise level and absorption, and in a real auditorium. It was found that the measurement equipment, especially the selection of the loudspeaker, can greatly affect the accuracy of the results. The prediction accuracy of the RASTI was found acceptable, if the input values for the prediction are accurately known, even though the studied space was not ideally diffuse.
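
    A simplified illustration of the shared calculation scheme (the band-importance weights below are invented, and real STI/SII computations include further corrections such as masking and level distortion):

        # Sketch: per-band apparent SNR clipped to +/-15 dB, mapped to [0, 1],
        # and combined with band-importance weights.
        import numpy as np

        def intelligibility_index(speech_db, noise_db, weights):
            snr = np.clip(np.asarray(speech_db) - np.asarray(noise_db),
                          -15.0, 15.0)
            audibility = (snr + 15.0) / 30.0     # map [-15, 15] dB to [0, 1]
            w = np.asarray(weights) / np.sum(weights)
            return float(np.sum(w * audibility))

        octave_speech = [55, 60, 62, 60, 55, 50, 45]  # dB per band (example)
        octave_noise  = [50, 50, 50, 50, 50, 50, 50]
        weights       = [1, 2, 3, 3, 3, 2, 1]         # hypothetical importance
        print(round(intelligibility_index(octave_speech, octave_noise,
                                          weights), 3))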

  14. Speech prosody in cerebellar ataxia

    NASA Astrophysics Data System (ADS)

    Casper, Maureen

    The present study sought an acoustic signature for the speech disturbance recognized in cerebellar degeneration. Magnetic resonance imaging was used for a radiological rating of cerebellar involvement in six cerebellar ataxic dysarthric speakers. Acoustic measures of the [pap] syllables in contrastive prosodic conditions and of normal vs. brain-damaged patients were used to further our understanding both of the speech degeneration that accompanies cerebellar pathology and of speech motor control and movement in general. Pair-wise comparisons of the prosodic conditions within the normal group showed statistically significant differences for four prosodic contrasts. For three of the four contrasts analyzed, the normal speakers showed both longer durations and higher formant and fundamental frequency values in the more prominent first condition of the contrast. The acoustic measures of the normal prosodic contrast values were then used as a model to measure the degree of speech deterioration for individual cerebellar subjects. This estimate of speech deterioration as determined by individual differences between cerebellar and normal subjects' acoustic values of the four prosodic contrasts was used in correlation analyses with MRI ratings. Moderate correlations between speech deterioration and cerebellar atrophy were found in the measures of syllable duration and f0. A strong negative correlation was found for F1. Moreover, the normal model presented by these acoustic data allows for a description of the flexibility of task- oriented behavior in normal speech motor control. These data challenge spatio-temporal theory which explains movement as an artifact of time wherein longer durations predict more extreme movements and give further evidence for gestural internal dynamics of movement in which time emerges from articulatory events rather than dictating those events. This model provides a sensitive index of cerebellar pathology with quantitative acoustic

  15. System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech

    DOEpatents

    Burnett, Greg C.; Holzrichter, John F.; Ng, Lawrence C.

    2006-02-14

    The present invention is a system and method for characterizing human (or animate) speech voiced excitation functions and acoustic signals, for removing unwanted acoustic noise which often occurs when a speaker uses a microphone in common environments, and for synthesizing personalized or modified human (or other animate) speech upon command from a controller. A low power EM sensor is used to detect the motions of windpipe tissues in the glottal region of the human speech system before, during, and after voiced speech is produced by a user. From these tissue motion measurements, a voiced excitation function can be derived. Further, the excitation function provides speech production information to enhance noise removal from human speech and it enables accurate transfer functions of speech to be obtained. Previously stored excitation and transfer functions can be used for synthesizing personalized or modified human speech. Configurations of EM sensor and acoustic microphone systems are described to enhance noise cancellation and to enable multiple articulator measurements.

  16. System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech

    DOEpatents

    Burnett, Greg C.; Holzrichter, John F.; Ng, Lawrence C.

    2004-03-23

    The present invention is a system and method for characterizing human (or animate) speech voiced excitation functions and acoustic signals, for removing unwanted acoustic noise which often occurs when a speaker uses a microphone in common environments, and for synthesizing personalized or modified human (or other animate) speech upon command from a controller. A low power EM sensor is used to detect the motions of windpipe tissues in the glottal region of the human speech system before, during, and after voiced speech is produced by a user. From these tissue motion measurements, a voiced excitation function can be derived. Further, the excitation function provides speech production information to enhance noise removal from human speech and it enables accurate transfer functions of speech to be obtained. Previously stored excitation and transfer functions can be used for synthesizing personalized or modified human speech. Configurations of EM sensor and acoustic microphone systems are described to enhance noise cancellation and to enable multiple articulator measurements.

  17. System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech

    DOEpatents

    Burnett, Greg C.; Holzrichter, John F.; Ng, Lawrence C.

    2006-08-08

    The present invention is a system and method for characterizing human (or animate) speech voiced excitation functions and acoustic signals, for removing unwanted acoustic noise which often occurs when a speaker uses a microphone in common environments, and for synthesizing personalized or modified human (or other animate) speech upon command from a controller. A low power EM sensor is used to detect the motions of windpipe tissues in the glottal region of the human speech system before, during, and after voiced speech is produced by a user. From these tissue motion measurements, a voiced excitation function can be derived. Further, the excitation function provides speech production information to enhance noise removal from human speech and it enables accurate transfer functions of speech to be obtained. Previously stored excitation and transfer functions can be used for synthesizing personalized or modified human speech. Configurations of EM sensor and acoustic microphone systems are described to enhance noise cancellation and to enable multiple articulator measurements.

  18. Production and perception of clear speech

    NASA Astrophysics Data System (ADS)

    Bradlow, Ann R.

    2003-04-01

    When a talker believes that the listener is likely to have speech perception difficulties due to a hearing loss, background noise, or a different native language, she or he will typically adopt a clear speaking style. Previous research has established that, with a simple set of instructions to the talker, ``clear speech'' can be produced by most talkers under laboratory recording conditions. Furthermore, there is reliable evidence that adult listeners with either impaired or normal hearing typically find clear speech more intelligible than conversational speech. Since clear speech production involves listener-oriented articulatory adjustments, a careful examination of the acoustic-phonetic and perceptual consequences of the conversational-to-clear speech transformation can serve as an effective window into talker- and listener-related forces in speech communication. Furthermore, clear speech research has considerable potential for the development of speech enhancement techniques. After reviewing previous and current work on the acoustic properties of clear versus conversational speech, this talk will present recent data from a cross-linguistic study of vowel production in clear speech and a cross-population study of clear speech perception. Findings from these studies contribute to an evolving view of clear speech production and perception as reflecting both universal, auditory and language-specific, phonological contrast enhancement features.

  19. Performance Pressure Enhances Speech Learning

    PubMed Central

    Maddox, W. Todd; Koslov, Seth; Yi, Han-Gyol; Chandrasekaran, Bharath

    2015-01-01

    Real-world speech learning often occurs in high pressure situations such as trying to communicate in a foreign country. However, the impact of pressure on speech learning success is largely unexplored. In this study, adult, native speakers of English learned non-native speech categories under pressure or no-pressure conditions. In the pressure conditions, participants were informed that they were paired with a (fictitious) partner, and that each had to independently exceed a performance criterion for both to receive a monetary bonus. They were then informed that their partner had exceeded the bonus and the fate of both bonuses depended upon the participant’s performance. Our results demonstrate that pressure significantly enhanced speech learning success. In addition, neurobiologically-inspired computational modeling revealed that the performance advantage was due to faster and more frequent use of procedural learning strategies. These results integrate two well-studied research domains and suggest a facilitatory role of motivational factors in speech learning performance that may not be captured in traditional training paradigms. PMID:28077883

  20. The Effect of Speech Rate on Stuttering Frequency, Phonated Intervals, Speech Effort, and Speech Naturalness during Chorus Reading

    ERIC Educational Resources Information Center

    Davidow, Jason H.; Ingham, Roger J.

    2013-01-01

    Purpose: This study examined the effect of speech rate on phonated intervals (PIs), in order to test whether a reduction in the frequency of short PIs is an important part of the fluency-inducing mechanism of chorus reading. The influence of speech rate on stuttering frequency, speaker-judged speech effort, and listener-judged naturalness was also…

  1. Visual Context Enhanced: The Joint Contribution of Iconic Gestures and Visible Speech to Degraded Speech Comprehension

    ERIC Educational Resources Information Center

    Drijvers, Linda; Ozyurek, Asli

    2017-01-01

    Purpose: This study investigated whether and to what extent iconic co-speech gestures contribute to information from visible speech to enhance degraded speech comprehension at different levels of noise-vocoding. Previous studies of the contributions of these 2 visual articulators to speech comprehension have only been performed separately. Method:…

  2. The Role of Visual Speech Information in Supporting Perceptual Learning of Degraded Speech

    ERIC Educational Resources Information Center

    Wayne, Rachel V.; Johnsrude, Ingrid S.

    2012-01-01

    Following cochlear implantation, hearing-impaired listeners must adapt to speech as heard through their prosthesis. Visual speech information (VSI; the lip and facial movements of speech) is typically available in everyday conversation. Here, we investigate whether learning to understand a popular auditory simulation of speech as transduced by a…

  3. Predicting Speech Intelligibility with a Multiple Speech Subsystems Approach in Children with Cerebral Palsy

    ERIC Educational Resources Information Center

    Lee, Jimin; Hustad, Katherine C.; Weismer, Gary

    2014-01-01

    Purpose: Speech acoustic characteristics of children with cerebral palsy (CP) were examined with a multiple speech subsystems approach; speech intelligibility was evaluated using a prediction model in which acoustic measures were selected to represent three speech subsystems. Method: Nine acoustic variables reflecting different subsystems, and…

  4. The Fragile Nature of the Speech-Perception Deficit in Dyslexia: Natural vs. Synthetic Speech

    ERIC Educational Resources Information Center

    Blomert, Leo; Mitterer, Holger

    2004-01-01

    A number of studies reported that developmental dyslexics are impaired in speech perception, especially for speech signals consisting of rapid auditory transitions. These studies mostly made use of a categorical-perception task with synthetic-speech samples. In this study, we show that deficits in the perception of synthetic speech do not…

  5. Perceived Liveliness and Speech Comprehensibility in Aphasia: The Effects of Direct Speech in Auditory Narratives

    ERIC Educational Resources Information Center

    Groenewold, Rimke; Bastiaanse, Roelien; Nickels, Lyndsey; Huiskes, Mike

    2014-01-01

    Background: Previous studies have shown that in semi-spontaneous speech, individuals with Broca's and anomic aphasia produce relatively many direct speech constructions. It has been claimed that in "healthy" communication direct speech constructions contribute to the liveliness, and indirectly to the comprehensibility, of speech.…

  6. What Is Language? What Is Speech?

    MedlinePlus

… Speech is the verbal means of communicating. Speech consists of the following: …

  7. President Kennedy's Speech at Rice University

    NASA Technical Reports Server (NTRS)

    1988-01-01

    This video tape presents unedited film footage of President John F. Kennedy's speech at Rice University, Houston, Texas, September 12, 1962. The speech expresses the commitment of the United States to landing an astronaut on the Moon.

  8. Nonlinear Statistical Modeling of Speech

    NASA Astrophysics Data System (ADS)

    Srinivasan, S.; Ma, T.; May, D.; Lazarou, G.; Picone, J.

    2009-12-01

Contemporary approaches to speech and speaker recognition decompose the problem into four components: feature extraction, acoustic modeling, language modeling and search. Statistical signal processing is an integral part of each of these components, and Bayes' rule is used to merge these components into a single optimal choice. Acoustic models typically use hidden Markov models based on Gaussian mixture models for state output probabilities. This popular approach suffers from an inherent assumption of linearity in speech signal dynamics. Language models often employ a variety of maximum entropy techniques, but can employ many of the same statistical techniques used for acoustic models. In this paper, we focus on introducing nonlinear statistical models to the feature extraction and acoustic modeling problems as a first step towards speech and speaker recognition systems based on notions of chaos and strange attractors. Our goal in this work is to improve the generalization and robustness properties of a speech recognition system. Three nonlinear invariants are proposed for feature extraction: Lyapunov exponents, correlation fractal dimension, and correlation entropy. We demonstrate an 11% relative improvement on speech recorded under noise-free conditions, but show that a comparable degradation occurs for mismatched training conditions on noisy speech. We conjecture that the degradation is due to difficulties in estimating invariants reliably from noisy data. To circumvent these problems, we introduce two dynamic models to the acoustic modeling problem: (1) a linear dynamic model (LDM) that uses a state space-like formulation to explicitly model the evolution of hidden states using an autoregressive process, and (2) a data-dependent mixture of autoregressive (MixAR) models. Results show that LDM and MixAR models can achieve performance comparable to that of HMM systems while using significantly fewer parameters. Currently we are developing Bayesian parameter estimation and
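
    As a hedged sketch of one of the proposed invariants, the snippet below estimates a correlation (fractal) dimension from a delay embedding via the slope of log C(r) versus log r; the embedding parameters and radii are illustrative choices, not those of the paper.

        # Sketch: Grassberger-Procaccia-style correlation dimension estimate.
        import numpy as np

        def correlation_dimension(x, dim=4, tau=2, radii=(0.1, 0.2, 0.4, 0.8)):
            n = len(x) - (dim - 1) * tau
            emb = np.stack([x[i * tau: i * tau + n] for i in range(dim)], axis=1)
            d = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=2)
            dists = d[np.triu_indices(n, k=1)]
            c = np.array([np.mean(dists < r) for r in radii])  # correlation sums
            slope = np.polyfit(np.log(radii), np.log(c + 1e-12), 1)[0]
            return slope

        t = np.linspace(0, 8 * np.pi, 400)
        print(round(correlation_dimension(np.sin(t)), 2))  # near 1 for a sine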

  9. Speech and Hearing Science, Anatomy and Physiology.

    ERIC Educational Resources Information Center

    Zemlin, Willard R.

    Written for those interested in speech pathology and audiology, the text presents the anatomical, physiological, and neurological bases for speech and hearing. Anatomical nomenclature used in the speech and hearing sciences is introduced and the breathing mechanism is defined and discussed in terms of the respiratory passage, the framework and…

  10. DEVELOPMENT AND DISORDERS OF SPEECH IN CHILDHOOD.

    ERIC Educational Resources Information Center

    KARLIN, ISAAC W.; AND OTHERS

The growth, development, and abnormalities of speech in childhood are described in this text designed for pediatricians, psychologists, educators, medical students, therapists, pathologists, and parents. The normal development of speech and language is discussed, including theories on the origin of speech in man and factors influencing the normal…

  11. Audiovisual Asynchrony Detection in Human Speech

    ERIC Educational Resources Information Center

    Maier, Joost X.; Di Luca, Massimiliano; Noppeney, Uta

    2011-01-01

    Combining information from the visual and auditory senses can greatly enhance intelligibility of natural speech. Integration of audiovisual speech signals is robust even when temporal offsets are present between the component signals. In the present study, we characterized the temporal integration window for speech and nonspeech stimuli with…

  12. Liberalism, Speech Codes, and Related Problems.

    ERIC Educational Resources Information Center

    Sunstein, Cass R.

    1993-01-01

It is argued that universities are pervasively and necessarily engaged in regulation of speech, which complicates many existing claims about hate speech codes on campus. The ultimate test is whether the restriction on speech is a legitimate part of the institution's mission, its commitment to liberal education. (MSE)

  13. Interventions for Speech Sound Disorders in Children

    ERIC Educational Resources Information Center

    Williams, A. Lynn, Ed.; McLeod, Sharynne, Ed.; McCauley, Rebecca J., Ed.

    2010-01-01

    With detailed discussion and invaluable video footage of 23 treatment interventions for speech sound disorders (SSDs) in children, this textbook and DVD set should be part of every speech-language pathologist's professional preparation. Focusing on children with functional or motor-based speech disorders from early childhood through the early…

  14. Theoretical Value in Teaching Freedom of Speech.

    ERIC Educational Resources Information Center

    Carney, John J., Jr.

    The exercise of freedom of speech within our nation has deteriorated. A practical value in teaching free speech is the possibility of restoring a commitment to its principles by educators. What must be taught is why freedom of speech is important, why it has been compromised, and the extent to which it has been compromised. Every technological…

  15. Improving Speech Production with Adolescents and Adults.

    ERIC Educational Resources Information Center

    Whitehead, Brenda H.; Barefoot, Sidney M.

    1992-01-01

    This paper deals with the specific problems of the adolescent and adult hearing-impaired individual who wishes to improve and develop his or her expressive speech ability. Considered are issues critical to the learning process, intervention strategies for improving speech production, and speech production as one part of communication competency.…

  16. Speech and Debate as Civic Education

    ERIC Educational Resources Information Center

    Hogan, J. Michael; Kurr, Jeffrey A.; Johnson, Jeremy D.; Bergmaier, Michael J.

    2016-01-01

    In light of the U.S. Senate's designation of March 15, 2016 as "National Speech and Debate Education Day" (S. Res. 398, 2016), it only seems fitting that "Communication Education" devote a special section to the role of speech and debate in civic education. Speech and debate have been at the heart of the communication…

  17. Hate Speech and the First Amendment.

    ERIC Educational Resources Information Center

    Rainey, Susan J.; Kinsler, Waren S.; Kannarr, Tina L.; Reaves, Asa E.

    This document is comprised of California state statutes, federal legislation, and court litigation pertaining to hate speech and the First Amendment. The document provides an overview of California education code sections relating to the regulation of speech; basic principles of the First Amendment; government efforts to regulate hate speech,…

  18. Communicating by Language: The Speech Process.

    ERIC Educational Resources Information Center

    House, Arthur S., Ed.

    This document reports on a conference focused on speech problems. The main objective of these discussions was to facilitate a deeper understanding of human communication through interaction of conference participants with colleagues in other disciplines. Topics discussed included speech production, feedback, speech perception, and development of…

  19. Towards Multilingual Interoperability in Automatic Speech Recognition

    DTIC Science & Technology

    2000-08-01

…communication, we address multilingual interoperability aspects in speech recognition (DARPA) [39, 5, 12, 40, 14, 43]. After giving a tentative

  20. Freedom of Speech as an Academic Discipline.

    ERIC Educational Resources Information Center

    Haiman, Franklyn S.

    Since its formation, the Speech Communication Association's Committee on Freedom of Speech has played a critical leadership role in course offerings, research efforts, and regional activities in freedom of speech. Areas in which research has been done and in which further research should be carried out include: historical-critical research, in…

  1. The Varieties of Speech to Young Children

    ERIC Educational Resources Information Center

    Huttenlocher, Janellen; Vasilyeva, Marina; Waterfall, Heidi R.; Vevea, Jack L.; Hedges, Larry V.

    2007-01-01

    This article examines caregiver speech to young children. The authors obtained several measures of the speech used to children during early language development (14-30 months). For all measures, they found substantial variation across individuals and subgroups. Speech patterns vary with caregiver education, and the differences are maintained over…

  2. Recovering Asynchronous Watermark Tones from Speech

    DTIC Science & Technology

    2009-04-01

…by a comfortable margin. Index Terms: Speech Watermarking, Hidden Tones, Speech Steganography, Speech Data Hiding. (Cites: “Audio steganography for covert data transmission by imperceptible tone insertion,” Proceedings Communications Systems and Applications, IEEE, vol. 4, pp. 1647–1653, 2004.)

  3. Acoustics of Clear Speech: Effect of Instruction

    ERIC Educational Resources Information Center

    Lam, Jennifer; Tjaden, Kris; Wilding, Greg

    2012-01-01

    Purpose: This study investigated how different instructions for eliciting clear speech affected selected acoustic measures of speech. Method: Twelve speakers were audio-recorded reading 18 different sentences from the Assessment of Intelligibility of Dysarthric Speech (Yorkston & Beukelman, 1984). Sentences were produced in habitual, clear,…

  4. Speech Perception in Individuals with Auditory Neuropathy

    ERIC Educational Resources Information Center

    Zeng, Fan-Gang; Liu, Sheng

    2006-01-01

Purpose: Speech perception in participants with auditory neuropathy (AN) was systematically studied to answer the following 2 questions: Does noise present a particular problem for people with AN? Can clear speech and cochlear implants alleviate this problem? Method: The researchers evaluated the advantage in intelligibility of clear speech over…

  5. The Dynamic Nature of Speech Perception

    ERIC Educational Resources Information Center

    McQueen, James M.; Norris, Dennis; Cutler, Anne

    2006-01-01

    The speech perception system must be flexible in responding to the variability in speech sounds caused by differences among speakers and by language change over the lifespan of the listener. Indeed, listeners use lexical knowledge to retune perception of novel speech (Norris, McQueen, & Cutler, 2003). In that study, Dutch listeners made…

  6. Nebraska Speech, Debate, and Drama Manuals.

    ERIC Educational Resources Information Center

    Nebraska School Activities Association, Lincoln.

    Prepared and designed to provide general information in the administration of speech activities in the Nebraska schools, this manual offers rules and regulations for speech events, high school debate, and one act plays. The section on speech events includes information about general regulations, the scope of competition, district contests, the…

  7. Cognitive Functions in Childhood Apraxia of Speech

    ERIC Educational Resources Information Center

    Nijland, Lian; Terband, Hayo; Maassen, Ben

    2015-01-01

    Purpose: Childhood apraxia of speech (CAS) is diagnosed on the basis of specific speech characteristics, in the absence of problems in hearing, intelligence, and language comprehension. This does not preclude the possibility that children with this speech disorder might demonstrate additional problems. Method: Cognitive functions were investigated…

  8. Speech, the Alphabet, and Teaching to Read.

    ERIC Educational Resources Information Center

    Liberman, Isabelle Y.; Shankweiler, Donald

    The dependence of reading on speech is based on three assumptions: speech is the primary language system, acquired naturally without direct instruction; alphabetic writing systems are more or less phonetic representations of oral language; and speech appears to be an essential foundation for the acquisition of reading ability. By presupposing…

  9. Campus Speech Codes Said to Violate Rights

    ERIC Educational Resources Information Center

    Lipka, Sara

    2007-01-01

    Most college and university speech codes would not survive a legal challenge, according to a report released in December by the Foundation for Individual Rights in Education, a watchdog group for free speech on campuses. The report labeled many speech codes as overly broad or vague, and cited examples such as Furman University's prohibition of…

  10. Audiovisual Speech Integration and Lipreading in Autism

    ERIC Educational Resources Information Center

    Smith, Elizabeth G.; Bennetto, Loisa

    2007-01-01

    Background: During speech perception, the ability to integrate auditory and visual information causes speech to sound louder and be more intelligible, and leads to quicker processing. This integration is important in early language development, and also continues to affect speech comprehension throughout the lifespan. Previous research shows that…

  11. Auditory models for speech analysis

    NASA Astrophysics Data System (ADS)

    Maybury, Mark T.

This paper reviews the psychophysical basis for auditory models and discusses their application to automatic speech recognition. First an overview of the human auditory system is presented, followed by a review of current knowledge gleaned from neurological and psychoacoustic experimentation. Next, a general framework describes established peripheral auditory models which are based on well-understood properties of the peripheral auditory system. This is followed by a discussion of current enhancements to those models to include nonlinearities and synchrony information as well as other higher auditory functions. Finally, the initial performance of auditory models in the task of speech recognition is examined and additional applications are mentioned.
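
    Assuming SciPy >= 1.6, whose scipy.signal.gammatone designs the gammatone filters commonly used in such peripheral models, a minimal filterbank sketch looks like this (the center frequencies are illustrative, not a calibrated ERB scale):

        # Sketch: a small gammatone filterbank applied to a test signal.
        import numpy as np
        from scipy.signal import gammatone, lfilter

        fs = 16000
        t = np.arange(fs) / fs
        x = np.sin(2 * np.pi * 800 * t)            # stand-in input

        center_freqs = [200, 400, 800, 1600, 3200]
        bank = []
        for fc in center_freqs:
            b, a = gammatone(fc, "iir", fs=fs)     # 4th-order gammatone
            bank.append(lfilter(b, a, x))

        energies = [float(np.mean(y ** 2)) for y in bank]
        print([round(e, 4) for e in energies])     # peaks at the 800 Hz channel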

  12. Perception of Speech Reflects Optimal Use of Probabilistic Speech Cues

    ERIC Educational Resources Information Center

    Clayards, Meghan; Tanenhaus, Michael K.; Aslin, Richard N.; Jacobs, Robert A.

    2008-01-01

    Listeners are exquisitely sensitive to fine-grained acoustic detail within phonetic categories for sounds and words. Here we show that this sensitivity is optimal given the probabilistic nature of speech cues. We manipulated the probability distribution of one probabilistic cue, voice onset time (VOT), which differentiates word initial labial…
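
    A worked sketch of the underlying ideal-observer computation: given Gaussian VOT distributions for two categories, Bayes' rule yields a graded posterior that is steepest near the category boundary. The means and standard deviations below are illustrative, not the study's fitted values.

        # Sketch: posterior probability of /p/ given a VOT cue, via Bayes' rule.
        import numpy as np
        from scipy.stats import norm

        mu_b, sd_b = 0.0, 12.0      # VOT (ms) distribution for /b/
        mu_p, sd_p = 50.0, 12.0     # VOT (ms) distribution for /p/

        def p_p_given_vot(vot, prior_p=0.5):
            like_p = norm.pdf(vot, mu_p, sd_p) * prior_p
            like_b = norm.pdf(vot, mu_b, sd_b) * (1 - prior_p)
            return like_p / (like_p + like_b)

        for vot in (0, 20, 25, 30, 50):
            print(vot, round(p_p_given_vot(vot), 3))  # steepest near 25 ms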

  13. Relationship between Speech Intelligibility and Speech Comprehension in Babble Noise

    ERIC Educational Resources Information Center

    Fontan, Lionel; Tardieu, Julien; Gaillard, Pascal; Woisard, Virginie; Ruiz, Robert

    2015-01-01

    Purpose: The authors investigated the relationship between the intelligibility and comprehension of speech presented in babble noise. Method: Forty participants listened to French imperative sentences (commands for moving objects) in a multitalker babble background for which intensity was experimentally controlled. Participants were instructed to…

  14. Speech entrainment enables patients with Broca's aphasia to produce fluent speech.

    PubMed

    Fridriksson, Julius; Hubbard, H Isabel; Hudspeth, Sarah Grace; Holland, Audrey L; Bonilha, Leonardo; Fromm, Davida; Rorden, Chris

    2012-12-01

    A distinguishing feature of Broca's aphasia is non-fluent halting speech typically involving one to three words per utterance. Yet, despite such profound impairments, some patients can mimic audio-visual speech stimuli enabling them to produce fluent speech in real time. We call this effect 'speech entrainment' and reveal its neural mechanism as well as explore its usefulness as a treatment for speech production in Broca's aphasia. In Experiment 1, 13 patients with Broca's aphasia were tested in three conditions: (i) speech entrainment with audio-visual feedback where they attempted to mimic a speaker whose mouth was seen on an iPod screen; (ii) speech entrainment with audio-only feedback where patients mimicked heard speech; and (iii) spontaneous speech where patients spoke freely about assigned topics. The patients produced a greater variety of words using audio-visual feedback compared with audio-only feedback and spontaneous speech. No difference was found between audio-only feedback and spontaneous speech. In Experiment 2, 10 of the 13 patients included in Experiment 1 and 20 control subjects underwent functional magnetic resonance imaging to determine the neural mechanism that supports speech entrainment. Group results with patients and controls revealed greater bilateral cortical activation for speech produced during speech entrainment compared with spontaneous speech at the junction of the anterior insula and Brodmann area 47, in Brodmann area 37, and unilaterally in the left middle temporal gyrus and the dorsal portion of Broca's area. Probabilistic white matter tracts constructed for these regions in the normal subjects revealed a structural network connected via the corpus callosum and ventral fibres through the extreme capsule. Unilateral areas were connected via the arcuate fasciculus. In Experiment 3, all patients included in Experiment 1 participated in a 6-week treatment phase using speech entrainment to improve speech production. Behavioural and

  15. Pulse Vector-Excitation Speech Encoder

    NASA Technical Reports Server (NTRS)

    Davidson, Grant; Gersho, Allen

    1989-01-01

    Proposed pulse vector-excitation speech encoder (PVXC) encodes analog speech signals into digital representation for transmission or storage at rates below 5 kilobits per second. Produces high-quality reconstructed speech, but with less computation than required by comparable speech-encoding systems. Has some characteristics of multipulse linear predictive coding (MPLPC) and of code-excited linear prediction (CELP). System uses mathematical model of vocal tract in conjunction with set of excitation vectors and perceptually based error criterion to synthesize natural-sounding speech.
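    The perceptually based error criterion is the distinctive ingredient of this coder family: rather than minimizing raw waveform error, the error is weighted so that distortion is pushed toward spectral regions where the ear tolerates it, typically with a filter of the form W(z) = A(z)/A(z/γ) derived from the LPC analysis. A minimal sketch of such a weighting, with illustrative names and a γ of 0.8 that are not taken from the PVXC paper:

    ```python
    import numpy as np
    from scipy.signal import lfilter

    def perceptual_error(target, synth, lpc, gamma=0.8):
        """Squared error filtered by W(z) = A(z)/A(z/gamma), so distortion
        near strong formants is penalized less (illustrative sketch)."""
        a = np.concatenate(([1.0], lpc))          # A(z) from LPC analysis
        a_bw = a * gamma ** np.arange(len(a))     # bandwidth-expanded A(z/gamma)
        w = lfilter(a, a_bw, target - synth)      # weighted error signal
        return float(np.dot(w, w))
    ```

    An analysis-by-synthesis encoder of this kind would evaluate such a criterion for every candidate excitation and keep the minimizer.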

  16. Identifying Deceptive Speech Across Cultures

    DTIC Science & Technology

    2016-06-25

    AFRL-AFOSR-VA-TR-2016-0267: Identifying Deceptive Speech Across Cultures. Julia Hirschberg, The Trustees of Columbia University in the City of New York.

  17. Sociolinguistic Factors in Speech Identification.

    ERIC Educational Resources Information Center

    Shuy, Roger W.; And Others

    The first of two experiments conducted in Detroit investigated the relationship between class and ethnic membership and identification of class and ethnicity; the role age and sex of respondent play in accuracy of speaker identification; and attitudes toward various socioethnic speech patterns. The second study was concerned with the attitudes of…

  18. Speech and Language Developmental Milestones

    MedlinePlus

    ... What are the milestones for speech and language development? The first signs of communication occur when an infant learns that a cry will bring food, comfort, and companionship. Newborns also begin to recognize important sounds in their environment, such as the voice of their mother or ...

  19. "Free Speech" and "Political Correctness"

    ERIC Educational Resources Information Center

    Scott, Peter

    2016-01-01

    "Free speech" and "political correctness" are best seen not as opposing principles, but as part of a spectrum. Rather than attempting to establish some absolute principles, this essay identifies four trends that impact on this debate: (1) there are, and always have been, legitimate debates about the--absolute--beneficence of…

  20. Embedding speech into virtual realities

    NASA Technical Reports Server (NTRS)

    Bohn, Christian-Arved; Krueger, Wolfgang

    1993-01-01

    In this work a speaker-independent speech recognition system is presented, which is suitable for implementation in Virtual Reality applications. The use of an artificial neural network in connection with a special compression of the acoustic input leads to a system which is robust, fast, easy to use, and needs no additional hardware besides common VR equipment.

  1. Neural Networks for Speech Application.

    DTIC Science & Technology

    1987-11-01

    This is a general introduction to the reemerging technology called neural networks, and how these networks may provide an important alternative to ... traditional forms of computing in speech applications. Neural networks, sometimes called Artificial Neural Systems (ANS), have shown promise for solving

  2. Aerosol Emission during Human Speech

    NASA Astrophysics Data System (ADS)

    Asadi, Sima; Ristenpart, William

    2016-11-01

    The traditional emphasis for airborne disease transmission has been on coughing and sneezing, which are dramatic expiratory events that yield easily visible droplets. Recent research suggests that normal speech can release even larger quantities of aerosols that are too small to see with the naked eye, but are nonetheless large enough to carry a variety of pathogens (e.g., influenza A). This observation raises an important question: what types of speech emit the most aerosols? Here we show that the concentration of aerosols emitted during healthy human speech is positively correlated with both the amplitude (loudness) and fundamental frequency (pitch) of the vocalization. Experimental measurements with an aerodynamic particle sizer (APS) indicate that speaking in a loud voice (95 decibels) yields up to fifty times more aerosols than in a quiet voice (75 decibels), and that sounds associated with certain phonemes (e.g., [a] or [o]) release more aerosols than others. We interpret these results in terms of the egressive airflow rate associated with each phoneme and the corresponding fundamental frequency, which is known to vary significantly with gender and age. The results suggest that individual speech patterns could affect the probability of airborne disease transmission.

  3. Speech Registers in Young Children.

    ERIC Educational Resources Information Center

    Weeks, Thelma E.

    This study of child language acquisition concerns various structural and paralinguistic features of language and examines their role in the total language acquisition process. The informants were three children (two boys and one girl) aged five years, two months; three years, four months; and one year, nine months. Their speech was recorded over a…

  4. Speech Research. Interim Scientific Report.

    ERIC Educational Resources Information Center

    Cooper, Franklin S.

    The status and progress of several studies dealing with the nature of speech, instrumentation for its investigation, and instrumentation for practical applications is reported on. The period of January 1 through June 30, 1969 is covered. Extended reports and manuscripts cover the following topics: programing for the Glace-Holmes synthesizer,…

  5. Acoustic Analysis of PD Speech

    PubMed Central

    Chenausky, Karen; MacAuslan, Joel; Goldhor, Richard

    2011-01-01

    According to the U.S. National Institutes of Health, approximately 500,000 Americans have Parkinson's disease (PD), with roughly another 50,000 receiving new diagnoses each year. 70%–90% of these people also have the hypokinetic dysarthria associated with PD. Deep brain stimulation (DBS) substantially relieves motor symptoms in advanced-stage patients for whom medication produces disabling dyskinesias. This study investigated speech changes as a result of DBS settings chosen to maximize motor performance. The speech of 10 PD patients and 12 normal controls was analyzed for syllable rate and variability, syllable length patterning, vowel fraction, voice-onset time variability, and spirantization. These were normalized by the controls' standard deviation to represent distance from normal and combined into a composite measure. Results show that DBS settings relieving motor symptoms can improve speech, making it up to three standard deviations closer to normal. However, the clinically motivated settings evaluated here show greater capacity to impair, rather than improve, speech. A feedback device developed from these findings could be useful to clinicians adjusting DBS parameters, as a means for ensuring they do not unwittingly choose DBS settings which impair patients' communication. PMID:21977333
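    The composite measure described here is, in essence, a z-score distance: each speech measure is normalized by the controls' standard deviation and the normalized deviations are combined into a single distance from normal. A minimal sketch of that idea, assuming a simple mean as the combination rule (the abstract does not specify the exact rule):

    ```python
    import numpy as np

    def composite_distance(patient_measures, control_mean, control_sd):
        """Distance from normal: per-measure |z| scores, averaged."""
        z = np.abs((np.asarray(patient_measures) - control_mean) / control_sd)
        return float(z.mean())  # 0 = at the control mean; ~3 = three SDs away
    ```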

  6. Free Speech Advocates at Berkeley.

    ERIC Educational Resources Information Center

    Watts, William A.; Whittaker, David

    1966-01-01

    This study compares highly committed members of the Free Speech Movement (FSM) at Berkeley with the student population at large on 3 sociopsychological foci: general biographical data, religious orientation, and rigidity-flexibility. Questionnaires were administered to 172 FSM members selected by chance from the 10 to 1200 who entered and…

  7. Neuronal basis of speech comprehension.

    PubMed

    Specht, Karsten

    2014-01-01

    Verbal communication does not rely only on the simple perception of auditory signals. It is rather a parallel and integrative processing of linguistic and non-linguistic information, involving temporal and frontal areas in particular. This review describes the inherent complexity of auditory speech comprehension from a functional-neuroanatomical perspective. The review is divided into two parts. In the first part, structural and functional asymmetry of language-relevant structures will be discussed. The second part of the review will discuss recent neuroimaging studies, which coherently demonstrate that speech comprehension processes rely on a hierarchical network involving the temporal, parietal, and frontal lobes. Further, the results support the dual-stream model for speech comprehension, with a dorsal stream for auditory-motor integration, and a ventral stream for extracting meaning but also the processing of sentences and narratives. Specific patterns of functional asymmetry between the left and right hemisphere can also be demonstrated. The review article concludes with a discussion on interactions between the dorsal and ventral streams, particularly the involvement of motor related areas in speech perception processes, and outlines some remaining unresolved issues. This article is part of a Special Issue entitled Human Auditory Neuroimaging.

  8. Inner Speech Impairments in Autism

    ERIC Educational Resources Information Center

    Whitehouse, Andrew J. O.; Maybery, Murray T.; Durkin, Kevin

    2006-01-01

    Background: Three experiments investigated the role of inner speech deficit in cognitive performances of children with autism. Methods: Experiment 1 compared children with autism with ability-matched controls on a verbal recall task presenting pictures and words. Experiment 2 used pictures for which the typical names were either single syllable or…

  9. The Segmentation of Impromptu Speech.

    ERIC Educational Resources Information Center

    Svartvik, Jan

    A computer program for classifying elements of a language corpus for large-scale analysis is discussed. The approach is based on the assumption that there is a natural unit in speech processing and production, called a tone unit. The program "tags" the five grammatical phrase types (verb, adverb, adjective, noun, and prepositional) to…

  10. Affecting Critical Thinking through Speech.

    ERIC Educational Resources Information Center

    O'Keefe, Virginia P.

    Intended for teachers, this booklet shows how spoken language can affect student thinking and presents strategies for teaching critical thinking skills. The first section discusses the theoretical and research bases for promoting critical thinking through speech, defines critical thinking, explores critical thinking as abstract thinking, and tells…

  11. Speech Errors across the Lifespan

    ERIC Educational Resources Information Center

    Vousden, Janet I.; Maylor, Elizabeth A.

    2006-01-01

    Dell, Burger, and Svec (1997) proposed that the proportion of speech errors classified as anticipations (e.g., "moot and mouth") can be predicted solely from the overall error rate, such that the greater the error rate, the lower the anticipatory proportion (AP) of errors. We report a study examining whether this effect applies to changes in error…

  12. Prosodic Contrasts in Ironic Speech

    ERIC Educational Resources Information Center

    Bryant, Gregory A.

    2010-01-01

    Prosodic features in spontaneous speech help disambiguate implied meaning not explicit in linguistic surface structure, but little research has examined how these signals manifest themselves in real conversations. Spontaneously produced verbal irony utterances generated between familiar speakers in conversational dyads were acoustically analyzed…

  13. Vector adaptive predictive coder for speech and audio

    NASA Technical Reports Server (NTRS)

    Chen, Juin-Hwey (Inventor); Gersho, Allen (Inventor)

    1990-01-01

    A real-time vector adaptive predictive coder which approximates each vector of K speech samples by using each of M fixed vectors in a first codebook to excite a time-varying synthesis filter and picking the vector that minimizes distortion. Predictive analysis for each frame determines parameters used for computing from vectors in the first codebook zero-state response vectors that are stored at the same address (index) in a second codebook. Encoding of input speech vectors s_n is then carried out using the second codebook. When the vector that minimizes distortion is found, its index is transmitted to a decoder which has a codebook identical to the first codebook of the encoder. There the index is used to read out a vector that is used to synthesize an output speech vector s_n. The parameters used in the encoder are quantized, for example by using a table, and the indices are transmitted to the decoder where they are decoded to specify transfer characteristics of filters used in producing the vector s_n from the receiver codebook vector selected by the vector index transmitted.
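    The heart of the scheme is an exhaustive minimum-distortion search over precomputed zero-state responses. A minimal sketch under the assumptions above (function and variable names are illustrative; the real coder also applies the quantized predictive filters per frame):

    ```python
    import numpy as np

    def build_response_codebook(excitation_codebook, synth_filter):
        # Second codebook: zero-state response of the current synthesis
        # filter to each fixed excitation vector, stored at the same index.
        return [synth_filter(c) for c in excitation_codebook]

    def encode_vector(s_n, response_codebook):
        """Return the index of the zero-state response closest to the
        input speech vector s_n (minimum squared-error distortion)."""
        errors = [np.sum((s_n - y) ** 2) for y in response_codebook]
        return int(np.argmin(errors))
    ```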

  14. Rule-based frequency domain speech coding

    NASA Astrophysics Data System (ADS)

    McMillan, Vance M.

    1990-12-01

    A speech processing system is designed to simulate the transmission of speech signals using a speech coding scheme. The transmitter portion of the simulation extracts a minimized set of frequencies in Fourier space that represents the essence of each speech timeslice. These parameters are then adaptively quantized and transmitted to the receiver portion of the coding scheme. The receiver then generates an estimate of the original timeslice from the transmitted parameters using a sinusoidal speech model. After the initial design, listening tests were used to study how each design parameter affects the perceived quality of the speech: volunteers listened to a series of reconstructions, each produced by the coding scheme acting on the same speech input file with the design parameters varied. The varied parameters were the number of frequencies used in the sinusoidal speech model for reconstruction, the number of bits used to encode amplitude information, and the number of bits used to encode phase information. The final design parameters for the coding scheme were selected based on the results of the listening tests. Post-design listening tests showed that the system was capable of 4800-bps speech transmission with a quality rating of five on a scale from zero (not understandable) to ten (sounds just like the original speech).
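    As a rough illustration of the transmitter/receiver split: window one timeslice, keep only its strongest spectral peaks, and resynthesize as a sum of sinusoids. Frame length, peak count, and scaling below are illustrative, and the adaptive quantization stage is omitted:

    ```python
    import numpy as np

    def encode_frame(frame, n_peaks=8):
        # Transmitter: minimized set of frequencies for one timeslice
        spectrum = np.fft.rfft(frame * np.hanning(len(frame)))
        idx = np.argsort(np.abs(spectrum))[-n_peaks:]
        return idx, np.abs(spectrum)[idx], np.angle(spectrum)[idx]

    def decode_frame(idx, amps, phases, frame_len):
        # Receiver: estimate of the (windowed) timeslice as a sum of sinusoids
        n = np.arange(frame_len)
        est = np.zeros(frame_len)
        for k, a, p in zip(idx, amps, phases):
            scale = 1.0 if k in (0, frame_len // 2) else 2.0  # rfft symmetry
            est += scale * a / frame_len * np.cos(2 * np.pi * k * n / frame_len + p)
        return est
    ```

    Listening tests like those described would then vary n_peaks and the bit allocations for the amplitude and phase values.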

  15. Speech recognition with amplitude and frequency modulations

    NASA Astrophysics Data System (ADS)

    Zeng, Fan-Gang; Nie, Kaibao; Stickney, Ginger S.; Kong, Ying-Yee; Vongphoe, Michael; Bhargave, Ashish; Wei, Chaogang; Cao, Keli

    2005-02-01

    Amplitude modulation (AM) and frequency modulation (FM) are commonly used in communication, but their relative contributions to speech recognition have not been fully explored. To bridge this gap, we derived slowly varying AM and FM from speech sounds and conducted listening tests using stimuli with different modulations in normal-hearing and cochlear-implant subjects. We found that although AM from a limited number of spectral bands may be sufficient for speech recognition in quiet, FM significantly enhances speech recognition in noise, as well as speaker and tone recognition. Additional speech reception threshold measures revealed that FM is particularly critical for speech recognition with a competing voice and is independent of spectral resolution and similarity. These results suggest that AM and FM provide independent yet complementary contributions to support robust speech recognition under realistic listening situations. Encoding FM may improve auditory scene analysis, cochlear-implant, and audio coding performance. Keywords: auditory analysis | cochlear implant | neural code | phase | scene analysis
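    A standard way to derive slowly varying AM and FM from a band-limited speech signal, consistent with the description above, is through the analytic signal: the Hilbert envelope gives AM, and the derivative of the unwrapped phase gives the instantaneous frequency. A sketch, with the 400-Hz modulation cut-off chosen purely for illustration:

    ```python
    import numpy as np
    from scipy.signal import hilbert, butter, sosfiltfilt

    def am_fm_decompose(band, fs, mod_cutoff=400.0):
        """Slowly varying AM and FM of one band-limited signal."""
        analytic = hilbert(band)
        am = np.abs(analytic)                    # amplitude envelope
        phase = np.unwrap(np.angle(analytic))
        fm = np.diff(phase) * fs / (2 * np.pi)   # instantaneous frequency, Hz
        sos = butter(4, mod_cutoff, fs=fs, output='sos')
        return sosfiltfilt(sos, am), sosfiltfilt(sos, fm)  # keep slow modulations
    ```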

  16. Speech motor learning in profoundly deaf adults.

    PubMed

    Nasir, Sazzad M; Ostry, David J

    2008-10-01

    Speech production, like other sensorimotor behaviors, relies on multiple sensory inputs--audition, proprioceptive inputs from muscle spindles and cutaneous inputs from mechanoreceptors in the skin and soft tissues of the vocal tract. However, the capacity for intelligible speech by deaf speakers suggests that somatosensory input alone may contribute to speech motor control and perhaps even to speech learning. We assessed speech motor learning in cochlear implant recipients who were tested with their implants turned off. A robotic device was used to alter somatosensory feedback by displacing the jaw during speech. We found that implant subjects progressively adapted to the mechanical perturbation with training. Moreover, the corrections that we observed were for movement deviations that were exceedingly small, on the order of millimeters, indicating that speakers have precise somatosensory expectations. Speech motor learning is substantially dependent on somatosensory input.

  17. Speech disorders in right-hemisphere stroke.

    PubMed

    Dyukova, G M; Glozman, Z M; Titova, E Y; Kriushev, E S; Gamaleya, A A

    2010-07-01

    Clinical practice shows that right-hemisphere cerebral strokes are often accompanied by one speech disorder or another. The aim of the present work was to analyze published data addressing speech disorders in right-sided strokes. Questions of the lateralization of speech functions are discussed, with particular reference to the role of the right hemisphere in speech activity and the structure of speech pathology in right-hemisphere foci. Clinical variants of speech disorders, such as aphasia, dysprosody, dysarthria, mutism, and stutter are discussed in detail. Types of speech disorders are also discussed, along with the possible mechanisms of their formation depending on the locations of lesions in the axis of the brain (cortex, subcortical structures, stem, cerebellum) and focus size.

  18. Perception of speech sounds in school-age children with speech sound disorders

    PubMed Central

    Preston, Jonathan L.; Irwin, Julia R.; Turcios, Jacqueline

    2015-01-01

    Children with speech sound disorders may perceive speech differently than children with typical speech development. The nature of these speech differences is reviewed with an emphasis on assessing phoneme-specific perception for speech sounds that are produced in error. Category goodness judgment, or the ability to judge accurate and inaccurate tokens of speech sounds, plays an important role in phonological development. The software Speech Assessment and Interactive Learning System (Rvachew, 1994), which has been effectively used to assess preschoolers’ ability to perform goodness judgments, is explored for school-age children with residual speech errors (RSE). However, data suggest that this particular task may not be sensitive to perceptual differences in school-age children. The need for the development of clinical tools for assessment of speech perception in school-age children with RSE is highlighted, and clinical suggestions are provided. PMID:26458198

  19. Perception of Speech Sounds in School-Aged Children with Speech Sound Disorders.

    PubMed

    Preston, Jonathan L; Irwin, Julia R; Turcios, Jacqueline

    2015-11-01

    Children with speech sound disorders may perceive speech differently than children with typical speech development. The nature of these speech differences is reviewed with an emphasis on assessing phoneme-specific perception for speech sounds that are produced in error. Category goodness judgment, or the ability to judge accurate and inaccurate tokens of speech sounds, plays an important role in phonological development. The software Speech Assessment and Interactive Learning System, which has been effectively used to assess preschoolers' ability to perform goodness judgments, is explored for school-aged children with residual speech errors (RSEs). However, data suggest that this particular task may not be sensitive to perceptual differences in school-aged children. The need for the development of clinical tools for assessment of speech perception in school-aged children with RSE is highlighted, and clinical suggestions are provided.

  20. Intelligibility of laryngectomees' substitute speech: automatic speech recognition and subjective rating.

    PubMed

    Schuster, Maria; Haderlein, Tino; Nöth, Elmar; Lohscheller, Jörg; Eysholdt, Ulrich; Rosanowski, Frank

    2006-02-01

    Substitute speech after laryngectomy is characterized by restricted aero-acoustic properties in comparison with laryngeal speech and therefore has lower intelligibility. Until now, an objective means to determine and quantify the intelligibility has not existed, although the intelligibility can serve as a global outcome parameter of voice restoration after laryngectomy. An automatic speech recognition system was applied to recordings of a standard text read by 18 German male laryngectomees with tracheoesophageal substitute speech. The system was trained with normal laryngeal speakers and not adapted to severely disturbed voices. Substitute speech was compared to laryngeal speech of a control group. Subjective evaluation of intelligibility was performed by a panel of five experts and compared to automatic speech evaluation. Substitute speech showed a lower syllable rate and lower word accuracy than laryngeal speech. Automatic speech recognition for substitute speech yielded word accuracy between 10.0% and 50.0% (28.7 ± 12.1%) with sufficient discrimination. It agreed with the experts' subjective evaluations of intelligibility. The multi-rater kappa of the experts alone did not differ from the multi-rater kappa of experts and the recognizer. Automatic speech recognition serves as a good means to objectify and quantify global speech outcome of laryngectomees. For clinical use, the speech recognition system will be adapted to disturbed voices and can also be applied in other languages.
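    The word accuracy used as the outcome measure here is the complement of word error rate, WA = (N − S − D − I) / N, obtained from a Levenshtein alignment of the recognized word string against the read text. A minimal sketch:

    ```python
    def word_accuracy(reference, hypothesis):
        """WA = (N - S - D - I) / N via word-level edit distance."""
        r, h = reference.split(), hypothesis.split()
        d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
        for i in range(len(r) + 1):
            d[i][0] = i                      # deletions
        for j in range(len(h) + 1):
            d[0][j] = j                      # insertions
        for i in range(1, len(r) + 1):
            for j in range(1, len(h) + 1):
                sub = 0 if r[i - 1] == h[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + sub)
        return 1.0 - d[len(r)][len(h)] / len(r)
    ```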

  1. Speech Entrainment Compensates for Broca's Area Damage

    PubMed Central

    Fridriksson, Julius; Basilakos, Alexandra; Hickok, Gregory; Bonilha, Leonardo; Rorden, Chris

    2015-01-01

    Speech entrainment (SE), the online mimicking of an audiovisual speech model, has been shown to increase speech fluency in patients with Broca's aphasia. However, not all individuals with aphasia benefit from SE. The purpose of this study was to identify patterns of cortical damage that predict a positive response to SE's fluency-inducing effects. Forty-four chronic patients with left hemisphere stroke (15 female) were included in this study. Participants completed two tasks: 1) spontaneous speech production, and 2) audiovisual SE. Number of different words per minute was calculated as a speech output measure for each task, with the difference between SE and spontaneous speech conditions yielding a measure of fluency improvement. Voxel-wise lesion-symptom mapping (VLSM) was used to relate the number of different words per minute for spontaneous speech, SE, and SE-related improvement to patterns of brain damage in order to predict lesion locations associated with the fluency-inducing response to speech entrainment. Individuals with Broca's aphasia demonstrated a significant increase in different words per minute during speech entrainment versus spontaneous speech. A similar pattern of improvement was not seen in patients with other types of aphasia. VLSM analysis revealed damage to the inferior frontal gyrus predicted this response. Results suggest that SE exerts its fluency-inducing effects by providing a surrogate target for speech production via internal monitoring processes. Clinically, these results add further support for the use of speech entrainment to improve speech production and may help select patients for speech entrainment treatment. PMID:25989443

  2. Smartphone-based real-time speech enhancement for improving hearing aids speech perception.

    PubMed

    Yu Rao; Yiya Hao; Panahi, Issa M S; Kehtarnavaz, Nasser

    2016-08-01

    In this paper, the development of a speech processing pipeline on smartphones for hearing aid devices (HADs) is presented. This pipeline is used for noise suppression and speech enhancement (SE) to improve speech quality and intelligibility. The proposed method is implemented to run in real-time on Android smartphones. Test results indicate that the proposed method suppresses the noise and improves the perceptual quality of speech in terms of three objective measures: perceptual evaluation of speech quality (PESQ), noise attenuation level (NAL), and the coherence speech intelligibility index (CSII).

  3. Levels of Processing of Speech and Non-Speech

    DTIC Science & Technology

    1991-05-10

    Timbre: A better musical analogy to speech? Presented to the Acoustical Society of America, Anaheim. A. Samuel. (Fall 1987) Central and peripheral... The studies of listener-based factors include studies of perceptual restoration of deleted sounds (phonemes or musical notes), and studies of the... music. The attentional investigations demonstrate rather fine-tuned attentional control under high-predictability conditions. Significant progress has been

  4. Segregation of Whispered Speech Interleaved with Noise or Speech Maskers

    DTIC Science & Technology

    2011-08-01

    presented diotically via Beyerdynamic DT990 Pro headphones. Listeners were seated in front of a computer monitor in a sound-treated room and responded to... a target speech signal from a same-talker masker [13]. Performance was best when the target speaker was different from the masker and decreased as... Tartter, V. C. 1991. "Identifiability of vowels and speakers from whispered syllables," Percept. Psychophys. 49, 365–372. [4] Tartter, V. C. 1989

  5. Spectro-temporal modulation subspace-spanning filter bank features for robust automatic speech recognition.

    PubMed

    Schädler, Marc René; Meyer, Bernd T; Kollmeier, Birger

    2012-05-01

    In an attempt to increase the robustness of automatic speech recognition (ASR) systems, a feature extraction scheme is proposed that takes spectro-temporal modulation frequencies (MF) into account. This physiologically inspired approach uses a two-dimensional filter bank based on Gabor filters, which limits the redundant information between feature components, and also results in physically interpretable features. Robustness against extrinsic variation (different types of additive noise) and intrinsic variability (arising from changes in speaking rate, effort, and style) is quantified in a series of recognition experiments. The results are compared to reference ASR systems using Mel-frequency cepstral coefficients (MFCCs), MFCCs with cepstral mean subtraction (CMS) and RASTA-PLP features, respectively. Gabor features are shown to be more robust against extrinsic variation than the baseline systems without CMS, with relative improvements of 28% and 16% for two training conditions (using only clean training samples or a mixture of noisy and clean utterances, respectively). When used in a state-of-the-art system, improvements of 14% are observed when spectro-temporal features are concatenated with MFCCs, indicating the complementarity of those feature types. An analysis of the importance of specific MF shows that temporal MF up to 25 Hz and spectral MF up to 0.25 cycles/channel are beneficial for ASR.
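    A toy version of such a feature extraction scheme: filter a log-mel spectrogram with a small bank of 2-D Gabor filters (Gaussian envelope times complex carrier), each tuned to one temporal and one spectral modulation frequency, and keep the magnitudes as features. Filter sizes and tuning values below are illustrative, not the paper's:

    ```python
    import numpy as np
    from scipy.signal import convolve2d

    def gabor_2d(omega_t, omega_f, size=9):
        """Complex 2-D Gabor tuned to temporal/spectral modulation
        frequencies given in radians per frame / per channel."""
        t = np.arange(size) - size // 2
        T, F = np.meshgrid(t, t, indexing='ij')
        envelope = np.exp(-(T**2 + F**2) / (2 * (size / 4) ** 2))
        return envelope * np.exp(1j * (omega_t * T + omega_f * F))

    def gabor_features(log_mel):
        # One magnitude feature map per (temporal, spectral) MF pair
        bank = [gabor_2d(wt, wf) for wt in (0.2, 0.5) for wf in (0.2, 0.5)]
        return np.stack([np.abs(convolve2d(log_mel, g, mode='same')) for g in bank])
    ```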

  6. Analysis and synthesis of laughter

    NASA Astrophysics Data System (ADS)

    Sundaram, Shiva; Narayanan, Shrikanth

    2004-10-01

    There is much enthusiasm in the text-to-speech community for synthesis of emotional and natural speech. One idea being proposed is to include emotion-dependent paralinguistic cues during synthesis to convey emotions effectively. This requires modeling and synthesis techniques of various cues for different emotions. Motivated by this, a technique to synthesize human laughter is proposed. Laughter is a complex mechanism of expression and has high variability in terms of types and usage in human-human communication. People have their own characteristic way of laughing. Laughter can be seen as a controlled/uncontrolled physiological process of a person resulting from an initial excitation in context. A parametric model based on damped simple harmonic motion is developed here to capture these diversities effectively while maintaining an individual's characteristics. The accuracy of the model is constrained by the limited laughter/speech data available from actual humans and by the need for ease of synthesis. Analysis techniques are also developed to determine the parameters of the model for a given individual or laughter type. Finally, the effectiveness of the model in capturing individual characteristics and naturalness, compared to real human laughter, has been analyzed. Through this the factors involved in individual human laughter and their importance can be better understood.
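    The damped-simple-harmonic-motion idea can be caricatured in a few lines: a decaying oscillation at the 'ha-ha' syllable rate gates a voiced carrier. All parameter values below are illustrative stand-ins, not the authors' fitted model:

    ```python
    import numpy as np

    def toy_laughter(f0=220.0, syllable_rate=5.0, damping=1.5, dur=1.5, fs=16000):
        """Toy laughter bout: a damped, half-wave-rectified oscillation at
        the syllable rate amplitude-modulates a voiced carrier at pitch f0."""
        t = np.arange(int(dur * fs)) / fs
        bursts = np.maximum(np.sin(2 * np.pi * syllable_rate * t), 0.0)
        envelope = np.exp(-damping * t) * bursts    # decaying 'ha-ha' train
        return envelope * np.sin(2 * np.pi * f0 * t)
    ```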

  7. Individual differences in degraded speech perception

    NASA Astrophysics Data System (ADS)

    Carbonell, Kathy M.

    One of the lasting concerns in audiology is the unexplained individual differences in speech perception performance even for individuals with similar audiograms. One proposal is that there are cognitive/perceptual individual differences underlying this vulnerability and that these differences are present in normal hearing (NH) individuals but do not reveal themselves in studies that use clear speech produced in quiet (because of a ceiling effect). However, previous studies have failed to uncover cognitive/perceptual variables that explain much of the variance in NH performance on more challenging degraded speech tasks. This lack of strong correlations may be due either to examining the wrong measures (e.g., working memory capacity) or to there being no reliable differences in degraded speech performance in NH listeners (i.e., variability in performance is due to measurement noise). The proposed project has three aims. The first is to establish whether there are reliable individual differences in degraded speech performance for NH listeners that are sustained both across degradation types (speech in noise, compressed speech, noise-vocoded speech) and across multiple testing sessions. The second is to establish whether there are reliable differences in NH listeners' ability to adapt their phonetic categories based on short-term statistics, both across tasks and across sessions. The third is to determine whether performance on degraded speech perception tasks is correlated with performance on phonetic adaptability tasks, thus establishing a possible explanatory variable for individual differences in speech perception for NH and hearing-impaired listeners.
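    Of the degradation types listed, noise vocoding is the most algorithmic: the fine structure in each analysis band is replaced by noise carrying that band's envelope. A sketch with illustrative band edges:

    ```python
    import numpy as np
    from scipy.signal import butter, sosfiltfilt, hilbert

    def noise_vocode(speech, fs, edges=(100, 500, 1500, 4000)):
        """Noise-vocoded speech: per-band Hilbert envelopes re-imposed on
        band-limited noise carriers (band edges are illustrative)."""
        rng = np.random.default_rng(0)
        out = np.zeros(len(speech))
        for lo, hi in zip(edges[:-1], edges[1:]):
            sos = butter(4, (lo, hi), btype='bandpass', fs=fs, output='sos')
            band = sosfiltfilt(sos, speech)
            env = np.abs(hilbert(band))                        # band envelope
            carrier = sosfiltfilt(sos, rng.standard_normal(len(speech)))
            out += env * carrier
        return out / np.max(np.abs(out))                       # normalize level
    ```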

  8. Sensorimotor influences on speech perception in infancy.

    PubMed

    Bruderer, Alison G; Danielson, D Kyle; Kandhadai, Padmapriya; Werker, Janet F

    2015-11-03

    The influence of speech production on speech perception is well established in adults. However, because adults have a long history of both perceiving and producing speech, the extent to which the perception-production linkage is due to experience is unknown. We addressed this issue by asking whether articulatory configurations can influence infants' speech perception performance. To eliminate influences from specific linguistic experience, we studied preverbal, 6-mo-old infants and tested the discrimination of a nonnative, and hence never-before-experienced, speech sound distinction. In three experimental studies, we used teething toys to control the position and movement of the tongue tip while the infants listened to the speech sounds. Using ultrasound imaging technology, we verified that the teething toys consistently and effectively constrained the movement and positioning of infants' tongues. With a looking-time procedure, we found that temporarily restraining infants' articulators impeded their discrimination of a nonnative consonant contrast but only when the relevant articulator was selectively restrained to prevent the movements associated with producing those sounds. Our results provide striking evidence that even before infants speak their first words and without specific listening experience, sensorimotor information from the articulators influences speech perception. These results transform theories of speech perception by suggesting that even at the initial stages of development, oral-motor movements influence speech sound discrimination. Moreover, an experimentally induced "impairment" in articulator movement can compromise speech perception performance, raising the question of whether long-term oral-motor impairments may impact perceptual development.

  9. Loss tolerant speech decoder for telecommunications

    NASA Technical Reports Server (NTRS)

    Prieto, Jr., Jaime L. (Inventor)

    1999-01-01

    A method and device for extrapolating past signal-history data for insertion into missing data segments in order to conceal digital speech frame errors. The extrapolation method uses past-signal history that is stored in a buffer. The method is implemented with a device that utilizes a finite-impulse response (FIR) multi-layer feed-forward artificial neural network that is trained by back-propagation for one-step extrapolation of speech compression algorithm (SCA) parameters. Once a speech connection has been established, the speech compression algorithm device begins sending encoded speech frames. As the speech frames are received, they are decoded and converted back into speech signal voltages. During the normal decoding process, pre-processing of the required SCA parameters will occur and the results stored in the past-history buffer. If a speech frame is detected to be lost or in error, then extrapolation modules are executed and replacement SCA parameters are generated and sent as the parameters required by the SCA. In this way, the information transfer to the SCA is transparent, and the SCA processing continues as usual. The listener will not normally notice that a speech frame has been lost because of the smooth transition between the last-received, lost, and next-received speech frames.
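    The concealment step described above amounts to one-step prediction of each codec-parameter track from the past-history buffer. A heavily simplified stand-in for the trained FIR feed-forward network (weights would come from back-propagation training; shapes and names are illustrative):

    ```python
    import numpy as np

    def extrapolate_parameter(history, W1, b1, W2, b2):
        """Predict the next value of one SCA parameter from its recent
        history with a tiny feed-forward net (untrained sketch)."""
        x = np.asarray(history)          # e.g., last N values of the parameter
        h = np.tanh(W1 @ x + b1)         # hidden layer
        return float(W2 @ h + b2)        # replacement value for the lost frame
    ```

    On a detected frame loss, the decoder would run this once per parameter and hand the predicted set to the speech compression algorithm as if it had been received.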

  10. Some articulatory details of emotional speech

    NASA Astrophysics Data System (ADS)

    Lee, Sungbok; Yildirim, Serdar; Bulut, Murtaza; Kazemzadeh, Abe; Narayanan, Shrikanth

    2005-09-01

    Differences in speech articulation among four emotion types, neutral, anger, sadness, and happiness are investigated by analyzing tongue tip, jaw, and lip movement data collected from one male and one female speaker of American English. The data were collected using an electromagnetic articulography (EMA) system while subjects produce simulated emotional speech. Pitch, root-mean-square (rms) energy and the first three formants were estimated for vowel segments. For both speakers, angry speech exhibited the largest rms energy and largest articulatory activity in terms of displacement range and movement speed. Happy speech is characterized by largest pitch variability. It has higher rms energy than neutral speech but articulatory activity is rather comparable to, or less than, neutral speech. That is, happy speech is more prominent in voicing activity than in articulation. Sad speech exhibits longest sentence duration and lower rms energy. However, its articulatory activity is no less than neutral speech. Interestingly, for the male speaker, articulation for vowels in sad speech is consistently more peripheral (i.e., more forwarded displacements) when compared to other emotions. However, this does not hold for the female subject. These and other results will be discussed in detail with associated acoustics and perceived emotional qualities. [Work supported by NIH.]

  11. Speech Articulator and User Gesture Measurements Using Micropower, Interferometric EM-Sensors

    SciTech Connect

    Holzrichter, J F; Ng, L C

    2001-02-06

    Very low power, GHz frequency, "radar-like" sensors can measure a variety of motions produced by a human user of machine interface devices. These data can be obtained "at a distance" and can measure "hidden" structures. Measurements range from acoustically induced, 10-micron amplitude vibrations of vocal tract tissues, to few-centimeter human speech articulator motions, to meter-class motions of the head, hands, or entire body. These EM sensors measure "fringe motions" as reflected EM waves are mixed with a local (homodyne) reference wave. These data, when processed using models of the system being measured, provide real time states of interface positions or other targets vs. time. An example is speech articulator positions vs. time in the user's body. This information appears to be useful for a surprisingly wide range of applications ranging from speech coding, synthesis, and recognition, speaker or object identification, noise cancellation, hand or head motions for cursor direction, and other applications.

  12. A causal test of the motor theory of speech perception: a case of impaired speech production and spared speech perception.

    PubMed

    Stasenko, Alena; Bonn, Cory; Teghipco, Alex; Garcea, Frank E; Sweet, Catherine; Dombovy, Mary; McDonough, Joyce; Mahon, Bradford Z

    2015-01-01

    The debate about the causal role of the motor system in speech perception has been reignited by demonstrations that motor processes are engaged during the processing of speech sounds. Here, we evaluate which aspects of auditory speech processing are affected, and which are not, in a stroke patient with dysfunction of the speech motor system. We found that the patient showed a normal phonemic categorical boundary when discriminating two non-words that differ by a minimal pair (e.g., ADA-AGA). However, using the same stimuli, the patient was unable to identify or label the non-word stimuli (using a button-press response). A control task showed that he could identify speech sounds by speaker gender, ruling out a general labelling impairment. These data suggest that while the motor system is not causally involved in perception of the speech signal, it may be used when other cues (e.g., meaning, context) are not available.

  13. Segmenting Words from Natural Speech: Subsegmental Variation in Segmental Cues

    ERIC Educational Resources Information Center

    Rytting, C. Anton; Brew, Chris; Fosler-Lussier, Eric

    2010-01-01

    Most computational models of word segmentation are trained and tested on transcripts of speech, rather than the speech itself, and assume that speech is converted into a sequence of symbols prior to word segmentation. We present a way of representing speech corpora that avoids this assumption, and preserves acoustic variation present in speech. We…

  14. Extensions to the Speech Disorders Classification System (SDCS)

    ERIC Educational Resources Information Center

    Shriberg, Lawrence D.; Fourakis, Marios; Hall, Sheryl D.; Karlsson, Heather B.; Lohmeier, Heather L.; McSweeny, Jane L.; Potter, Nancy L.; Scheer-Cohen, Alison R.; Strand, Edythe A.; Tilkens, Christie M.; Wilson, David L.

    2010-01-01

    This report describes three extensions to a classification system for paediatric speech sound disorders termed the Speech Disorders Classification System (SDCS). Part I describes a classification extension to the SDCS to differentiate motor speech disorders from speech delay and to differentiate among three sub-types of motor speech disorders.…

  15. Modeling Interactions between Speech Production and Perception: Speech Error Detection at Semantic and Phonological Levels and the Inner Speech Loop

    PubMed Central

    Kröger, Bernd J.; Crawford, Eric; Bekolay, Trevor; Eliasmith, Chris

    2016-01-01

    Production and comprehension of speech are closely interwoven. For example, the ability to detect an error in one's own speech, halt speech production, and finally correct the error can be explained by assuming an inner speech loop which continuously compares the word representations induced by production to those induced by perception at various cognitive levels (e.g., conceptual, word, or phonological levels). Because spontaneous speech errors are relatively rare, a picture naming and halt paradigm can be used to evoke them. In this paradigm, picture presentation (target word initiation) is followed by an auditory stop signal (distractor word) for halting speech production. The current study seeks to understand the neural mechanisms governing self-detection of speech errors by developing a biologically inspired neural model of the inner speech loop. The neural model is based on the Neural Engineering Framework (NEF) and consists of a network of about 500,000 spiking neurons. In the first experiment we induce simulated speech errors semantically and phonologically. In the second experiment, we simulate a picture naming and halt task. Target-distractor word pairs were balanced with respect to variation of phonological and semantic similarity. The results of the first experiment show that speech errors are successfully detected by a monitoring component in the inner speech loop. The results of the second experiment show that the model correctly reproduces human behavioral data on the picture naming and halt task. In particular, the halting rate in the production of target words was lower for phonologically similar words than for semantically similar or fully dissimilar distractor words. We thus conclude that the neural architecture proposed here to model the inner speech loop reflects important interactions in production and perception at phonological and semantic levels. PMID:27303287

  16. Headphone localization of speech stimuli

    NASA Technical Reports Server (NTRS)

    Begault, Durand R.; Wenzel, Elizabeth M.

    1991-01-01

    Recently, three dimensional acoustic display systems have been developed that synthesize virtual sound sources over headphones based on filtering by Head-Related Transfer Functions (HRTFs), the direction-dependent spectral changes caused primarily by the outer ears. Here, 11 inexperienced subjects judged the apparent spatial location of headphone-presented speech stimuli filtered with non-individualized HRTFs. About half of the subjects 'pulled' their judgements toward either the median or the lateral-vertical planes, and estimates were almost always elevated. Individual differences were pronounced for the distance judgements; 15 to 46 percent of stimuli were heard inside the head, with the shortest estimates near the median plane. The results suggest that most listeners can obtain useful azimuth information from speech stimuli filtered by non-individualized HRTFs. Measurements of localization error and reversal rates are comparable with a previous study that used broadband noise stimuli.
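    The synthesis technique referred to here is binaural filtering: convolve the monaural speech with a measured left/right head-related impulse response (HRIR) pair for the desired direction and present the result over headphones. A minimal sketch (the HRIR arrays are assumed to be available, e.g., from a public HRTF database):

    ```python
    import numpy as np
    from scipy.signal import fftconvolve

    def spatialize(speech, hrir_left, hrir_right):
        """Binaural rendering: one convolution per ear; returns an
        (n_samples, 2) stereo array for headphone playback."""
        left = fftconvolve(speech, hrir_left)
        right = fftconvolve(speech, hrir_right)
        return np.stack([left, right], axis=1)  # assumes equal HRIR lengths
    ```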

  17. Language processing for speech understanding

    NASA Astrophysics Data System (ADS)

    Woods, W. A.

    1983-07-01

    This report considers language understanding techniques and control strategies that can be applied to provide higher-level support to aid in the understanding of spoken utterances. The discussion is illustrated with concepts and examples from the BBN speech understanding system, HWIM (Hear What I Mean). The HWIM system was conceived as an assistant to a travel budget manager, a system that would store information about planned and taken trips, travel budgets and their planning. The system was able to respond to commands and answer questions spoken into a microphone, and was able to synthesize spoken responses as output. HWIM was a prototype system used to drive speech understanding research. It used a phonetic-based approach, with no speaker training, a large vocabulary, and a relatively unconstraining English grammar. Discussed here is the control structure of the HWIM and the parsing algorithm used to parse sentences from the middle-out, using an ATN grammar.

  18. Prediction and imitation in speech.

    PubMed

    Gambi, Chiara; Pickering, Martin J

    2013-01-01

    It has been suggested that intra- and inter-speaker variability in speech are correlated. Interlocutors have been shown to converge on various phonetic dimensions. In addition, speakers imitate the phonetic properties of voices they are exposed to in shadowing, repetition, and even passive listening tasks. We review three theoretical accounts of speech imitation and convergence phenomena: (i) the Episodic Theory (ET) of speech perception and production (Goldinger, 1998); (ii) the Motor Theory (MT) of speech perception (Liberman and Whalen, 2000; Galantucci et al., 2006); (iii) Communication Accommodation Theory (CAT; Giles and Coupland, 1991; Giles et al., 1991). We argue that no account is able to explain all the available evidence. In particular, there is a need to integrate low-level, mechanistic accounts (like ET and MT), and higher-level accounts (like CAT). We propose that this is possible within the framework of an integrated theory of production and comprehension (Pickering and Garrod, 2013). Similarly to both ET and MT, this theory assumes parity between production and perception. Uniquely, however, it posits that listeners simulate speakers' utterances by computing forward-model predictions at many different levels, which are then compared to the incoming phonetic input. In our account phonetic imitation can be achieved via the same mechanism that is responsible for sensorimotor adaptation; i.e., the correction of prediction errors. In addition, the model assumes that the degree to which sensory prediction errors lead to motor adjustments is context-dependent. The notion of context subsumes both the preceding linguistic input and non-linguistic attributes of the situation (e.g., the speaker's and listener's social identities, their conversational roles, the listener's intention to imitate).

  19. Status Report on Speech Research

    DTIC Science & Technology

    1992-06-01

    specialization for language, adopting what Fodor (1983) would characterize as a vertical view... Is there specialization for language at the precognitive level?... incorporate a precognitive specialization... production and perception of speech; indeed, it assumes an organic relation between the two. It happens, however... phonetic primitives. Perception of phonetic structure is therefore precognitive, which is to say immediate... (1977; Crowder & Morton, 1969; Fujisaki &...)

  20. Network Speech Systems Technology Program.

    DTIC Science & Technology

    1979-09-30

    utilized for the digital speech. As discussed in Ref. 8, the required control overhead for packet transmission can be reduced to a level comparable... multiplexer with fixed-channel capacity are discussed in Ref. 17. For the multi-node system considered here, the effects of both fixed- and variable... Phase IV and described in Ref. 2. We then ran a comparison among centrally controlled simplex broadcast and speaker/interrupter systems together with the

  1. Prediction and imitation in speech

    PubMed Central

    Gambi, Chiara; Pickering, Martin J.

    2013-01-01

    It has been suggested that intra- and inter-speaker variability in speech are correlated. Interlocutors have been shown to converge on various phonetic dimensions. In addition, speakers imitate the phonetic properties of voices they are exposed to in shadowing, repetition, and even passive listening tasks. We review three theoretical accounts of speech imitation and convergence phenomena: (i) the Episodic Theory (ET) of speech perception and production (Goldinger, 1998); (ii) the Motor Theory (MT) of speech perception (Liberman and Whalen, 2000; Galantucci et al., 2006); (iii) Communication Accommodation Theory (CAT; Giles and Coupland, 1991; Giles et al., 1991). We argue that no account is able to explain all the available evidence. In particular, there is a need to integrate low-level, mechanistic accounts (like ET and MT), and higher-level accounts (like CAT). We propose that this is possible within the framework of an integrated theory of production and comprehension (Pickering and Garrod, 2013). Similarly to both ET and MT, this theory assumes parity between production and perception. Uniquely, however, it posits that listeners simulate speakers' utterances by computing forward-model predictions at many different levels, which are then compared to the incoming phonetic input. In our account phonetic imitation can be achieved via the same mechanism that is responsible for sensorimotor adaptation; i.e., the correction of prediction errors. In addition, the model assumes that the degree to which sensory prediction errors lead to motor adjustments is context-dependent. The notion of context subsumes both the preceding linguistic input and non-linguistic attributes of the situation (e.g., the speaker's and listener's social identities, their conversational roles, the listener's intention to imitate). PMID:23801971

  2. Speech Recognition of Foreign Accent

    DTIC Science & Technology

    1994-06-01

    ENGLISH PHONETIC ALPHABETS; TABLE 3. AVERAGE MALE FORMANT FREQUENCIES; TABLE 4. FOURTEEN-WORD LIST WITH VOWEL FORMANTS AND PHON... note (as in the consonant sounds in "ai"). The classes of vowels get their names from how they are articulated, or how the tongue is used to produce a... called formant frequencies, or formants. These frequencies would be considered normal speech, or in this case theoretical or ideal frequencies. The

  3. Effects of gaze and speech rate on receivers' evaluations of persuasive speech.

    PubMed

    Yokoyama, Hitomi; Daibo, Ikuo

    2012-04-01

    This study examined how gaze and speech rate affect perceptions of a speaker. Participants viewed a video recording of one of four persuasive messages delivered by a female speaker. Analysis of speech rate, gaze, and listener's sex revealed that when combined with a small amount of gaze, slow speech rate decreased trustworthiness as compared to a fast speech rate. For women, slow speech rate was thought to be indicative of less expertise as compared to a fast speech rate, again when combined with low gaze. There were no significant interactions, but there were main effects of gaze and speech rate on persuasiveness. High levels of gaze and slow speech rate each enhanced perceptions of the speaker's persuasiveness.

  4. Self-Evaluation and Pre-Speech Planning: A Strategy for Sharing Responsibility for Progress in the Speech Class.

    ERIC Educational Resources Information Center

    Desjardins, Linda A.

    Speech class teachers can implement a pre- and post-speech strategy, using pre-speech and self-evaluation forms, to help students become active in directing their own progress, and acknowledge their own accomplishments. Every speech is tape-recorded in class. Students listen to their speeches later and fill in the self-evaluation form, which asks…

  5. The Neural Bases of Difficult Speech Comprehension and Speech Production: Two Activation Likelihood Estimation (ALE) Meta-Analyses

    ERIC Educational Resources Information Center

    Adank, Patti

    2012-01-01

    The role of speech production mechanisms in difficult speech comprehension is the subject of on-going debate in speech science. Two Activation Likelihood Estimation (ALE) analyses were conducted on neuroimaging studies investigating difficult speech comprehension or speech production. Meta-analysis 1 included 10 studies contrasting comprehension…

  6. Networked Humanoid Animation Driven by Human Voice Using Extensible 3D (X3D), H-Anim and Java Speech Open Standards

    DTIC Science & Technology

    2002-03-01

    sound of a person's voice for each individual, molding the buzzing sound from the larynx into a sound with character. Lastly, the muscles in the tongue... speech-recognition system stores a list of what phonemes sound like. That is, a table of relative formant positions is kept by the software, and the... speech, or formant synthesis using signal processing techniques based on knowledge of how phonemes sound and how prosody affects those phonemes.

  7. Giving speech a hand: gesture modulates activity in auditory cortex during speech perception.

    PubMed

    Hubbard, Amy L; Wilson, Stephen M; Callan, Daniel E; Dapretto, Mirella

    2009-03-01

    Viewing hand gestures during face-to-face communication affects speech perception and comprehension. Despite the visible role played by gesture in social interactions, relatively little is known about how the brain integrates hand gestures with co-occurring speech. Here we used functional magnetic resonance imaging (fMRI) and an ecologically valid paradigm to investigate how beat gesture-a fundamental type of hand gesture that marks speech prosody-might impact speech perception at the neural level. Subjects underwent fMRI while listening to spontaneously-produced speech accompanied by beat gesture, nonsense hand movement, or a still body; as additional control conditions, subjects also viewed beat gesture, nonsense hand movement, or a still body all presented without speech. Validating behavioral evidence that gesture affects speech perception, bilateral nonprimary auditory cortex showed greater activity when speech was accompanied by beat gesture than when speech was presented alone. Further, the left superior temporal gyrus/sulcus showed stronger activity when speech was accompanied by beat gesture than when speech was accompanied by nonsense hand movement. Finally, the right planum temporale was identified as a putative multisensory integration site for beat gesture and speech (i.e., here activity in response to speech accompanied by beat gesture was greater than the summed responses to speech alone and beat gesture alone), indicating that this area may be pivotally involved in synthesizing the rhythmic aspects of both speech and gesture. Taken together, these findings suggest a common neural substrate for processing speech and gesture, likely reflecting their joint communicative role in social interactions.

  8. Interactive Activation Model of Speech Perception.

    DTIC Science & Technology

    1984-11-01

    contract. 0 Elar, .l... & .McC’lelland .1.1. Speech perception a, a cognitive proces,: The interactive act ia- %e., tion model of speech perception. In...attempts to provide a machine solution to the problem of speech perception. A second kind of model, growing out of Cognitive Psychology, attempts to...architectures to cognitive and perceptual problems. We also owe a debt to what we might call the computational connectionists -- those who have applied highly

  9. Integrated speech enhancement for functional MRI environment.

    PubMed

    Pathak, Nishank; Milani, Ali A; Panahi, Issa; Briggs, Richard

    2009-01-01

    This paper presents an integrated speech enhancement (SE) method for the noisy MRI environment. We show that the performance of the SE system improves considerably when the speech signal, dominated by MRI acoustic noise at very low SNR, is enhanced in two successive stages: two-channel SE methods followed by a single-channel post-processing SE algorithm. Actual MRI noisy speech data are used in our experiments, showing the improved performance of the proposed SE method.

  10. Modelling speech intelligibility in adverse conditions.

    PubMed

    Jørgensen, Søren; Dau, Torsten

    2013-01-01

    Jørgensen and Dau (J Acoust Soc Am 130:1475-1487, 2011) proposed the speech-based envelope power spectrum model (sEPSM) in an attempt to overcome the limitations of the classical speech transmission index (STI) and speech intelligibility index (SII) in conditions with nonlinearly processed speech. Instead of considering the reduction of the temporal modulation energy as the intelligibility metric, as assumed in the STI, the sEPSM applies the signal-to-noise ratio in the envelope domain (SNRenv). This metric was shown to be the key for predicting the intelligibility of reverberant speech as well as noisy speech processed by spectral subtraction. The key role of the SNRenv metric is further supported here by the ability of a short-term version of the sEPSM to predict speech masking release for different speech materials and modulated interferers. However, the sEPSM cannot account for speech subjected to phase jitter, a condition in which the spectral structure of the speech signal is strongly affected, while the broadband temporal envelope is kept largely intact. In contrast, the effects of this distortion can be predicted successfully by the spectro-temporal modulation index (STMI) (Elhilali et al., Speech Commun 41:331-348, 2003), which assumes an explicit analysis of the spectral "ripple" structure of the speech signal. However, since the STMI applies the same decision metric as the STI, it fails to account for spectral subtraction. The results from this study suggest that the SNRenv might reflect a powerful decision metric, while some explicit across-frequency analysis seems crucial in some conditions. How such across-frequency analysis is "realized" in the auditory system remains unresolved.
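    The SNRenv decision metric can be stated compactly: the envelope power (variance of the envelope normalized by its squared mean) of the noisy speech in excess of that of the noise alone, relative to the noise envelope power. A sketch for a single modulation channel, assuming the envelopes have already been extracted (the full sEPSM applies this per audio filter and per modulation filter):

    ```python
    import numpy as np

    def snr_env_db(env_noisy_speech, env_noise, floor=1e-6):
        """SNRenv in dB from two envelope signals (sketch of the metric)."""
        # AC envelope power normalized by DC power, per sEPSM convention
        p_mix = np.var(env_noisy_speech) / np.mean(env_noisy_speech) ** 2
        p_noise = np.var(env_noise) / np.mean(env_noise) ** 2
        snr = max(p_mix - p_noise, floor) / max(p_noise, floor)
        return 10.0 * np.log10(snr)
    ```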

  11. Speech enhancement using local spectral regularization

    NASA Astrophysics Data System (ADS)

    Sandoval-Ibarra, Yuma; Diaz-Ramirez, Victor H.; Kober, Vitaly; Diaz, Arnoldo

    2016-09-01

    A locally-adaptive algorithm for speech enhancement based on local spectral regularization is presented. The algorithm retrieves a clean speech signal from a noisy signal using locally-adaptive signal processing, and increases the quality of the noisy signal in terms of objective metrics. Computer simulation results obtained with the proposed algorithm in processing speech signals corrupted with additive noise are presented and discussed.

  12. Computer Recognition of Phonets in Speech.

    DTIC Science & Technology

    1982-12-01

    Felkey. (2) We could apply the already available Eclipse AP/130 Array Processor to the task of transforming the speech time series to the spectral domain...file of observations generated from speech. (3) To apply the Eclipse AP/130 Array Processor to tasks one and two for speed. (4) To build a sufficient...One objective of this project was to implement a machine like Seelandt's (Ref 1) Speech Sound Analysis Machine (SSAM) on the Eclipse AP/130 Array

  13. Continuous speech recognition for clinicians.

    PubMed

    Zafar, A; Overhage, J M; McDonald, C J

    1999-01-01

    The current generation of continuous speech recognition systems claims to offer high accuracy (greater than 95 percent) speech recognition at natural speech rates (150 words per minute) on low-cost (under $2000) platforms. This paper presents a state-of-the-technology summary, along with insights the authors have gained through testing one such product extensively and other products superficially. The authors have identified a number of issues that are important in managing accuracy and usability. First, for efficient recognition users must start with a dictionary containing the phonetic spellings of all words they anticipate using. The authors dictated 50 discharge summaries using one inexpensive internal medicine dictionary ($30) and found that they needed to add an additional 400 terms to get recognition rates of 98 percent. However, if they used either of two more expensive and extensive commercial medical vocabularies ($349 and $695), they did not need to add terms to get a 98 percent recognition rate. Second, users must speak clearly and continuously, distinctly pronouncing all syllables. Users must also correct errors as they occur, because accuracy improves with error correction by at least 5 percent over two weeks. Users may find it difficult to train the system to recognize certain terms, regardless of the amount of training, and appropriate substitutions must be created. For example, the authors had to substitute "twice a day" for "bid" when using the less expensive dictionary, but not when using the other two dictionaries. From trials they conducted in settings ranging from an emergency room to hospital wards and clinicians' offices, they learned that ambient noise has minimal effect. Finally, they found that a minimal "usable" hardware configuration (which keeps up with dictation) comprises a 300-MHz Pentium processor with 128 MB of RAM and a "speech quality" sound card (e.g., SoundBlaster, $99). Anything less powerful will result in the system lagging

  14. Continuous Speech Recognition for Clinicians

    PubMed Central

    Zafar, Atif; Overhage, J. Marc; McDonald, Clement J.

    1999-01-01

    The current generation of continuous speech recognition systems claims to offer high accuracy (greater than 95 percent) speech recognition at natural speech rates (150 words per minute) on low-cost (under $2000) platforms. This paper presents a state-of-the-technology summary, along with insights the authors have gained through testing one such product extensively and other products superficially. The authors have identified a number of issues that are important in managing accuracy and usability. First, for efficient recognition users must start with a dictionary containing the phonetic spellings of all words they anticipate using. The authors dictated 50 discharge summaries using one inexpensive internal medicine dictionary ($30) and found that they needed to add an additional 400 terms to get recognition rates of 98 percent. However, if they used either of two more expensive and extensive commercial medical vocabularies ($349 and $695), they did not need to add terms to get a 98 percent recognition rate. Second, users must speak clearly and continuously, distinctly pronouncing all syllables. Users must also correct errors as they occur, because accuracy improves with error correction by at least 5 percent over two weeks. Users may find it difficult to train the system to recognize certain terms, regardless of the amount of training, and appropriate substitutions must be created. For example, the authors had to substitute “twice a day” for “bid” when using the less expensive dictionary, but not when using the other two dictionaries. From trials they conducted in settings ranging from an emergency room to hospital wards and clinicians' offices, they learned that ambient noise has minimal effect. Finally, they found that a minimal “usable” hardware configuration (which keeps up with dictation) comprises a 300-MHz Pentium processor with 128 MB of RAM and a “speech quality” sound card (e.g., SoundBlaster, $99). Anything less powerful will result in

  15. Spotlight on Speech Codes 2012: The State of Free Speech on Our Nation's Campuses

    ERIC Educational Resources Information Center

    Foundation for Individual Rights in Education (NJ1), 2012

    2012-01-01

    The U.S. Supreme Court has called America's colleges and universities "vital centers for the Nation's intellectual life," but the reality today is that many of these institutions severely restrict free speech and open debate. Speech codes--policies prohibiting student and faculty speech that would, outside the bounds of campus, be…

  16. Speech rate effects on the processing of conversational speech across the adult life span.

    PubMed

    Koch, Xaver; Janse, Esther

    2016-04-01

    This study investigates the effect of speech rate on spoken word recognition across the adult life span. Contrary to previous studies, conversational materials with a natural variation in speech rate were used rather than lab-recorded stimuli that are subsequently artificially time-compressed. It was investigated whether older adults' speech recognition is more adversely affected by increased speech rate compared to younger and middle-aged adults, and which individual listener characteristics (e.g., hearing, fluid cognitive processing ability) predict the size of the speech rate effect on recognition performance. In an eye-tracking experiment, participants indicated with a mouse-click which visually presented words they recognized in a conversational fragment. Click response times, gaze, and pupil size data were analyzed. As expected, click response times and gaze behavior were affected by speech rate, indicating that word recognition is more difficult if speech rate is faster. Contrary to earlier findings, increased speech rate affected the age groups to the same extent. Fluid cognitive processing ability predicted general recognition performance, but did not modulate the speech rate effect. These findings emphasize that earlier results of age by speech rate interactions mainly obtained with artificially speeded materials may not generalize to speech rate variation as encountered in conversational speech.

  17. Speech and Language Skills of Parents of Children with Speech Sound Disorders

    ERIC Educational Resources Information Center

    Lewis, Barbara A.; Freebairn, Lisa A.; Hansen, Amy J.; Miscimarra, Lara; Iyengar, Sudha K.; Taylor, H. Gerry

    2007-01-01

    Purpose: This study compared parents with histories of speech sound disorders (SSD) to parents without known histories on measures of speech sound production, phonological processing, language, reading, and spelling. Familial aggregation for speech and language disorders was also examined. Method: The participants were 147 parents of children with…

  18. The Practical Philosophy of Communication Ethics and Free Speech as the Foundation for Speech Communication.

    ERIC Educational Resources Information Center

    Arnett, Ronald C.

    1990-01-01

    Argues that communication ethics and free speech are the foundation for understanding the field of speech communication and its proper positioning in the larger array of academic disciplines. Argues that speech communication as a discipline can be traced back to a "practical philosophical" foundation detailed by Aristotle. (KEH)

  19. A MANUAL ON SPEECH THERAPY FOR PARENTS' USE WITH CHILDREN WHO HAVE MINOR SPEECH PROBLEMS.

    ERIC Educational Resources Information Center

    OGG, HELEN LOREE

    A MANUAL TO PROVIDE PARENTS WITH AN UNDERSTANDING OF THE WORK OF THE SPEECH TEACHER AND WITH METHODS TO CORRECT THE POOR SPEECH HABITS OF THEIR CHILDREN IS PRESENTED. AREAS INCLUDE THE ORGANS OF SPEECH, WHERE THEY SHOULD BE PLACED TO MAKE EACH SOUND, AND HOW THEY SHOULD OR SHOULD NOT MOVE. EASY DIRECTIONS ARE GIVEN FOR PRODUCING THE MOST…

  20. Spotlight on Speech Codes 2007: The State of Free Speech on Our Nation's Campuses

    ERIC Educational Resources Information Center

    Foundation for Individual Rights in Education (NJ1), 2007

    2007-01-01

    Last year, the Foundation for Individual Rights in Education (FIRE) conducted its first-ever comprehensive study of restrictions on speech at America's colleges and universities, "Spotlight on Speech Codes 2006: The State of Free Speech on our Nation's Campuses." In light of the essentiality of free expression to a truly liberal…

  1. Construction of a Rated Speech Corpus of L2 Learners' Spontaneous Speech

    ERIC Educational Resources Information Center

    Yoon, Su-Youn; Pierce, Lisa; Huensch, Amanda; Juul, Eric; Perkins, Samantha; Sproat, Richard; Hasegawa-Johnson, Mark

    2009-01-01

    This work reports on the construction of a rated database of spontaneous speech produced by second language (L2) learners of English. Spontaneous speech was collected from 28 L2 speakers representing six language backgrounds and five different proficiency levels. Speech was elicited using formats similar to that of the TOEFL iBT and the Speaking…

  2. Exploring the Role of Brain Oscillations in Speech Perception in Noise: Intelligibility of Isochronously Retimed Speech

    PubMed Central

    Aubanel, Vincent; Davis, Chris; Kim, Jeesun

    2016-01-01

    A growing body of evidence shows that brain oscillations track speech. This mechanism is thought to maximize processing efficiency by allocating resources to important speech information, effectively parsing speech into units of appropriate granularity for further decoding. However, some aspects of this mechanism remain unclear. First, while periodicity is an intrinsic property of this physiological mechanism, speech is only quasi-periodic, so it is not clear whether periodicity would present an advantage in processing. Second, it is still a matter of debate which aspect of speech triggers or maintains cortical entrainment, from bottom-up cues such as fluctuations of the amplitude envelope of speech to higher level linguistic cues such as syntactic structure. We present data from a behavioral experiment assessing the effect of isochronous retiming of speech on speech perception in noise. Two types of anchor points were defined for retiming speech, namely syllable onsets and amplitude envelope peaks. For each anchor point type, retiming was implemented at two hierarchical levels, a slow time scale around 2.5 Hz and a fast time scale around 4 Hz. Results show that while any temporal distortion resulted in reduced speech intelligibility, isochronous speech anchored to P-centers (approximated by stressed syllable vowel onsets) was significantly more intelligible than a matched anisochronous retiming, suggesting a facilitative role of periodicity defined on linguistically motivated units in processing speech in noise. PMID:27630552
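
    A rough sketch of the retiming manipulation, assuming anchor times (e.g., syllable onsets) are already known. Plain linear resampling is used here for brevity; a pitch-preserving stretch (e.g., WSOLA or a phase vocoder) would normally be preferred for perceptual experiments:

```python
import numpy as np

def isochronous_warp(x, fs, anchor_times_s, rate_hz):
    """Retime `x` so that successive anchors (given in seconds, sorted)
    land on a regular grid at `rate_hz`; each inter-anchor segment is
    linearly resampled to its new duration."""
    anchors = np.round(np.asarray(anchor_times_s) * fs).astype(int)
    grid = anchors[0] + np.round(
        np.arange(len(anchors)) * fs / rate_hz).astype(int)
    out = [x[:anchors[0]]]                       # untouched lead-in
    for i in range(len(anchors) - 1):
        seg = x[anchors[i]:anchors[i + 1]]
        new_len = grid[i + 1] - grid[i]
        idx = np.linspace(0.0, len(seg) - 1.0, new_len)
        out.append(np.interp(idx, np.arange(len(seg)), seg))
    out.append(x[anchors[-1]:])                  # untouched tail
    return np.concatenate(out)
```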

  3. Cleft Audit Protocol for Speech (CAPS-A): A Comprehensive Training Package for Speech Analysis

    ERIC Educational Resources Information Center

    Sell, D.; John, A.; Harding-Bell, A.; Sweeney, T.; Hegarty, F.; Freeman, J.

    2009-01-01

    Background: The previous literature has largely focused on speech analysis systems and ignored process issues, such as the nature of adequate speech samples, data acquisition, recording and playback. Although there has been recognition of the need for training on tools used in speech analysis associated with cleft palate, little attention has been…

  4. Speech enhancement using a structured codebook.

    PubMed

    Naidu, D Hanumantha Rao; Srinivasan, Sriram; Rao, G V Prabhakara

    2012-10-01

    Codebook-based speech enhancement methods that use trained codebooks of speech and noise spectra provide good performance even under non-stationary noise conditions. A drawback, however, is their high computational cost. For every pair of speech and noise codebook vectors, a likelihood score indicating how well that pair matches the observation is computed. In this paper, a method that identifies and performs only relevant likelihood computations by imposing a hierarchical structure on the speech codebook is proposed. The performance of the proposed method is shown to be close to that of the original scheme but at a significantly lower computational cost.
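
    The pruning idea can be sketched as follows, with the paper's likelihood score replaced by a plain Euclidean distance for brevity: codebook vectors are grouped under a small set of centroids, and only the clusters closest to the observation are searched:

```python
import numpy as np

def build_hierarchy(codebook, n_clusters=8, iters=20, seed=0):
    """Group codebook vectors under a small set of centroids (k-means)."""
    rng = np.random.default_rng(seed)
    centroids = codebook[rng.choice(len(codebook), n_clusters, replace=False)]
    for _ in range(iters):
        labels = np.argmin(
            ((codebook[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
        for k in range(n_clusters):
            if np.any(labels == k):
                centroids[k] = codebook[labels == k].mean(axis=0)
    return centroids, labels

def hierarchical_search(obs, codebook, centroids, labels, n_best=2):
    """Score only vectors in the clusters whose centroids are closest
    to the observation, instead of scoring the whole codebook."""
    nearest = np.argsort(((centroids - obs) ** 2).sum(-1))[:n_best]
    candidates = np.flatnonzero(np.isin(labels, nearest))
    d = ((codebook[candidates] - obs) ** 2).sum(-1)
    return candidates[np.argmin(d)]   # index of the best-matching vector
```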

  5. Speech disorders of Parkinsonism: a review.

    PubMed Central

    Critchley, E M

    1981-01-01

    Study of the speech disorders of Parkinsonism provides a paradigm of the integration of phonation, articulation and language in the production of speech. The initial defect in the untreated patient is a failure to control respiration for the purpose of speech and there follows a forward progression of articulatory symptoms involving larynx, pharynx, tongue and finally lips. There is evidence that the integration of speech production is organised asymmetrically at thalamic level. Experimental or therapeutic lesions in the region of the inferior medial portion of ventro-lateral thalamus may influence the initiation, respiratory control, rate and prosody of speech. Higher language functions may also be involved in thalamic integration: different forms of anomia are reported with pulvinar and ventrolateral thalamic lesions and transient aphasia may follow stereotaxis. The results of treatment with levodopa indicate that neurotransmitter substances enhance the clarity, volume and persistence of phonation and the latency and smoothness of articulation. The improvement of speech performance is not necessarily in phase with locomotor changes. The dose-related dyskinetic effects of levodopa, which appear to have a physiological basis in observations previously made in post-encephalitic Parkinsonism, not only influence the prosody of speech with near-mutism, hesitancy and dysfluency but may affect word-finding ability and, in instances of excitement (erethism), even involve the association of long-term memory with speech. In future, neurologists will need to examine more closely the role of neurotransmitters in speech production and formulation. PMID:7031185

  6. Normal and Time-Compressed Speech

    PubMed Central

    Lemke, Ulrike; Kollmeier, Birger; Holube, Inga

    2016-01-01

    Short-term and long-term learning effects were investigated for the German Oldenburg sentence test (OLSA) using original and time-compressed fast speech in noise. Normal-hearing and hearing-impaired participants completed six lists of the OLSA in five sessions. Two groups of normal-hearing listeners (24 and 12 listeners) and two groups of hearing-impaired listeners (9 listeners each) performed the test with original or time-compressed speech. In general, original speech resulted in better speech recognition thresholds than time-compressed speech. Thresholds decreased with repetition for both speech materials. Confirming earlier results, the largest improvements were observed within the first measurements of the first session, indicating a rapid initial adaptation phase. The improvements were larger for time-compressed than for original speech. The novel results on long-term learning effects when using the OLSA indicate a longer phase of ongoing learning, especially for time-compressed speech, which seems to be limited by a floor effect. In addition, for normal-hearing participants, no complete transfer of learning benefits from time-compressed to original speech was observed. These effects should be borne in mind when inviting listeners repeatedly, for example, in research settings.

  7. Speech evaluation for patients with cleft palate.

    PubMed

    Kummer, Ann W

    2014-04-01

    Children with cleft palate are at risk for speech problems, particularly those caused by velopharyngeal insufficiency. There may be an additional risk of speech problems caused by malocclusion. This article describes the speech evaluation for children with cleft palate and how the results of the evaluation are used to make treatment decisions. Instrumental procedures that provide objective data regarding the function of the velopharyngeal valve, and the 2 most common methods of velopharyngeal imaging, are also described. Because many readers are not familiar with phonetic symbols for speech phonemes, Standard English letters are used for clarity.

  8. Neologistic speech automatisms during complex partial seizures.

    PubMed

    Bell, W L; Horner, J; Logue, P; Radtke, R A

    1990-01-01

    There are no documented cases of seizures causing reiterative neologistic speech automatisms. We report an 18-year-old right-handed woman with stereotypic ictal speech automatisms characterized by phonemic jargon and reiterative neologisms. Video-EEG during the reiterative neologisms demonstrated rhythmic delta activity, which was most prominent in the left posterior temporal region. At surgery, there was an arteriovenous malformation impinging on the left supramarginal gyrus and the posterior portion of the superior temporal gyrus. Though intelligible speech automatisms can result from seizure foci in either hemisphere, neologistic speech automatisms may implicate a focus in the language-dominant hemisphere.

  9. 75 FR 26701 - Telecommunications Relay Services and Speech-to-Speech Services for Individuals With Hearing and...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-05-12

    ...; DA 10-761] Telecommunications Relay Services and Speech-to-Speech Services for Individuals With Hearing and Speech Disabilities AGENCY: Federal Communications Commission. ACTION: Proposed rule. SUMMARY.... The Bureau seeks comment on NECA's proposed compensation rates for Interstate TRS,...

  10. Speech Planning Happens before Speech Execution: Online Reaction Time Methods in the Study of Apraxia of Speech

    ERIC Educational Resources Information Center

    Maas, Edwin; Mailend, Marja-Liisa

    2012-01-01

    Purpose: The purpose of this article is to present an argument for the use of online reaction time (RT) methods to the study of apraxia of speech (AOS) and to review the existing small literature in this area and the contributions it has made to our fundamental understanding of speech planning (deficits) in AOS. Method: Following a brief…

  11. Perceptual centres in speech - an acoustic analysis

    NASA Astrophysics Data System (ADS)

    Scott, Sophie Kerttu

    Perceptual centres, or P-centres, represent the perceptual moments of occurrence of acoustic signals - the 'beat' of a sound. P-centres underlie the perception and production of rhythm in perceptually regular speech sequences. P-centres have been modelled both in speech and non-speech (music) domains. The three aims of this thesis were (a) to test current P-centre models to determine which best accounted for the experimental data; (b) to identify a candidate parameter to map P-centres onto (a local approach), as opposed to the previous global models which rely upon the whole signal to determine the P-centre; and (c) to develop a model of P-centre location which could be applied to speech and non-speech signals. The first aim was investigated by a series of experiments in which (a) speech from different speakers was investigated to determine whether different models could account for variation between speakers; (b) whether rendering the amplitude-time plot of a speech signal affects the P-centre of the signal; and (c) whether increasing the amplitude at the offset of a speech signal alters P-centres in the production and perception of speech. The second aim was carried out by (a) manipulating the rise time of different speech signals to determine whether the P-centre was affected, and whether the type of speech sound ramped affected the P-centre shift; (b) manipulating the rise time and decay time of a synthetic vowel to determine whether the onset alteration had more effect on the P-centre than the offset manipulation; and (c) varying the duration of a vowel, with other attributes (amplitude, spectral content) held constant, to determine whether this affected the P-centre. The third aim - modelling P-centres - was based on these results. The Frequency-dependent Amplitude Increase Model of P-centre location (FAIM) was developed using a modelling protocol, the APU GammaTone Filterbank and the speech from different speakers. The P-centres of the stimulus corpus were highly predicted by attributes of
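
    In the spirit of the local approach described above, a toy P-centre estimate might take the time at which the smoothed amplitude envelope first reaches a fraction of its peak. This is an illustrative simplification only; FAIM itself operates on a GammaTone filterbank decomposition, and the threshold below is arbitrary:

```python
import numpy as np
from scipy.signal import butter, sosfilt, hilbert

def pcentre_estimate(x, fs, threshold=0.5):
    """Toy local P-centre estimate: the time (in seconds) at which the
    low-pass-smoothed amplitude envelope first reaches `threshold`
    of its maximum."""
    env = np.abs(hilbert(x))
    sos = butter(2, 40.0, btype="lowpass", fs=fs, output="sos")
    env = sosfilt(sos, env)
    onset = int(np.argmax(env >= threshold * env.max()))
    return onset / fs
```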

  12. Prediction and constraint in audiovisual speech perception.

    PubMed

    Peelle, Jonathan E; Sommers, Mitchell S

    2015-07-01

    During face-to-face conversational speech listeners must efficiently process a rapid and complex stream of multisensory information. Visual speech can serve as a critical complement to auditory information because it provides cues to both the timing of the incoming acoustic signal (the amplitude envelope, influencing attention and perceptual sensitivity) and its content (place and manner of articulation, constraining lexical selection). Here we review behavioral and neurophysiological evidence regarding listeners' use of visual speech information. Multisensory integration of audiovisual speech cues improves recognition accuracy, particularly for speech in noise. Even when speech is intelligible based solely on auditory information, adding visual information may reduce the cognitive demands placed on listeners through increasing the precision of prediction. Electrophysiological studies demonstrate that oscillatory cortical entrainment to speech in auditory cortex is enhanced when visual speech is present, increasing sensitivity to important acoustic cues. Neuroimaging studies also suggest increased activity in auditory cortex when congruent visual information is available, but additionally emphasize the involvement of heteromodal regions of posterior superior temporal sulcus as playing a role in integrative processing. We interpret these findings in a framework of temporally-focused lexical competition in which visual speech information affects auditory processing to increase sensitivity to acoustic information through an early integration mechanism, and a late integration stage that incorporates specific information about a speaker's articulators to constrain the number of possible candidates in a spoken utterance. Ultimately it is words compatible with both auditory and visual information that most strongly determine successful speech perception during everyday listening. Thus, audiovisual speech perception is accomplished through multiple stages of integration

  13. Speech perception as an active cognitive process.

    PubMed

    Heald, Shannon L M; Nusbaum, Howard C

    2014-01-01

    One view of speech perception is that acoustic signals are transformed into representations for pattern matching to determine linguistic structure. This process can be taken as a statistical pattern-matching problem, assuming relatively stable linguistic categories are characterized by neural representations related to auditory properties of speech that can be compared to speech input. This kind of pattern matching can be termed a passive process which implies rigidity of processing with few demands on cognitive processing. An alternative view is that speech recognition, even in early stages, is an active process in which speech analysis is attentionally guided. Note that this does not mean consciously guided but that information-contingent changes in early auditory encoding can occur as a function of context and experience. Active processing assumes that attention, plasticity, and listening goals are important in considering how listeners cope with adverse circumstances that impair hearing by masking noise in the environment or hearing loss. Although theories of speech perception have begun to incorporate some active processing, they seldom treat early speech encoding as plastic and attentionally guided. Recent research has suggested that speech perception is the product of both feedforward and feedback interactions between a number of brain regions that include descending projections perhaps as far downstream as the cochlea. It is important to understand how the ambiguity of the speech signal and constraints of context dynamically determine cognitive resources recruited during perception including focused attention, learning, and working memory. Theories of speech perception need to go beyond the current corticocentric approach in order to account for the intrinsic dynamics of the auditory encoding of speech. In doing so, this may provide new insights into ways in which hearing disorders and loss may be treated either through augmentation or therapy.

  14. The fragile nature of the speech-perception deficit in dyslexia: natural vs synthetic speech.

    PubMed

    Blomert, Leo; Mitterer, Holger

    2004-04-01

    A number of studies reported that developmental dyslexics are impaired in speech perception, especially for speech signals consisting of rapid auditory transitions. These studies mostly made use of a categorical-perception task with synthetic-speech samples. In this study, we show that deficits in the perception of synthetic speech do not generalise to the perception of more naturally sounding speech, even if the same experimental paradigm is used. This contrasts with the assumption that dyslexics are impaired in the perception of rapid auditory transitions.

  15. Reasons why current speech-enhancement algorithms do not improve speech intelligibility and suggested solutions.

    PubMed

    Loizou, Philipos C; Kim, Gibak

    2011-01-01

    Existing speech enhancement algorithms can improve speech quality but not speech intelligibility, and the reasons for that are unclear. In the present paper, we present a theoretical framework that can be used to analyze potential factors that can influence the intelligibility of processed speech. More specifically, this framework focuses on the fine-grain analysis of the distortions introduced by speech enhancement algorithms. It is hypothesized that if these distortions are properly controlled, then large gains in intelligibility can be achieved. To test this hypothesis, intelligibility tests are conducted with human listeners in which we present processed speech with controlled speech distortions. The aim of these tests is to assess the perceptual effect of the various distortions that can be introduced by speech enhancement algorithms on speech intelligibility. Results with three different enhancement algorithms indicated that certain distortions are more detrimental to speech intelligibility than others. When these distortions were properly controlled, however, large gains in intelligibility were obtained by human listeners, even by spectral-subtractive algorithms which are known to degrade speech quality and intelligibility.
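
    The fine-grain distortion analysis can be sketched as a per-bin comparison of processed and clean magnitude spectra. The 6.02 dB threshold below (a factor of two in amplitude) is one plausible boundary between mild and severe amplification, chosen here for illustration:

```python
import numpy as np
from scipy.signal import stft

def distortion_profile(clean, enhanced, fs, amp_thresh_db=6.02):
    """Fraction of time-frequency bins where the processed signal is
    attenuated, mildly amplified, or over-amplified vs. clean speech."""
    _, _, C = stft(clean, fs=fs, nperseg=512)
    _, _, E = stft(enhanced, fs=fs, nperseg=512)
    eps = 1e-12
    d_db = 20 * np.log10((np.abs(E) + eps) / (np.abs(C) + eps))
    return {
        "attenuation": float(np.mean(d_db < 0)),
        "amplification": float(np.mean((d_db >= 0) & (d_db <= amp_thresh_db))),
        "over_amplification": float(np.mean(d_db > amp_thresh_db)),
    }
```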

  16. Speech discrimination after early exposure to pulsed-noise or speech.

    PubMed

    Ranasinghe, Kamalini G; Carraway, Ryan S; Borland, Michael S; Moreno, Nicole A; Hanacik, Elizabeth A; Miller, Robert S; Kilgard, Michael P

    2012-07-01

    Early experience of structured inputs and complex sound features generate lasting changes in tonotopy and receptive field properties of primary auditory cortex (A1). In this study we tested whether these changes are severe enough to alter neural representations and behavioral discrimination of speech. We exposed two groups of rat pups during the critical period of auditory development to pulsed-noise or speech. Both groups of rats were trained to discriminate speech sounds when they were young adults, and anesthetized neural responses were recorded from A1. The representation of speech in A1 and behavioral discrimination of speech remained robust to altered spectral and temporal characteristics of A1 neurons after pulsed-noise exposure. Exposure to passive speech during early development provided no added advantage in speech sound processing. Speech training increased A1 neuronal firing rate for speech stimuli in naïve rats, but did not increase responses in rats that experienced early exposure to pulsed-noise or speech. Our results suggest that speech sound processing is resistant to changes in simple neural response properties caused by manipulating early acoustic environment.

  17. Monaural speech intelligibility and detection in maskers with varying amounts of spectro-temporal speech features.

    PubMed

    Schubotz, Wiebke; Brand, Thomas; Kollmeier, Birger; Ewert, Stephan D

    2016-07-01

    Speech intelligibility is strongly affected by the presence of maskers. Depending on the spectro-temporal structure of the masker and its similarity to the target speech, different masking aspects can occur which are typically referred to as energetic, amplitude modulation, and informational masking. In this study speech intelligibility and speech detection was measured in maskers that vary systematically in the time-frequency domain from steady-state noise to a single interfering talker. Male and female target speech was used in combination with maskers based on speech for the same or different gender. Observed data were compared to predictions of the speech intelligibility index, extended speech intelligibility index, multi-resolution speech-based envelope-power-spectrum model, and the short-time objective intelligibility measure. The different models served as analysis tools to help distinguish between the different masking aspects. Comparison shows that overall masking can to a large extent be explained by short-term energetic masking. However, the other masking aspects (amplitude modulation and informational masking) influence speech intelligibility as well. Additionally, all models showed considerable deviations from the data. Therefore, the current study provides a benchmark for further evaluation of speech prediction models.

  18. The inhibition of stuttering via the presentation of natural speech and sinusoidal speech analogs.

    PubMed

    Saltuklaroglu, Tim; Kalinowski, Joseph

    2006-08-14

    Sensory signals containing speech or gestural (articulatory) information (e.g., choral speech) have repeatedly been found to be highly effective inhibitors of stuttering. Sine wave analogs of speech consist of a trio of changing pure tones representative of formant frequencies. They are otherwise devoid of traditional speech cues, yet have proven to evoke consistent linguistic percepts in listeners. Thus, we investigated the potency of sinusoidal speech for inhibiting stuttering. Ten adults who stutter read while listening to (a) forward-flowing natural speech; (b) forward-flowing sinusoid analogs of natural speech; (c) reversed natural speech; (d) reversed sinusoid analogs of natural speech; and (e) a continuous 1000 Hz pure tone. The levels of stuttering inhibition achieved using the sinusoidal stimuli were potent and not significantly different from those achieved using natural speech (approximately 50% in forward conditions and approximately 25% in the reversed conditions), suggesting that the patterns of undulating pure tones are sufficient to endow sinusoidal sentences with 'quasi-gestural' qualities. These data highlight the sensitivity of a specialized 'phonetic module' for extracting gestural information from sensory stimuli. Stuttering inhibition is thought to occur when perceived gestural information facilitates fluent productions via the engagement of mirror neurons (e.g., in Broca's area), which appear to play a crucial role in our ability to perceive and produce speech.
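
    A sine-wave analog of this kind can be synthesized directly from formant tracks: one time-varying pure tone per formant, with no other speech cues. The sketch below assumes frame-wise frequency and amplitude tracks are already available (e.g., from an LPC formant tracker):

```python
import numpy as np

def sinewave_speech(freq_tracks_hz, amp_tracks, fs, frame_rate_hz=100):
    """Sum of time-varying pure tones, one per formant track.
    Both track arrays have shape (n_formants, n_frames)."""
    n_formants, n_frames = freq_tracks_hz.shape
    t_frames = np.arange(n_frames) / frame_rate_hz
    t = np.arange(int(n_frames * fs / frame_rate_hz)) / fs
    out = np.zeros_like(t)
    for k in range(n_formants):
        f_inst = np.interp(t, t_frames, freq_tracks_hz[k])
        a_inst = np.interp(t, t_frames, amp_tracks[k])
        phase = 2 * np.pi * np.cumsum(f_inst) / fs   # integrate frequency
        out += a_inst * np.sin(phase)
    return out / max(np.max(np.abs(out)), 1e-9)      # normalize peak
```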

  19. Speech detection in spatial and nonspatial speech maskers.

    PubMed

    Balakrishnan, Uma; Freyman, Richard L

    2008-05-01

    The effect of perceived spatial differences on masking release was examined using a 4AFC speech detection paradigm. Targets were 20 words produced by a female talker. Maskers were recordings of continuous streams of nonsense sentences spoken by two female talkers and mixed into each of two channels (two talker, and the same masker time reversed). Two masker spatial conditions were employed: "RF" with a 4 ms time lead to the loudspeaker 60 degrees horizontally to the right, and "FR" with the time lead to the front (0 degrees ) loudspeaker. The reference nonspatial "F" masker was presented from the front loudspeaker only. Target presentation was always from the front loudspeaker. In Experiment 1, target detection threshold for both natural and time-reversed spatial maskers was 17-20 dB lower than that for the nonspatial masker, suggesting that significant release from informational masking occurs with spatial speech maskers regardless of masker understandability. In Experiment 2, the effectiveness of the FR and RF maskers was evaluated as the right loudspeaker output was attenuated until the two-source maskers were indistinguishable from the F masker, as measured independently in a discrimination task. Results indicated that spatial release from masking can be observed with barely noticeable target-masker spatial differences.
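
    The masker spatial conditions can be sketched as a simple inter-channel delay. The sketch below assumes single-channel masker material and produces the two loudspeaker feeds with a 4 ms head start to either the right (RF) or the front (FR) channel:

```python
import numpy as np

def two_speaker_masker(masker, fs, lead_ms=4.0, lead_to_right=True):
    """Return (front, right) loudspeaker feeds: the same masker in both,
    with a `lead_ms` head start in the right (RF) or front (FR) channel."""
    lead = int(round(lead_ms * 1e-3 * fs))
    delayed = np.concatenate([np.zeros(lead), masker])
    leading = np.concatenate([masker, np.zeros(lead)])
    return (delayed, leading) if lead_to_right else (leading, delayed)
```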

  20. The Speech Spectrum and its Relationship to Intelligibility of Speech

    NASA Astrophysics Data System (ADS)

    Englert, Sue Ellen

    The present experiment was designed to investigate and understand the causes of failures of the Articulation Index as a predictive tool. An electroacoustic system was used in which: (1) The frequency response was optimally flattened at the listener's ear. (2) An ear-insert earphone was designed to give close electroacoustic control. (3) An infinite-impulse-response digital filter was used to filter the speech signal from a pre-recorded nonsense syllable test. (4) Four formant regions were filtered in fourteen different ways. It was found that the results agreed with past experiments in that: (1) The Articulation Index fails as a predictive tool when using band-pass filters. (2) Low frequencies seem to mask higher frequencies, causing a decrease in intelligibility. It was concluded that: (1) It is inappropriate to relate the total fraction of the speech spectrum to a specific intelligibility score since the fraction remaining after filtering may be in the low-, mid-, or high-frequency range. (2) The relationship between intelligibility and the total area under the spectral curve is not monotonic. (3) The fourth formant region (2925 Hz to 4200 Hz) enhanced intelligibility when included with other formant regions. Methods for relating spectral regions and intelligibility were discussed.
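
    The filtering conditions can be approximated as band-pass filters over formant regions. In the sketch below, only the fourth band (2925 Hz to 4200 Hz) is taken from the text; the other band edges are illustrative assumptions, and a Butterworth design stands in for the infinite-impulse-response filter used in the experiment:

```python
import numpy as np
from scipy.signal import butter, sosfilt

# Only the fourth band is taken from the text; the rest are assumptions.
FORMANT_BANDS_HZ = [(200, 900), (900, 1800), (1800, 2925), (2925, 4200)]

def filter_formant_regions(x, fs, keep=(0, 1, 2, 3), order=6):
    """Pass only the selected formant regions (one IIR band-pass each)
    and sum the outputs, mimicking one filtering condition."""
    y = np.zeros(len(x))
    for i in keep:
        lo, hi = FORMANT_BANDS_HZ[i]
        sos = butter(order, [lo, hi], btype="bandpass", fs=fs, output="sos")
        y += sosfilt(sos, x)
    return y
```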

  1. Open Microphone Speech Understanding: Correct Discrimination Of In Domain Speech

    NASA Technical Reports Server (NTRS)

    Hieronymus, James; Aist, Greg; Dowding, John

    2006-01-01

    An ideal spoken dialogue system listens continually and determines which utterances were spoken to it, understands them, and responds appropriately while ignoring the rest. This paper outlines a simple method for achieving this goal which involves trading a slightly higher false rejection rate of in-domain utterances for a higher correct rejection rate of Out of Domain (OOD) utterances. The system recognizes semantic entities specified by a unification grammar which is specialized by Explanation Based Learning (EBL), so that it only uses rules which are seen in the training data. The resulting grammar has probabilities assigned to each construct so that overgeneralizations are not a problem. The resulting system only recognizes utterances which reduce to a valid logical form which has meaning for the system and rejects the rest. A class N-gram grammar has been trained on the same training data. This system gives good recognition performance and offers good Out of Domain discrimination when combined with the semantic analysis. The resulting systems were tested on a Space Station Robot Dialogue Speech Database and a subset of the OGI conversational speech database. Both systems run in real time on a PC laptop and the present performance allows continuous listening with an acceptably low false acceptance rate. This type of open microphone system has been used in the Clarissa procedure reading and navigation spoken dialogue system which is being tested on the International Space Station.

  2. Earlier speech exposure does not accelerate speech acquisition.

    PubMed

    Peña, Marcela; Werker, Janet F; Dehaene-Lambertz, Ghislaine

    2012-08-15

    Critical periods in language acquisition have been discussed primarily with reference to studies of people who are deaf or bilingual. Here, we provide evidence on the opening of sensitivity to the linguistic environment by studying the response to a change of phoneme at a native and nonnative phonetic boundary in full-term and preterm human infants using event-related potentials. Full-term infants show a decline in their discrimination of nonnative phonetic contrasts between 9 and 12 months of age. Because the womb is a high-frequency filter, many phonemes are strongly degraded in utero. Preterm infants thus benefit from earlier and richer exposure to broadcast speech. We find that preterms do not take advantage of this enriched linguistic environment: the decrease in amplitude of the mismatch response to a nonnative change of phoneme at the end of the first year of life was dependent on maturational age and not on the duration of exposure to broadcast speech. The shaping of phonological representations by the environment is thus strongly constrained by brain maturation factors.

  3. Method and apparatus for obtaining complete speech signals for speech recognition applications

    NASA Technical Reports Server (NTRS)

    Abrash, Victor (Inventor); Cesari, Federico (Inventor); Franco, Horacio (Inventor); George, Christopher (Inventor); Zheng, Jing (Inventor)

    2009-01-01

    The present invention relates to a method and apparatus for obtaining complete speech signals for speech recognition applications. In one embodiment, the method continuously records an audio stream comprising a sequence of frames to a circular buffer. When a user command to commence or terminate speech recognition is received, the method obtains a number of frames of the audio stream occurring before or after the user command in order to identify an augmented audio signal for speech recognition processing. In further embodiments, the method analyzes the augmented audio signal in order to locate starting and ending speech endpoints that bound at least a portion of speech to be processed for recognition. At least one of the speech endpoints is located using a Hidden Markov Model.
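
    The buffering scheme can be sketched with a fixed-size ring of frames. The class below is a minimal illustration only, not the patented implementation, and omits the Hidden Markov Model endpointing step:

```python
import numpy as np

class CircularFrameBuffer:
    """Continuously records fixed-size audio frames so that speech
    occurring shortly before a start command is not lost."""

    def __init__(self, n_frames, frame_len):
        self.buf = np.zeros((n_frames, frame_len), dtype=np.int16)
        self.n_frames = n_frames
        self.count = 0          # total frames ever written

    def push(self, frame):
        self.buf[self.count % self.n_frames] = frame
        self.count += 1

    def recent(self, k):
        """Up to `k` most recent frames, oldest first, as one signal;
        prepended to the live audio when recognition is commanded."""
        k = min(k, self.count, self.n_frames)
        idx = [(self.count - k + i) % self.n_frames for i in range(k)]
        return self.buf[idx].reshape(-1)
```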

  4. Speech recognition technology: a critique.

    PubMed Central

    Levinson, S E

    1995-01-01

    This paper introduces the session on advanced speech recognition technology. The two papers comprising this session argue that current technology yields a performance that is only an order of magnitude in error rate away from human performance and that incremental improvements will bring us to that desired level. I argue that, to the contrary, present performance is far removed from human performance and a revolution in our thinking is required to achieve the goal. It is further asserted that to bring about the revolution more effort should be expended on basic research and less on trying to prematurely commercialize a deficient technology. PMID:7479808

  5. Speech Generation from Semantic Nets

    DTIC Science & Technology

    1975-09-01

    Speech Generation from Semantic Nets (input and output) is monitored by a "discourse module" (Deutsch, 1975) to maintain an accurate...in the phrase. HEURISTIC RULES: Hornby describes three basic positions for adverbs in the clause: "front" position, "mid" position, and "end" position. Front-position adverbs occur before the subject: "Yesterday he went home; from there he took a taxi." The interrogative adverbs (e.g., how, when

  6. The effects of selective attention and speech acoustics on neural speech-tracking in a multi-talker scene.

    PubMed

    Rimmele, Johanna M; Zion Golumbic, Elana; Schröger, Erich; Poeppel, David

    2015-07-01

    Attending to one speaker in multi-speaker situations is challenging. One neural mechanism proposed to underlie the ability to attend to a particular speaker is phase-locking of low-frequency activity in auditory cortex to speech's temporal envelope ("speech-tracking"), which is more precise for attended speech. However, it is not known what brings about this attentional effect, and specifically if it reflects enhanced processing of the fine structure of attended speech. To investigate this question we compared attentional effects on speech-tracking of natural versus vocoded speech which preserves the temporal envelope but removes the fine structure of speech. Pairs of natural and vocoded speech stimuli were presented concurrently and participants attended to one stimulus and performed a detection task while ignoring the other stimulus. We recorded magnetoencephalography (MEG) and compared attentional effects on the speech-tracking response in auditory cortex. Speech-tracking of natural, but not vocoded, speech was enhanced by attention, whereas neural tracking of ignored speech was similar for natural and vocoded speech. These findings suggest that the more precise speech-tracking of attended natural speech is related to processing its fine structure, possibly reflecting the application of higher-order linguistic processes. In contrast, when speech is unattended its fine structure is not processed to the same degree and thus elicits less precise speech-tracking more similar to vocoded speech.
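
    Vocoded speech of the kind contrasted here is typically produced with a noise vocoder: the signal is split into bands, each band's temporal envelope is kept, and its fine structure is replaced by band-limited noise. A minimal sketch, with the band count and edges chosen arbitrarily:

```python
import numpy as np
from scipy.signal import butter, sosfilt, hilbert

def noise_vocode(x, fs, n_bands=8, f_lo=80.0, f_hi=7000.0, seed=0):
    """Keep each band's temporal envelope; replace its fine structure
    with band-limited noise."""
    edges = np.geomspace(f_lo, f_hi, n_bands + 1)   # log-spaced band edges
    noise = np.random.default_rng(seed).standard_normal(len(x))
    out = np.zeros(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        env = np.abs(hilbert(sosfilt(sos, x)))      # band envelope
        out += env * sosfilt(sos, noise)            # noise carrier
    return out / max(np.max(np.abs(out)), 1e-9)
```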

  7. Communicative Competence, Speech Acts and Discourse Analysis.

    ERIC Educational Resources Information Center

    McCoy, Terry; And Others

    Three papers intended as preliminary studies to bilingual professional curriculum development are included. "Speech Acts and Discourse Analysis," by Terry McCoy, represents an introduction to discourse analysis as a tool for the language teacher. The notion of a typology of speech acts is set forth, and models of discourse analysis by…

  8. The Preparation of Syllables in Speech Production

    ERIC Educational Resources Information Center

    Cholin, Joana; Schiller, Niels O.; Levelt, Willem J. M.

    2004-01-01

    Models of speech production assume that syllables play a functional role in the process of word-form encoding in speech production. In this study, we investigate this claim and specifically provide evidence about the level at which syllables come into play. We report two studies using an "odd-man-out" variant of the "implicit priming paradigm" to…

  9. Preschoolers Benefit from Visually Salient Speech Cues

    ERIC Educational Resources Information Center

    Lalonde, Kaylah; Holt, Rachael Frush

    2015-01-01

    Purpose: This study explored visual speech influence in preschoolers using 3 developmentally appropriate tasks that vary in perceptual difficulty and task demands. They also examined developmental differences in the ability to use visually salient speech cues and visual phonological knowledge. Method: Twelve adults and 27 typically developing 3-…

  10. Speech as Process: A Case Study.

    ERIC Educational Resources Information Center

    Brooks, Robert D.; Scheidel, Thomas M.

    1968-01-01

    In order to test the internal evaluative processes and not merely the final reactions of an audience to a speaker, 97 Caucasian college students expressed their attitudes toward Malcolm X while listening to a 25-minute tape-recorded speech by him. Eight 30-second silent intervals at natural pauses in the speech gave the students time to respond…

  11. Quick Statistics about Voice, Speech, and Language

    MedlinePlus

    ... Statistics and Epidemiology: Quick Statistics About Voice, Speech, Language. Voice, Speech, Language, and Swallowing: Nearly 1 in 12 (7.7 ... condition known as persistent developmental stuttering. 8,9 Language: 3.3 percent of U.S. children ages 3- ...

  12. Hypnosis and the Reduction of Speech Anxiety.

    ERIC Educational Resources Information Center

    Barker, Larry L.; And Others

    The purposes of this paper are (1) to review the background and nature of hypnosis, (2) to synthesize research on hypnosis related to speech communication, and (3) to delineate and compare two potential techniques for reducing speech anxiety--hypnosis and systematic desensitization. Hypnosis has been defined as a mental state characterised by…

  13. The Lombard Effect on Alaryngeal Speech.

    ERIC Educational Resources Information Center

    Zeine, Lina; Brandt, John F.

    1988-01-01

    The study investigated the Lombard effect (evoking increased speech intensity by applying masking noise to ears of talker) on the speech of esophageal talkers, artificial larynx users, and normal speakers. The noise condition produced the highest intensity increase in the esophageal speakers. (Author/DB)

  14. Treatment Intensity and Childhood Apraxia of Speech

    ERIC Educational Resources Information Center

    Namasivayam, Aravind K.; Pukonen, Margit; Goshulak, Debra; Hard, Jennifer; Rudzicz, Frank; Rietveld, Toni; Maassen, Ben; Kroll, Robert; van Lieshout, Pascal

    2015-01-01

    Background: Intensive treatment has been repeatedly recommended for the treatment of speech deficits in childhood apraxia of speech (CAS). However, differences in treatment outcomes as a function of treatment intensity have not been systematically studied in this population. Aim: To investigate the effects of treatment intensity on outcome…

  15. Issues in Collecting and Transcribing Speech Samples.

    ERIC Educational Resources Information Center

    Louko, Linda J.; Edwards, Mary Louise

    2001-01-01

    This article identifies issues in phonetic transcription of speech samples for speech language pathologists. Important terms and concepts from articulatory/clinical phonetics are reviewed and guidelines are provided for streamlining the process of whole-word live transcriptions and for refining transcriptions by incorporating features from…

  16. Speech after Mao: Literature and Belonging

    ERIC Educational Resources Information Center

    Hsieh, Victoria Linda

    2012-01-01

    This dissertation aims to understand the apparent failure of speech in post-Mao literature to fulfill its conventional functions of representation and communication. In order to understand this pattern, I begin by looking back on the utility of speech for nation-building in modern China. In addition to literary analysis of key authors and works,…

  17. CLEFT PALATE. FOUNDATIONS OF SPEECH PATHOLOGY SERIES.

    ERIC Educational Resources Information Center

    RUTHERFORD, DAVID; WESTLAKE, HAROLD

    DESIGNED TO PROVIDE AN ESSENTIAL CORE OF INFORMATION, THIS BOOK TREATS NORMAL AND ABNORMAL DEVELOPMENT, STRUCTURE, AND FUNCTION OF THE LIPS AND PALATE AND THEIR RELATIONSHIPS TO CLEFT LIP AND CLEFT PALATE SPEECH. PROBLEMS OF PERSONAL AND SOCIAL ADJUSTMENT, HEARING, AND SPEECH IN CLEFT LIP OR CLEFT PALATE INDIVIDUALS ARE DISCUSSED. NASAL RESONANCE…

  18. Localization of Sublexical Speech Perception Components

    ERIC Educational Resources Information Center

    Turkeltaub, Peter E.; Coslett, H. Branch

    2010-01-01

    Models of speech perception are in general agreement with respect to the major cortical regions involved, but lack precision with regard to localization and lateralization of processing units. To refine these models we conducted two Activation Likelihood Estimation (ALE) meta-analyses of the neuroimaging literature on sublexical speech perception.…

  19. Reliability of Speech Diadochokinetic Test Measurement

    ERIC Educational Resources Information Center

    Gadesmann, Miriam; Miller, Nick

    2008-01-01

    Background: Measures of articulatory diadochokinesis (DDK) are widely used in the assessment of motor speech disorders and they play a role in detecting abnormality, monitoring speech performance changes and classifying syndromes. Although in clinical practice DDK is generally measured perceptually, without support from instrumental methods that…

  20. Language and Legal Speech Acts: Decisions.

    ERIC Educational Resources Information Center

    Kevelson, Roberta

    The first part of this essay argues specifically that legal speech acts are not statements but question/answer constructions. The focus in this section is on the underlying interrogative structure of the legal decision. The second part of the paper touches on significant topics related to the concept of legal speech acts, including the philosophic…

  1. Why Impromptu Speech Is Easy To Understand.

    ERIC Educational Resources Information Center

    Le Feal, K. Dejean

    Impromptu speech is characterized by the simultaneous processes of ideation (the elaboration and structuring of reasoning by the speaker as he improvises) and expression in the speaker. Other elements accompany this characteristic: division of speech flow into short segments, acoustic relief in the form of word stress following a pause, and both…

  2. Recent Trends in Free Speech Theory.

    ERIC Educational Resources Information Center

    Haiman, Franklyn S.

    This syllabus of a convention workshop course on free speech theory consists of descriptions of several United States Supreme Court decisions related to free speech. Some specific areas in which decisions are discussed are: obscene and indecent communication, the definition of a public figure for purposes of libel action, the press versus official…

  3. Tampa Bay International Business Summit Keynote Speech

    NASA Technical Reports Server (NTRS)

    Clary, Christina

    2011-01-01

    A keynote speech outlining the importance of collaboration and diversity in the workplace. The 20-minute speech describes NASA's challenges and accomplishments over the years and what lies ahead. Topics include: diversity and inclusion principles, international cooperation, Kennedy Space Center planning and development, opportunities for cooperation, and NASA's vision for exploration.

  4. Children's Production of Commissive Speech Acts.

    ERIC Educational Resources Information Center

    Astington, Janet W.

    1988-01-01

    Examines the age at which and the form in which children produce speech acts which commit them to a future action. Results revealed that all of the four- to 11-year-olds produced directive speech acts, but only the older children used the explicit performative verb "promise" to reassure the hearer of their commitment. (Author/CB)

  5. Isolated Speech Recognition Using Artificial Neural Networks

    DTIC Science & Technology

    2007-11-02

    In this project Artificial Neural Networks are used as a research tool to accomplish Automated Speech Recognition of normal speech. A small size...the first stage of this work are satisfactory, and thus the application of artificial neural networks in conjunction with cepstral analysis in isolated word recognition holds promise.

  6. The Two-Fold Way for Speech.

    ERIC Educational Resources Information Center

    McNeill, David

    On the basis of experimental data, the author makes the following observations: (1) the basic encoding processes in speech, the schemas of order, first produce elementary underlying sentences; (2) underlying sentence structure is the controlling step in the organization of speech; (3) underlying sentence structure plays a central role in…

  7. Analog Acoustic Expression in Speech Communication

    ERIC Educational Resources Information Center

    Shintel, Hadas; Nusbaum, Howard C.; Okrent, Arika

    2006-01-01

    We present the first experimental evidence of a phenomenon in speech communication we call "analog acoustic expression." Speech is generally thought of as conveying information in two distinct ways: discrete linguistic-symbolic units such as words and sentences represent linguistic meaning, and continuous prosodic forms convey information about…

  8. Visual speech gestures modulate efferent auditory system.

    PubMed

    Namasivayam, Aravind Kumar; Wong, Wing Yiu Stephanie; Sharma, Dinaay; van Lieshout, Pascal

    2015-03-01

    Visual and auditory systems interact at both cortical and subcortical levels. Studies suggest a highly context-specific cross-modal modulation of the auditory system by the visual system. The present study builds on this work by sampling data from 17 young healthy adults to test whether visual speech stimuli evoke different responses in the auditory efferent system compared to visual non-speech stimuli. The descending cortical influences on medial olivocochlear (MOC) activity were indirectly assessed by examining the effects of contralateral suppression of transient-evoked otoacoustic emissions (TEOAEs) at 1, 2, 3 and 4 kHz under three conditions: (a) in the absence of any contralateral noise (Baseline), (b) contralateral noise + observing facial speech gestures related to productions of vowels /a/ and /u/ and (c) contralateral noise + observing facial non-speech gestures related to smiling and frowning. The results are based on 7 individuals whose data met strict recording criteria and indicated a significant difference in TEOAE suppression between observing speech gestures relative to the non-speech gestures, but only at the 1 kHz frequency. These results suggest that observing a speech gesture compared to a non-speech gesture may trigger a difference in MOC activity, possibly to enhance peripheral neural encoding. If such findings can be reproduced in future research, sensory perception models and theories positing the downstream convergence of unisensory streams of information in the cortex may need to be revised.

  9. Semi-Direct Speech: Manambu and beyond

    ERIC Educational Resources Information Center

    Aikhenvald, Alexandra Y.

    2008-01-01

    Every language has some way of reporting what someone else has said. To express what Jakobson [Jakobson, R., 1990. "Shifters, categories, and the Russian verb. Selected writings". "Word and Language". Mouton, The Hague, Paris, pp. 130-153] called "speech within speech", the speaker can use their own words, recasting…

  10. Repeated Speech Errors: Evidence for Learning

    ERIC Educational Resources Information Center

    Humphreys, Karin R.; Menzies, Heather; Lake, Johanna K.

    2010-01-01

    Three experiments elicited phonological speech errors using the SLIP procedure to investigate whether there is a tendency for speech errors on specific words to reoccur, and whether this effect can be attributed to implicit learning of an incorrect mapping from lemma to phonology for that word. In Experiment 1, when speakers made a phonological…

  11. Pronunciation Modeling for Large Vocabulary Speech Recognition

    ERIC Educational Resources Information Center

    Kantor, Arthur

    2010-01-01

    The large pronunciation variability of words in conversational speech is one of the major causes of low accuracy in automatic speech recognition (ASR). Many pronunciation modeling approaches have been developed to address this problem. Some explicitly manipulate the pronunciation dictionary as well as the set of the units used to define the…

  12. Speech Fluency in Fragile X Syndrome

    ERIC Educational Resources Information Center

    Van Borsel, John; Dor, Orianne; Rondal, Jean

    2008-01-01

    The present study investigated the dysfluencies in the speech of nine French speaking individuals with fragile X syndrome. Type, number, and loci of dysfluencies were analysed. The study confirms that dysfluencies are a common feature of the speech of individuals with fragile X syndrome but also indicates that the dysfluency pattern displayed is…

  13. Studies on the Speech of the Deaf.

    ERIC Educational Resources Information Center

    Martony, J.

    A comparison of segment features in the speech chain of three deaf-born boys with those of three normal-hearing boys reveals that the deaf-born have speech problems associated with a lack of synchrony between articulation and phonation. In order to determine the differences between the two groups (both repeating the same Swedish sentences), a…

  14. The Need for a Speech Corpus

    ERIC Educational Resources Information Center

    Campbell, Dermot F.; McDonnell, Ciaran; Meinardi, Marti; Richardson, Bunny

    2007-01-01

    This paper outlines the ongoing construction of a speech corpus for use by applied linguists and advanced EFL/ESL students. In the first part, sections 1-4, the need for improvements in the teaching of listening skills and pronunciation practice for EFL/ESL students is noted. It is argued that the use of authentic native-to-native speech is…

  15. Performing speech recognition research with hypercard

    NASA Technical Reports Server (NTRS)

    Shepherd, Chip

    1993-01-01

    The purpose of this paper is to describe a HyperCard-based system for performing speech recognition research and to instruct Human Factors professionals on how to use the system to obtain detailed data about the user interface of a prototype speech recognition application.

  16. Speech-Language Pathology: Preparing Early Interventionists

    ERIC Educational Resources Information Center

    Prelock, Patricia A.; Deppe, Janet

    2015-01-01

    The purpose of this article is to explain the role of speech-language pathology in early intervention. The expected credentials of professionals in the field are described, and the current numbers of practitioners serving young children are identified. Several resource documents available from the American Speech-Language-Hearing Association are…

  17. Speech recognition: how good is good enough?

    PubMed

    Krohn, Richard

    2002-03-01

    Since its infancy in the early 1990s, the technology of speech recognition has undergone a rapid evolution. Not only has the reliability of the programming improved dramatically, but the return on investment has also become increasingly compelling. The author describes some of the latest health care applications of speech-recognition technology, and how the next advances will be made in this area.

  18. Learning the Hidden Structure of Speech.

    ERIC Educational Resources Information Center

    Elman, Jeffery Locke; Zipser, David

    The back-propagation neural network learning procedure was applied to the analysis and recognition of speech. Because this learning procedure requires only examples of input-output pairs, it is not necessary to provide it with any initial description of speech features. Rather, the network develops its own set of representational features…
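
    As a toy illustration of the point that back-propagation needs nothing beyond input-output pairs, the following sketch (Python/NumPy; the data, network size, and learning rate are arbitrary stand-ins, not the paper's setup) trains a one-hidden-layer network whose hidden weights end up encoding whatever features the task demands.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Toy input-output pairs standing in for (acoustic frame -> label) data.
    X = rng.normal(size=(200, 8))
    y = (X[:, :4].sum(axis=1) > 0).astype(float).reshape(-1, 1)

    # One hidden layer; no hand-crafted feature description is supplied.
    W1 = rng.normal(scale=0.5, size=(8, 5))
    W2 = rng.normal(scale=0.5, size=(5, 1))
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    for _ in range(2000):                        # plain back-propagation
        h = sigmoid(X @ W1)                      # hidden units: learned features
        out = sigmoid(h @ W2)
        delta_out = (out - y) * out * (1 - out)  # output-layer error signal
        delta_h = (delta_out @ W2.T) * h * (1 - h)
        W2 -= 0.5 * h.T @ delta_out / len(X)     # gradient-descent updates
        W1 -= 0.5 * X.T @ delta_h / len(X)

    print("training accuracy:", np.mean((out > 0.5) == y))
    ```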

  19. Anatomy and Physiology of the Speech Mechanism.

    ERIC Educational Resources Information Center

    Sheets, Boyd V.

    This monograph on the anatomical and physiological aspects of the speech mechanism stresses the importance of a general understanding of the process of verbal communication. Contents include "Positions of the Body,""Basic Concepts Linked with the Speech Mechanism,""The Nervous System,""The Respiratory System--Sound-Power Source,""The…

  20. Walking the Talk on Campus Speech

    ERIC Educational Resources Information Center

    O'Neil, Robert M.

    2004-01-01

    A public university faced with intolerant student speech now risks being damned if it acts, but equally damned if it fails to act. To a greater degree than at any time in recent memory, the actions and policies of higher education institutions concerning student speech not only are being scrutinized, but they also are becoming the subject of legal…

  1. Prospects for Automatic Recognition of Speech.

    ERIC Educational Resources Information Center

    Houde, Robert

    1979-01-01

    Originally part of a symposium on educational media for the deaf, the paper discusses problems with the development of technology permitting simultaneous automatic captioning of speech. It is concluded that success with a machine which will provide automatic recognition of speech is still many years in the future. (PHR)

  2. How Should a Speech Recognizer Work?

    ERIC Educational Resources Information Center

    Scharenborg, Odette; Norris, Dennis; ten Bosch, Louis; McQueen, James M.

    2005-01-01

    Although researchers studying human speech recognition (HSR) and automatic speech recognition (ASR) share a common interest in how information processing systems (human or machine) recognize spoken language, there is little communication between the two disciplines. We suggest that this lack of communication follows largely from the fact that…

  3. Building Searchable Collections of Enterprise Speech Data.

    ERIC Educational Resources Information Center

    Cooper, James W.; Viswanathan, Mahesh; Byron, Donna; Chan, Margaret

    The study has applied speech recognition and text-mining technologies to a set of recorded outbound marketing calls and analyzed the results. Since speaker-independent speech recognition technology results in a significantly lower recognition rate than that found when the recognizer is trained for a particular speaker, a number of post-processing…

  4. School Principal Speech about Fiscal Mismanagement

    ERIC Educational Resources Information Center

    Hassenpflug, Ann

    2015-01-01

    A review of two recent federal court cases concerning school principals who experienced adverse job actions after they engaged in speech about fiscal misconduct by other employees indicates that the courts found that the principal's speech was made as part of his or her job duties and was not protected by the First Amendment.

  5. Speech masking and cancelling and voice obscuration

    DOEpatents

    Holzrichter, John F.

    2013-09-10

    A non-acoustic sensor is used to measure a user's speech and then broadcasts an obscuring acoustic signal that diminishes the user's vocal acoustic output intensity and/or distorts the voice sounds, making them unintelligible to persons nearby. The non-acoustic sensor is positioned proximate to or contacting a user's neck or head skin tissue for sensing speech-production information.

  6. Perception of Silent Pauses in Continuous Speech.

    ERIC Educational Resources Information Center

    Duez, Danielle

    1985-01-01

    Investigates the silent pauses in continuous speech in three genres: political speeches, political interviews, and casual interviews in order to see how the semantic-syntactic information of the message, the duration of silent pauses, and the acoustic environment of these pauses interact to produce the listener's perception of pauses. (Author/SED)

  7. Methodological Choices in Rating Speech Samples

    ERIC Educational Resources Information Center

    O'Brien, Mary Grantham

    2016-01-01

    Much pronunciation research critically relies upon listeners' judgments of speech samples, but researchers have rarely examined the impact of methodological choices. In the current study, 30 German native listeners and 42 German L2 learners (L1 English) rated speech samples produced by English-German L2 learners along three continua: accentedness,…

  8. The Learning of Complex Speech Act Behaviour.

    ERIC Educational Resources Information Center

    Olshtain, Elite; Cohen, Andrew

    1990-01-01

    Pre- and posttraining measurement of adult English-as-a-Second-Language learners' (N=18) apology speech act behavior found no clear-cut quantitative improvement after training, although there was an obvious qualitative approximation of native-like speech act behavior in terms of types of intensification and downgrading, choice of strategy, and…

  9. Speech vs. singing: infants choose happier sounds

    PubMed Central

    Corbeil, Marieve; Trehub, Sandra E.; Peretz, Isabelle

    2013-01-01

    Infants prefer speech to non-vocal sounds and to non-human vocalizations, and they prefer happy-sounding speech to neutral speech. They also exhibit an interest in singing, but there is little knowledge of their relative interest in speech and singing. The present study explored infants' attention to unfamiliar audio samples of speech and singing. In Experiment 1, infants 4–13 months of age were exposed to happy-sounding infant-directed speech vs. hummed lullabies by the same woman. They listened significantly longer to the speech, which had considerably greater acoustic variability and expressiveness, than to the lullabies. In Experiment 2, infants of comparable age who heard the lyrics of a Turkish children's song spoken vs. sung in a joyful/happy manner did not exhibit differential listening. Infants in Experiment 3 heard the happily sung lyrics of the Turkish children's song vs. a version that was spoken in an adult-directed or affectively neutral manner. They listened significantly longer to the sung version. Overall, happy voice quality rather than vocal mode (speech or singing) was the principal contributor to infant attention, regardless of age. PMID:23805119

  10. Speech Intelligibility in Severe Adductor Spasmodic Dysphonia

    ERIC Educational Resources Information Center

    Bender, Brenda K.; Cannito, Michael P.; Murry, Thomas; Woodson, Gayle E.

    2004-01-01

    This study compared speech intelligibility in nondisabled speakers and speakers with adductor spasmodic dysphonia (ADSD) before and after botulinum toxin (Botox) injection. Standard speech samples were obtained from 10 speakers diagnosed with severe ADSD prior to and 1 month following Botox injection, as well as from 10 age- and gender-matched…

  11. The Neural Substrates of Infant Speech Perception

    ERIC Educational Resources Information Center

    Homae, Fumitaka; Watanabe, Hama; Taga, Gentaro

    2014-01-01

    Infants often pay special attention to speech sounds, and they appear to detect key features of these sounds. To investigate the neural foundation of speech perception in infants, we measured cortical activation using near-infrared spectroscopy. We presented the following three types of auditory stimuli while 3-month-old infants watched a silent…

  12. General-Purpose Monitoring during Speech Production

    ERIC Educational Resources Information Center

    Ries, Stephanie; Janssen, Niels; Dufau, Stephane; Alario, F.-Xavier; Burle, Boris

    2011-01-01

    The concept of "monitoring" refers to our ability to control our actions on-line. Monitoring involved in speech production is often described in psycholinguistic models as an inherent part of the language system. We probed the specificity of speech monitoring in two psycholinguistic experiments where electroencephalographic activities were…

  13. Comparative Experiments on Large Vocabulary Speech Recognition

    DTIC Science & Technology

    1993-01-01

    training. But in this experiment, we also computed separate speaker-dependent models for the speakers with 50-100 utterances, and each speaker…of Speech (RASTA-PLP), Proc. of the Second European Conf. on Speech Comm. and Tech., September 1991. [12] Liu, F-H., Stern, R., Huang, X., Acero, A.…

  14. Production of Syntactic Stress in Alaryngeal Speech.

    ERIC Educational Resources Information Center

    Gandour, Jack; Weinberg, Bernd

    1985-01-01

    Reports on an acoustical investigation of syntactic stress in alaryngeal speech. Measurements were made of fundamental frequency, relative intensity, vowel duration, and intersyllable duration. Findings suggest that stress contrasts in alaryngeal speech are based on a complex of acoustic cues which are influenced by linguistic structure.…
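
    Of the four measures listed, fundamental frequency and relative intensity can be estimated directly from the waveform, while vowel and intersyllable durations require segment boundaries. The sketch below (Python/NumPy; the frame length, F0 search range, and autocorrelation method are illustrative assumptions, not the authors' procedure) shows per-frame estimates of the first two cues.

    ```python
    import numpy as np

    def frame_stress_cues(frame, fs, f0_min=60.0, f0_max=400.0):
        """Per-frame estimates of two of the four acoustic stress cues:
        fundamental frequency (via autocorrelation) and relative intensity
        (RMS in dB). The frame should span at least two pitch periods
        (e.g., 40 ms). Vowel and intersyllable durations would come from
        segment annotations, which are assumed to be given."""
        frame = np.asarray(frame, dtype=float)
        frame = frame - frame.mean()
        rms_db = 20.0 * np.log10(np.sqrt(np.mean(frame ** 2)) + 1e-12)
        ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
        lo, hi = int(fs / f0_max), int(fs / f0_min)
        lag = lo + np.argmax(ac[lo:hi])
        return fs / lag, rms_db  # (F0 in Hz, intensity in dB)
    ```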

  15. Voice Modulations in German Ironic Speech

    ERIC Educational Resources Information Center

    Scharrer, Lisa; Christmann, Ursula; Knoll, Monja

    2011-01-01

    Previous research has shown that in different languages ironic speech is acoustically modulated compared to literal speech, and these modulations are assumed to aid the listener in the comprehension process by acting as cues that mark utterances as ironic. The present study was conducted to identify paraverbal features of German "ironic…

  16. Speech neglect: A strange educational blind spot

    NASA Astrophysics Data System (ADS)

    Harris, Katherine Safford

    2005-09-01

    Speaking is universally acknowledged as an important human talent, yet as a topic of educated common knowledge, it is peculiarly neglected. Partly, this is a consequence of the relatively recent growth of research on speech perception, production, and development, but also a function of the way that information is sliced up by undergraduate colleges. Although the basic acoustic mechanism of vowel production was known to Helmholtz, the ability to view speech production as a physiological event is evolving even now with such techniques as fMRI. Intensive research on speech perception emerged only in the early 1930s as Fletcher and the engineers at Bell Telephone Laboratories developed the transmission of speech over telephone lines. The study of speech development was revolutionized by the papers of Eimas and his colleagues on speech perception in infants in the 1970s. Dissemination of knowledge in these fields is the responsibility of no single academic discipline. It forms a center for two departments, Linguistics, and Speech and Hearing, but in the former, there is a heavy emphasis on other aspects of language than speech and, in the latter, a focus on clinical practice. For psychologists, it is a rather minor component of a very diverse assembly of topics. I will focus on these three fields in proposing possible remedies.

  17. Using Concatenated Profiles from High-Speed Laser Profile Scanners to Estimate Debris-Flow Characteristics: A Novel Approach Based on Particle Image Velocimetry

    NASA Astrophysics Data System (ADS)

    Jacquemart, M. F.; Meier, L.; Graf, C.; Morsdorf, F.

    2015-12-01

    We use globally unique datasets from paired laser profile scanners to measure debris-flow height, velocity and discharge in two well-known debris-flow channels in Switzerland. Since 2011, these scanners have been scanning passing debris flows at rates of up to 75 Hz, acquiring millions of cross-bed profiles. The profiles can be concatenated through time, generating unique 2.5D representations of passing debris flows. Applying a large-scale Particle Image Velocimetry (PIV) approach to these datasets has proven successful for measuring surface flow velocities. Flow height can also be estimated from the laser scanners, and thus a discharge estimate can be given. To account for changes to the channel bed due to erosion and deposition during the debris flow, we compute two flow-height estimates, using a pre-event as well as a post-event channel geometry, in order to visualize discharge variability. Velocity outliers need to be excluded to provide reliable estimates of peak discharge, and changes to the channel bed are assumed to be the largest source of uncertainty. However, the latter problem is inherent to all debris-flow discharge measurements, and we have found the new system to offer distinct advantages over the conventional system relying on geophones and a radar gauge. The wide scan angle of up to 190° renders the scanners insensitive to changes of the flow path, and the point density of roughly 20 points per meter offers unprecedented spatial coverage. In addition, the geometries of the cross-bed profiles have been analyzed, revealing distinct changes of cross-flow convexity between the front and the tail of the flows in several cases. This is assumed to indicate changes in debris-flow mixture, but further research is needed to better understand this signal. We hope that our preliminary analysis and toolbox will facilitate working with these kinds of datasets so as to further improve debris-flow understanding, monitoring and modeling efforts in the future.
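
    At its core, the velocity measurement reduces to finding the time lag that best aligns what the two scanners see. The sketch below (Python with NumPy/SciPy; a 1-D correlation stand-in for the full 2-D large-scale PIV, with hypothetical function and parameter names) illustrates the idea for a pair of flow-height time series, together with the resulting discharge estimate.

    ```python
    import numpy as np
    from scipy.signal import correlate

    def flow_velocity(height_upstream, height_downstream, scan_rate_hz,
                      separation_m):
        """Surface velocity from two scanners a known distance apart:
        cross-correlate the two flow-height time series and convert the
        best-fit time lag into a velocity (separation / lag)."""
        a = height_upstream - np.mean(height_upstream)
        b = height_downstream - np.mean(height_downstream)
        lag = np.argmax(correlate(b, a, mode="full")) - (len(a) - 1)
        lag_s = lag / scan_rate_hz
        return separation_m / lag_s if lag_s > 0 else float("nan")

    def discharge_estimate(scan_profile, bed_profile, point_spacing_m,
                           velocity_m_s):
        """Discharge = flow cross-section area x surface velocity; flow
        height is the scanned surface minus a pre- or post-event bed
        profile, clipped at zero where the bed was eroded."""
        flow_height = np.clip(scan_profile - bed_profile, 0.0, None)
        return np.sum(flow_height) * point_spacing_m * velocity_m_s
    ```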

  18. The Effects of Stimulus Variability on the Perceptual Learning of Speech and Non-Speech Stimuli

    PubMed Central

    Banai, Karen; Amitay, Sygal

    2015-01-01

    Previous studies suggest fundamental differences between the perceptual learning of speech and non-speech stimuli. One major difference is in the way variability in the training set affects learning and its generalization to untrained stimuli: training-set variability appears to facilitate speech learning, while slowing or altogether extinguishing non-speech auditory learning. We asked whether the reason for this apparent difference is a consequence of the very different methodologies used in speech and non-speech studies. We hypothesized that speech and non-speech training would result in a similar pattern of learning if they were trained using the same training regimen. We used a 2 (random vs. blocked pre- and post-testing) × 2 (random vs. blocked training) × 2 (speech vs. non-speech discrimination task) study design, yielding 8 training groups. A further 2 groups acted as untrained controls, tested with either random or blocked stimuli. The speech task required syllable discrimination along 4 minimal-pair continua (e.g., bee-dee), and the non-speech task required duration discrimination around 4 base durations (e.g., 50 ms). Training and testing required listeners to pick the odd one out of three stimuli, two of which were the base duration or phoneme continuum endpoint while the third varied adaptively. Training was administered in 9 sessions of 640 trials each, spread over 4–8 weeks. Significant learning was only observed following speech training, with similar learning rates and full generalization regardless of whether training used random or blocked schedules. No learning was observed for duration discrimination with either training regimen. We therefore conclude that the two stimulus classes respond differently to the same training regimen. A reasonable interpretation of the findings is that speech is perceived categorically, enabling learning in either paradigm, while the different base durations are not well enough differentiated to allow for…
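
    The odd-one-out procedure with an adaptively varied third stimulus is a standard psychophysical track. The sketch below (Python; the 3-down/1-up rule, step size, and `respond` callback are illustrative assumptions, since the abstract does not specify the adaptive rule) shows the shape of such a track.

    ```python
    import random

    def oddity_staircase(base, start_delta, respond, n_trials=60):
        """Illustrative 3-down/1-up adaptive odd-one-out track: two
        intervals hold the base stimulus and one holds base + delta;
        delta shrinks after three consecutive correct responses and
        grows after any error, converging near the 79%-correct
        threshold. `respond(stimuli)` is the listener (model or UI)
        returning the index judged to be the odd one out."""
        delta, streak, track = start_delta, 0, []
        for _ in range(n_trials):
            odd = random.randrange(3)
            stimuli = [base] * 3
            stimuli[odd] = base + delta
            if respond(stimuli) == odd:
                streak += 1
                if streak == 3:
                    delta /= 1.2
                    streak = 0
            else:
                delta *= 1.2
                streak = 0
            track.append(delta)
        return track
    ```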

  19. Strategies for distant speech recognition in reverberant environments

    NASA Astrophysics Data System (ADS)

    Delcroix, Marc; Yoshioka, Takuya; Ogawa, Atsunori; Kubo, Yotaro; Fujimoto, Masakiyo; Ito, Nobutaka; Kinoshita, Keisuke; Espi, Miquel; Araki, Shoko; Hori, Takaaki; Nakatani, Tomohiro

    2015-12-01

    Reverberation and noise are known to severely affect the automatic speech recognition (ASR) performance of speech recorded by distant microphones. Therefore, we must deal with reverberation if we are to realize high-performance hands-free speech recognition. In this paper, we review a recognition system that we developed at our laboratory to deal with reverberant speech. The system consists of a speech enhancement (SE) front-end that employs long-term linear prediction-based dereverberation followed by noise reduction. We combine our SE front-end with an ASR back-end that uses neural networks for acoustic and language modeling. The proposed system achieved top scores on the ASR task of the REVERB challenge. This paper describes the different technologies used in our system and presents detailed experimental results that justify our implementation choices and may provide hints for designing distant ASR systems.
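
    The long-term linear prediction idea behind the front-end can be illustrated in a few lines: late reverberation in each sample is predicted from samples at least a fixed delay in the past and subtracted. The single-channel, time-domain sketch below (Python/NumPy; the order and delay values are arbitrary, and real systems such as WPE operate per STFT bin with iterative re-weighting) shows only the core mechanism, not the authors' implementation.

    ```python
    import numpy as np

    def delayed_lp_dereverb(x, order=200, delay=30):
        """Single-channel sketch of long-term (delayed) linear prediction
        dereverberation: estimate the late reverberation in each sample as
        a linear combination of samples at least `delay` taps in the past,
        then subtract that estimate. Plain least squares is used here;
        WPE-style systems refine this per STFT bin with re-weighting."""
        x = np.asarray(x, dtype=float)
        n = np.arange(delay + order - 1, len(x))
        target = x[n]
        # Column k of the observation matrix holds x[n - delay - k]
        X = np.stack([x[n - delay - k] for k in range(order)], axis=1)
        g, *_ = np.linalg.lstsq(X, target, rcond=None)  # prediction filter
        y = x.copy()
        y[n] = target - X @ g  # remove the predicted late reverberation
        return y
    ```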

  20. [Improving the speech with a prosthetic construction].

    PubMed

    Stalpers, M J; Engelen, M; van der Stappen, J A A M; Weijs, W L J; Takes, R P; van Heumen, C C M

    2016-03-01

    A 12-year-old boy had problems with his speech due to a defect in the soft palate. This defect was caused by the surgical removal of a synovial sarcoma. Testing with a nasometer revealed hypernasality above normal values. Given the size and severity of the defect in the soft palate, the possibility of improving the speech with speech therapy alone was limited. At a centre for special dentistry, an attempt was made to improve the function of the palate, and thereby the speech, with a prosthetic construction consisting of a denture with an attached obturator. With it, an effective closure of the palate could be achieved. New measurements with acoustic nasometry showed scores within normal values, and the nasality in the speech largely disappeared. The obturator is an effective and relatively simple solution for palatal insufficiency resulting from surgical resection, and invasive reconstructive surgery can be avoided in this way.
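
    Nasometry scores of the kind reported here are nasalance values: nasal acoustic energy as a percentage of combined nasal-plus-oral energy, measured with a two-microphone headset. The sketch below (Python/NumPy; real nasometers band-pass filter both channels and average the ratio frame by frame, which is omitted here) computes the basic ratio.

    ```python
    import numpy as np

    def nasalance_percent(nasal_channel, oral_channel):
        """Nasalance: nasal acoustic energy as a percentage of total
        (nasal + oral) energy, from the two microphones of a nasometer
        headset. Band-pass filtering and frame-wise averaging, used by
        real devices, are omitted in this sketch."""
        nasal_rms = np.sqrt(np.mean(np.asarray(nasal_channel, dtype=float) ** 2))
        oral_rms = np.sqrt(np.mean(np.asarray(oral_channel, dtype=float) ** 2))
        return 100.0 * nasal_rms / (nasal_rms + oral_rms + 1e-12)
    ```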