Science.gov

Sample records for concatenative speech synthesis

  1. Perception of interrupted speech: Cross-rate variation in the intelligibility of gated and concatenated sentences

    PubMed Central

    Shafiro, Valeriy; Sheft, Stanley; Risley, Robert

    2011-01-01

    Temporal constraints on the perception of variable-size speech fragments produced by interruption rates between 0.5 and 16 Hz were investigated by contrasting the intelligibility of gated sentences with and without silent intervals. Concatenation of consecutive speech fragments produced a significant decrease in intelligibility at 2 and 4 Hz, while having little effect at lower and higher rates. Consistent with previous studies, these findings indicate that (1) the syllable-sized intervals associated with intermediate-rate interruptions are more susceptible to temporal distortions than the longer word-sized or shorter phoneme-sized intervals, and (2) that qualitatively different perceptual processes underlie performance at different rates. PMID:21877768
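
    The two stimulus manipulations are easy to reproduce. Below is a minimal sketch (ours, not the authors' code) that derives both stimulus types from a waveform x sampled at fs Hz: a square-wave gate at the given interruption rate with a 50% duty cycle either replaces the suppressed halves with silence (gated condition) or deletes them so the surviving fragments abut in time (concatenated condition).

      import numpy as np

      def interrupt_speech(x, fs, rate_hz, keep_silence=True):
          """Interrupt waveform x at rate_hz with a 50% duty cycle.

          keep_silence=True  -> gated stimulus (silent intervals preserved)
          keep_silence=False -> concatenated stimulus (fragments abutted)
          """
          period = int(round(fs / rate_hz))          # samples per on/off cycle
          on = (np.arange(len(x)) % period) < period // 2
          if keep_silence:
              return np.where(on, x, 0.0)
          return x[on]

      # Example: 2-Hz interruption of one second of noise at 16 kHz
      fs = 16000
      x = np.random.randn(fs).astype(np.float32)
      gated = interrupt_speech(x, fs, 2.0, keep_silence=True)    # same length
      concat = interrupt_speech(x, fs, 2.0, keep_silence=False)  # ~half length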

  2. Models of speech synthesis.

    PubMed Central

    Carlson, R

    1995-01-01

    The term "speech synthesis" has been used for diverse technical approaches. In this paper, some of the approaches used to generate synthetic speech in a text-to-speech system are reviewed, and some of the basic motivations for choosing one method over another are discussed. It is important to keep in mind, however, that speech synthesis models are needed not just for speech generation but to help us understand how speech is created, or even how articulation can explain language structure. General issues such as the synthesis of different voices, accents, and multiple languages are discussed as special challenges facing the speech synthesis community. PMID:7479805

  3. [Visual synthesis of speech].

    PubMed

    Blanco, Y; Villanueva, A; Cabeza, R

    2000-01-01

    For highly disabled patients, the eyes may become the sole tool of communication. With appropriate technology it is possible to interpret eye movements reliably, increasing the possibilities for patient communication through speech synthesisers. A system with these characteristics must include a speech synthesiser, an interface with which the user constructs the text, and a method of gaze interpretation, so that the user can manage the system solely with his or her eyes. This review sets out the state of the art of the three modules that make up a system of this type, and finally it introduces the visual speech synthesis system (Síntesis Visual del Habla [SiVHa]), which is being developed at the Public University of Navarra.

  4. Homomorphic Speech Analysis-Synthesis

    DTIC Science & Technology

    1978-01-01

    summarized in the following projects: homomorphic speech analysis-synthesis, enhancement of degraded speech, time-varying linear predictive coding of...based on both the conventional chirp-z-transform (CZT) realization of the discrete Fourier transform and the sliding CZT realization of the discrete...sliding Fourier transform. These realizations are amenable to CCD technology and allow for real-time, low-cost implementation of the homomorphic

  5. Speech Compression and Synthesis

    DTIC Science & Technology

    1980-03-01

    appreciable roughness in the coded speech and some "thuds." Also noticeable at 6.4 kb/s, especially for female voices, is the reverberant quality of the...The major acoustic attribute of a glottal stop is a lowered pitch contour. An experiment reported in [1] indicated that this pitch lowering occurs...channel to an arbitrary value. All tests were done with 5 male and 5 female sentences. Our present choice of a good compromise system operating

  6. Linguistic aspects of speech synthesis.

    PubMed Central

    Allen, J

    1995-01-01

    The conversion of text to speech is seen as an analysis of the input text to obtain a common underlying linguistic description, followed by a synthesis of the output speech waveform from this fundamental specification. Hence, the comprehensive linguistic structure serving as the substrate for an utterance must be discovered by analysis from the text. The pronunciation of individual words in unrestricted text is determined by morphological analysis or letter-to-sound conversion, followed by specification of the word-level stress contour. In addition, many text character strings, such as titles, numbers, and acronyms, are abbreviations for normal words, which must be derived. To further refine these pronunciations and to discover the prosodic structure of the utterance, word part of speech must be computed, followed by a phrase-level parsing. From this structure the prosodic structure of the utterance can be determined, which is needed in order to specify the durational framework and fundamental frequency contour of the utterance. In discourse contexts, several factors such as the specification of new and old information, contrast, and pronominal reference can be used to further modify the prosodic specification. When the prosodic correlates have been computed and the segmental sequence is assembled, a complete input suitable for speech synthesis has been determined. Lastly, multilingual systems utilizing rule frameworks are mentioned, and future directions are characterized. PMID:7479807

  7. Simulation of Human Speech Production Applied to the Study and Synthesis of European Portuguese

    NASA Astrophysics Data System (ADS)

    Teixeira, António J. S.; Martinez, Roberto; Silva, Luís Nuno; Jesus, Luis M. T.; Príncipe, Jose C.; Vaz, Francisco A. C.

    2005-12-01

    A new articulatory synthesizer (SAPWindows), with a modular and flexible design, is described. A comprehensive acoustic model and a new interactive glottal source were implemented. Perceptual tests and simulations made possible by the synthesizer contributed to deepening our knowledge of one of the most important characteristics of European Portuguese, the nasal vowels. First attempts at incorporating models of frication into the articulatory synthesizer are presented, demonstrating the potential of performing fricative synthesis based on broad articulatory configurations. Synthesis of nonsense words and Portuguese words with vowels and nasal consonants is also shown. Despite not being capable of competing with mainstream concatenative speech synthesis, the anthropomorphic approach to speech synthesis, known as articulatory synthesis, proved to be a valuable tool for phonetics research and teaching. This was particularly true for the European Portuguese nasal vowels.

  8. Speech Synthesis Applied to Language Teaching.

    ERIC Educational Resources Information Center

    Sherwood, Bruce

    1981-01-01

    The experimental addition of speech output to computer-based Esperanto lessons using speech synthesized from text is described. Because of Esperanto's phonetic spelling and simple rhythm, it is particularly easy to describe the mechanisms of Esperanto synthesis. Attention is directed to how the text-to-speech conversion is performed and the ways…

  9. Fifty years of progress in speech synthesis

    NASA Astrophysics Data System (ADS)

    Schroeter, Juergen

    2004-10-01

    A common opinion is that progress in speech synthesis should be easier to discern than in other areas of speech communication: you just have to listen to the speech! Unfortunately, things are more complicated. It can be said, however, that early speech synthesis efforts were primarily concerned with providing intelligible speech, while, more recently, "naturalness" has been the focus. The field had its "electronic" roots in Homer Dudley's 1939 "Voder," and it advanced in the 1950s and 1960s through progress in a number of labs including JSRU in England, Haskins Labs in the U.S., and Fant's Lab in Sweden. In the 1970s and 1980s significant progress came from efforts at Bell Labs (under Jim Flanagan's leadership) and at MIT (where Dennis Klatt created one of the first commercially viable systems). Finally, over the past 15 years, the methods of unit-selection synthesis were devised, primarily at ATR in Japan, and were advanced by work at AT&T Labs, Univ. of Edinburgh, and ATR. Today, TTS systems are able to "convince some of the listeners some of the time" that synthetic speech is as natural as live recordings. Ongoing efforts aim at replacing "some" with "most" for a wide range of real-world applications.

  10. Segmental intelligibility of four currently used text-to-speech synthesis methods.

    PubMed

    Venkatagiri, Horabail S

    2003-04-01

    The study investigated the segmental intelligibility of four currently available text-to-speech (TTS) products under 0-dB and 5-dB signal-to-noise ratios (SNRs). The products were IBM ViaVoice version 5.1, which uses formant coding; Festival version 1.4.2, a diphone-based LPC TTS product; AT&T Next-Gen, a half-phone-based TTS product that uses the harmonic-plus-noise method for synthesis; and FlexVoice2, a hybrid TTS product that combines concatenative and formant coding techniques. Overall, concatenative techniques were more intelligible than formant or hybrid techniques, with formant coding slightly better at modeling vowels and concatenative techniques marginally better at synthesizing consonants. No TTS product was better at resisting noise interference than the others, although all were more intelligible at 5-dB than at 0-dB SNR. The better TTS products in this study were, on the average, 22% less intelligible and had about 3 times more phoneme errors than human voice under comparable listening conditions. The hybrid TTS technology of FlexVoice had the lowest intelligibility and highest error rates. There were discernible patterns of errors for stops, fricatives, and nasals. Unrestricted TTS output--e-mail messages, news reports, and so on--under the high noise conditions prevalent in automobiles, airports, etc. will likely challenge listeners.

  11. Expressive facial animation synthesis by learning speech coarticulation and expression spaces.

    PubMed

    Deng, Zhigang; Neumann, Ulrich; Lewis, J P; Kim, Tae-Yong; Bulut, Murtaza; Narayanan, Shrikanth

    2006-01-01

    Synthesizing expressive facial animation is a very challenging topic within the graphics community. In this paper, we present an expressive facial animation synthesis system enabled by automated learning from facial motion capture data. Accurate 3D motions of the markers on the face of a human subject are captured while he/she recites a predesigned corpus, with specific spoken and visual expressions. We present a novel motion capture mining technique that "learns" speech coarticulation models for diphones and triphones from the recorded data. A Phoneme-Independent Expression Eigenspace (PIEES) that encloses the dynamic expression signals is constructed by motion signal processing (phoneme-based time-warping and subtraction) and Principal Component Analysis (PCA) reduction. New expressive facial animations are synthesized as follows: First, the learned coarticulation models are concatenated to synthesize neutral visual speech according to novel speech input, then a texture-synthesis-based approach is used to generate a novel dynamic expression signal from the PIEES model, and finally the synthesized expression signal is blended with the synthesized neutral visual speech to create the final expressive facial animation. Our experiments demonstrate that the system can effectively synthesize realistic expressive facial animation.
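
    The decomposition at the heart of the PIEES construction can be sketched in a few lines. The following illustration is ours, not the authors' code; the paper's phoneme-based time-warping is approximated here by simple linear resampling. Each expressive take is warped onto its neutral counterpart, the neutral motion is subtracted, and PCA is applied to the pooled residual "expression" signals.

      import numpy as np

      def warp_to_length(track, n_out):
          """Resample a (frames x dims) marker trajectory to n_out frames
          (a linear stand-in for phoneme-based time-warping)."""
          n_in, dims = track.shape
          t = np.linspace(0.0, n_in - 1.0, n_out)
          return np.stack([np.interp(t, np.arange(n_in), track[:, j])
                           for j in range(dims)], axis=1)

      def expression_eigenspace(expressive_takes, neutral_takes, n_components=10):
          """Warp each expressive take onto its neutral counterpart, subtract,
          and PCA the pooled residuals into an expression eigenspace."""
          residuals = [warp_to_length(e, len(n)) - n
                       for e, n in zip(expressive_takes, neutral_takes)]
          X = np.concatenate(residuals, axis=0)      # frames x marker-dims
          X -= X.mean(axis=0)
          _, _, Vt = np.linalg.svd(X, full_matrices=False)
          basis = Vt[:n_components]                  # expression eigenvectors
          return basis, X @ basis.T                  # basis and projected signals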

  12. Transforming an embodied conversational agent into an efficient talking head: from keyframe-based animation to multimodal concatenation synthesis.

    PubMed

    Gibert, Guillaume; Olsen, Kirk N; Leung, Yvonne; Stevens, Catherine J

    2015-01-01

    Virtual humans have become part of our everyday life (movies, internet, and computer games). Even though they are becoming more and more realistic, their speech capabilities are, most of the time, limited and not coherent and/or not synchronous with the corresponding acoustic signal. We describe a method to convert a virtual human avatar (animated through key frames and interpolation) into a more naturalistic talking head. In fact, speech articulation cannot be accurately replicated using interpolation between key frames, and talking heads with good speech capabilities are derived from real speech production data. Motion capture data are commonly used to provide accurate facial motion for visible speech articulators (jaw and lips) synchronous with acoustics. To access tongue trajectories (a partially occluded speech articulator), electromagnetic articulography (EMA) is often used. We recorded a large database of phonetically-balanced English sentences with synchronous EMA, motion capture data, and acoustics. An articulatory model was computed on this database to recover missing data and to provide 'normalized' animation (i.e., articulatory) parameters. In addition, semi-automatic segmentation was performed on the acoustic stream. A dictionary of multimodal Australian English diphones was created. It is composed of the variation of the articulatory parameters between all the successive stable allophones. The avatar's facial key frames were converted into articulatory parameters steering its speech articulators (jaw, lips and tongue). The speech production database was used to drive the Embodied Conversational Agent (ECA) and to enhance its speech capabilities. A Text-To-Auditory Visual Speech synthesizer was created based on the MaryTTS software and on the diphone dictionary derived from the speech production database. We describe a method to transform an ECA with generic tongue model and animation by key frames into a talking head that displays naturalistic tongue

  13. Vocoders and Speech Perception: Uses of Computer-Based Speech Analysis-Synthesis in Stimulus Generation.

    ERIC Educational Resources Information Center

    Tierney, Joseph; Mack, Molly

    1987-01-01

    Stimuli used in research on the perception of the speech signal have often been obtained from simple filtering and distortion of the speech waveform, sometimes accompanied by noise. However, for more complex stimulus generation, the parameters of speech can be manipulated, after analysis and before synthesis, using various types of algorithms to…

  14. Generalized concatenated quantum codes

    SciTech Connect

    Grassl, Markus; Shor, Peter; Smith, Graeme; Smolin, John; Zeng Bei

    2009-05-15

    We discuss the concept of generalized concatenated quantum codes. This generalized concatenation method provides a systematic way of constructing good quantum codes, both stabilizer codes and nonadditive codes. Using this method, we construct families of single-error-correcting nonadditive quantum codes, in both binary and nonbinary cases, which not only outperform any stabilizer codes for finite block length but also asymptotically meet the quantum Hamming bound for large block length.

  15. Infants' brain responses to speech suggest analysis by synthesis.

    PubMed

    Kuhl, Patricia K; Ramírez, Rey R; Bosseler, Alexis; Lin, Jo-Fu Lotus; Imada, Toshiaki

    2014-08-05

    Historic theories of speech perception (Motor Theory and Analysis by Synthesis) invoked listeners' knowledge of speech production to explain speech perception. Neuroimaging data show that adult listeners activate motor brain areas during speech perception. In two experiments using magnetoencephalography (MEG), we investigated motor brain activation, as well as auditory brain activation, during discrimination of native and nonnative syllables in infants at two ages that straddle the developmental transition from language-universal to language-specific speech perception. Adults were also tested in Exp. 1. MEG data revealed that 7-mo-old infants activate auditory (superior temporal) as well as motor brain areas (Broca's area, cerebellum) in response to speech, and equivalently for native and nonnative syllables. However, in 11- and 12-mo-old infants, native speech activates auditory brain areas to a greater degree than nonnative, whereas nonnative speech activates motor brain areas to a greater degree than native speech. This double dissociation in 11- to 12-mo-old infants matches the pattern of results obtained in adult listeners. Our infant data are consistent with Analysis by Synthesis: auditory analysis of speech is coupled with synthesis of the motor plans necessary to produce the speech signal. The findings have implications for: (i) perception-action theories of speech perception, (ii) the impact of "motherese" on early language learning, and (iii) the "social-gating" hypothesis and humans' development of social understanding.

  16. Speech analysis and synthesis based on pitch-synchronous segmentation of the speech waveform

    NASA Astrophysics Data System (ADS)

    Kang, George S.; Fransen, Lawrence J.

    1994-11-01

    This report describes a new speech analysis/synthesis method. This new technique does not attempt to model the human speech production mechanism. Instead, we represent the speech waveform directly in terms of the waveform defined within each pitch period. A significant merit of this approach is the complete elimination of pitch interference, because each pitch-synchronously segmented waveform does not include a waveform discontinuity. One application of this new speech analysis/synthesis method is the alteration of speech characteristics directly on raw speech. With the increased use of man-made speech in tactical voice message systems and virtual reality environments, such a speech generation tool is highly desirable. Another application is speech encoding at low data rates (2400 b/s or less). According to speech intelligibility tests, our new 2400-b/s encoder outperforms the current 2400-b/s LPC, and this also holds in noisy environments. Because most tactical platforms are noisy (e.g., helicopter, high-performance aircraft, tank, destroyer), our 2400-b/s speech encoding technique will make tactical voice communication more effective; it will become an indispensable capability for future C4I.
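
    The core idea, segmenting the waveform into whole pitch periods so that no segment contains a discontinuity, can be illustrated with a toy sketch (ours, not the report's algorithm). Pitch marks are assumed to be supplied by a separate estimator; altering speech characteristics then amounts to operating on the per-period segments before concatenating them again.

      import numpy as np

      def pitch_synchronous_segments(x, pitch_marks):
          """Slice waveform x into one-pitch-period segments; pitch_marks are
          sample indices of successive glottal cycle starts (assumed given)."""
          return [x[a:b] for a, b in zip(pitch_marks[:-1], pitch_marks[1:])]

      def resynthesize(segments, pitch_scale=1.0):
          """Toy resynthesis: resample each period to alter pitch, then
          concatenate. Each segment spans exactly one period, so no waveform
          discontinuity is introduced at the joins. (A real system would keep
          the spectral envelope fixed; this sketch does not.)"""
          out = []
          for seg in segments:
              n_new = max(2, int(round(len(seg) / pitch_scale)))
              t = np.linspace(0.0, len(seg) - 1.0, n_new)
              out.append(np.interp(t, np.arange(len(seg)), seg))
          return np.concatenate(out)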

  17. Synthesis of IB-01211, a cyclic peptide containing 2,4-concatenated thia- and oxazoles, via Hantzsch macrocyclization.

    PubMed

    Hernández, Delia; Vilar, Gemma; Riego, Estela; Cañedo, Librada M; Cuevas, Carmen; Albericio, Fernando; Alvarez, Mercedes

    2007-03-01

    An efficient and versatile convergent synthesis of IB-01211 based on a combination of peptide and heterocyclic chemistry is described. The key step in the synthesis is macrocyclization through intramolecular Hantzsch formation of the thiazole ring. Dehydration of a free primary alcohol to furnish the exocyclic methylidene present in the natural product was applied during the macrocyclization.

  18. Auto Spell Suggestion for High Quality Speech Synthesis in Hindi

    NASA Astrophysics Data System (ADS)

    Kabra, Shikha; Agarwal, Ritika

    2014-02-01

    The goal of Text-to-Speech (TTS) synthesis in a particular language is to convert arbitrary input text to intelligible and natural-sounding speech. However, for a language like Hindi, in which very close spellings are easily confused, it is not an easy task to identify errors/mistakes in the input text, and incorrect text degrades the quality of the output speech. This paper therefore contributes to the development of high-quality speech synthesis by incorporating a spellchecker that automatically generates spelling suggestions for misspelled words. Involving a spellchecker increases the effectiveness of speech synthesis by providing spelling suggestions for incorrect input text. Furthermore, we provide a comparative study evaluating the resulting effect on the phonetic text of adding the spellchecker to the input text.

  1. Evaluation of Speech Synthesis Systems using the Speech Reception Threshold Methodology

    DTIC Science & Technology

    2005-04-01

    Evaluation of Speech Synthesis Systems using the Speech Reception Threshold Methodology. David A. van Leeuwen and Johan van...exposed to. The sentences are linguistically meaningful, and consist of 8-9 syllables. This type of van Leeuwen, D.A.; van Balken, J. (2005) Evaluation...based synthesis system than more diagnostic linguistic material, such as Semantically Unpredictable Sentences (SUS) or nonsense consonant-vowel

  2. Speech synthesis using an aeroacoustic fricative model

    NASA Astrophysics Data System (ADS)

    Sinder, Daniel Jared

    isolation and in a vowel context. The results show the strong potential for this approach to produce high quality unvoiced speech without the need to estimate source strength, spectra, or location for different vocal tract geometries. That is, the synthesis of unvoiced sounds is gained automatically from the articulatory description. (Abstract shortened by UMI.)

  3. Voice Quality Modelling for Expressive Speech Synthesis

    PubMed Central

    Socoró, Joan Claudi

    2014-01-01

    This paper presents the perceptual experiments that were carried out in order to validate the methodology of transforming expressive speech styles using voice quality (VoQ) parameters modelling, along with the well-known prosody (F0, duration, and energy), from a neutral style into a number of expressive ones. The main goal was to validate the usefulness of VoQ in the enhancement of expressive synthetic speech in terms of speech quality and style identification. A harmonic plus noise model (HNM) was used to modify VoQ and prosodic parameters that were extracted from an expressive speech corpus. Perception test results indicated the improvement of obtained expressive speech styles using VoQ modelling along with prosodic characteristics. PMID:24587738

  4. Giving Voice to Student Writing: Exploring the Uses of Speech Recognition and Speech Synthesis in a Writing Curriculum. Activities for Word Processing with Speech Synthesis. Teacher and Student Manual.

    ERIC Educational Resources Information Center

    Burenstein, Ben

    Produced by a demonstration project that explored the use of speech synthesis and speech recognition in writing-based classes held by two Philadelphia literacy providers, this manual was developed for teachers who may wish to integrate speech synthesis into their curriculum. It contains a description of the technologies and activities that may be…

  5. Speech Synthesis Using Perceptually Motivated Features

    DTIC Science & Technology

    2012-01-23

    it can produce unintelligible speech, and moreover is difficult to adapt to different speaking styles and expressive/emotional content. Because the...were combined with three different vowels ([i], [a], [u]) using a corpus developed at Aalborg University and embedded in a di-syllable, -la...5 peaks per second. Low-frequency cochlear channels mainly reflect the presence of vowels and nasals, and high-frequency channels mainly reflect

  6. Speech Analysis/Synthesis Based on Perception.

    DTIC Science & Technology

    1984-11-05

    ...A speech analysis system based on a combination of physiological ...AUDITORY MODEL BASED ON PHYSIOLOGICAL RESULTS... A SIMPLIFIED AUDITORY MODEL INCORPORATING... physiological studies of the auditory system are applied, it may be possible to design improved ASR machines. When applying auditory system results to the

  7. How Foreign are ’Foreign’ Speech Sounds? Implications for Speech Recognition and Speech Synthesis

    DTIC Science & Technology

    2000-08-01

    language acquisition (SLA) research. This paper reports results from a production study of the phonological processes involved when approaching a...Language Speech Sounds. In James, A. & J. Leather (eds.), Sound Patterns in Second Language Acquisition, Foris Publications.

  8. Towards personalized speech synthesis for augmentative and alternative communication.

    PubMed

    Mills, Timothy; Bunnell, H Timothy; Patel, Rupal

    2014-09-01

    Text-to-speech options on augmentative and alternative communication (AAC) devices are limited. Often, several individuals in a group setting use the same synthetic voice. This lack of customization may limit technology adoption and social integration. This paper describes our efforts to generate personalized synthesis for users with profoundly limited speech motor control. Existing voice banking and voice conversion techniques rely on recordings of clearly articulated speech from the target talker, which cannot be obtained from this population. Our VocaliD approach extracts prosodic properties from the target talker's source function and applies these features to a surrogate talker's database, generating a synthetic voice with the vocal identity of the target talker and the clarity of the surrogate talker. Promising intelligibility results suggest areas of further development for improved personalization.

  9. Alternative Speech Communication System for Persons with Severe Speech Disorders

    NASA Astrophysics Data System (ADS)

    Selouani, Sid-Ahmed; Sidi Yakoub, Mohammed; O'Shaughnessy, Douglas

    2009-12-01

    Assistive speech-enabled systems are proposed to help both French- and English-speaking persons with various speech disorders. The proposed assistive systems use automatic speech recognition (ASR) and speech synthesis in order to enhance the quality of communication. These systems aim at improving the intelligibility of pathologic speech, making it as natural as possible and close to the original voice of the speaker. The resynthesized utterances use new basic units, a new concatenation algorithm, and a grafting technique to correct the poorly pronounced phonemes. The ASR responses are uttered by the new speech synthesis system in order to convey an intelligible message to listeners. Experiments involving four American speakers with severe dysarthria and two Acadian French speakers with sound substitution disorders (SSDs) are carried out to demonstrate the efficiency of the proposed methods. Improvements in the Perceptual Evaluation of Speech Quality (PESQ) value of 5% and of more than 20% are achieved by the speech synthesis systems that deal with SSDs and dysarthria, respectively.

  10. Modeling Consonant-Vowel Coarticulation for Articulatory Speech Synthesis

    PubMed Central

    Birkholz, Peter

    2013-01-01

    A central challenge for articulatory speech synthesis is the simulation of realistic articulatory movements, which is critical for the generation of highly natural and intelligible speech. This includes modeling coarticulation, i.e., the context-dependent variation of the articulatory and acoustic realization of phonemes, especially of consonants. Here we propose a method to simulate the context-sensitive articulation of consonants in consonant-vowel syllables. To achieve this, the vocal tract target shape of a consonant in the context of a given vowel is derived as the weighted average of three measured and acoustically-optimized reference vocal tract shapes for that consonant in the context of the corner vowels /a/, /i/, and /u/. The weights are determined by mapping the target shape of the given context vowel into the vowel subspace spanned by the corner vowels. The model was applied for the synthesis of consonant-vowel syllables with the consonants /b/, /d/, /g/, /l/, /r/, /m/, /n/ in all combinations with the eight long German vowels. In a perception test, the mean recognition rate for the consonants in the isolated syllables was 82.4%. This demonstrates the potential of the approach for highly intelligible articulatory speech synthesis. PMID:23613734
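
    The weighting step lends itself to a compact sketch (ours; parameter and function names are illustrative). A context vowel's vocal-tract shape v is expressed as a convex combination of the corner-vowel shapes for /a/, /i/, and /u/, and the same weights then blend the consonant's three measured reference shapes.

      import numpy as np

      def corner_weights(v, a, i, u):
          """Solve v ~ w_a*a + w_i*i + w_u*u subject to w_a + w_i + w_u = 1,
          by substituting w_u = 1 - w_a - w_i and solving least squares."""
          A = np.stack([a - u, i - u], axis=1)       # (params x 2)
          w2, *_ = np.linalg.lstsq(A, v - u, rcond=None)
          return np.array([w2[0], w2[1], 1.0 - w2[0] - w2[1]])

      def consonant_target(v, refs_aiu, corners_aiu):
          """Context-dependent consonant target: weighted average of its
          reference shapes measured in /a/, /i/ and /u/ context."""
          w = corner_weights(v, *corners_aiu)
          return w[0] * refs_aiu[0] + w[1] * refs_aiu[1] + w[2] * refs_aiu[2]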

  11. Surface electromyographic control of speech synthesis.

    PubMed

    Cler, Meredith J; Nieto-Castanon, Alfonso; Guenther, Frank H; Stepp, Cara E

    2014-01-01

    Individuals with very high spinal cord injuries (e.g., C1-C3) may be ventilator-dependent and therefore unable to support speech breathing. However, their facial musculature is intact, given that these muscles are innervated by cranial nerves. We developed a system using surface electromyography (sEMG) recorded from facial muscles to control a phonemic interface and voice synthesizer, and tested the system in healthy individuals. Users were able to use five facial gestures to control an onscreen cursor and the phonemic interface. Users had mean information transfer rates (ITRs) of 59.5 bits/min when calculating ITRs using the number of phonemes selected. To compare with orthographic systems, ITRs were also calculated using the equivalent number of letters required to spell the selected word. With this calculation, users had a mean ITR of 70.1 bits/min. Results are promising for further development and testing in individuals with high spinal cord injuries.
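
    For readers unfamiliar with the metric, ITRs of this kind are conventionally computed with Wolpaw's formula from the number of selectable items N, the selection accuracy P, and the selection rate. The sketch below is ours; the paper's exact procedure may differ, and the numbers shown are hypothetical.

      from math import log2

      def wolpaw_itr(n_targets, accuracy, selections_per_min):
          """ITR in bits/min: bits per selection times selection rate, with
          bits/selection = log2 N + P log2 P + (1-P) log2((1-P)/(N-1))."""
          n, p = n_targets, accuracy
          if p >= 1.0:
              bits = log2(n)
          else:
              bits = log2(n) + p * log2(p) + (1 - p) * log2((1 - p) / (n - 1))
          return bits * selections_per_min

      # Hypothetical values for illustration only (not from the paper):
      print(wolpaw_itr(n_targets=39, accuracy=0.90, selections_per_min=14))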

  12. Inverse solution of speech production based on perturbation theory and its application to articulatory speech synthesis

    NASA Astrophysics Data System (ADS)

    Yu, Zhenli

    1998-12-01

    The inverse solution of speech production for formant targets of vowels and vowel-to-vowel transitions is studied. A band-limited Fourier cosine expansion of the vocal-tract area function, or of its logarithm, is used to model the vocal-tract shape. The inverse solution is based on the perturbation theory of speech production, incorporated with a fast calculation of the vocal-tract system. An interpolation method is proposed for dynamic constraint of the unobservable zeros and vocal-tract length along the transition between the endpoints of a vowel-to-vowel transition. A unique acoustic-to-geometry mapping codebook is used to match the zeros and vocal-tract length at the endpoints. The codebook is designed using geometrical and acoustical constraints. Computer simulation of the evaluation of the inverse solution shows reasonable results with respect to the naturalness of the transition behavior of the vocal-tract area function. An articulatory synthesizer with a reflection-type line analog model, driven by the vocal-tract area, is implemented. A synthesis evaluation of the performance of the inverse solution for vowel-to-vowel transitions, as well as for isolated vowels, is conducted. Visual inspection of spectrograms and perceptual listening to the synthetic sounds are satisfactory. Quantitative comparison in the form of formant traces reveals fairly good matching of the formants of the synthetic sounds to the originals. A novel formant-targeted articulatory synthesis, as an application of the inverse solution, is proposed. The entire system consists of an inverse module and a reflection-type line analog model. The synthesizer needs only the first three formant trajectories, pitch contour, and amplitude as input parameters. A formant mimic synthesis, in which the input parameters can be artificially specified, and a formant copy synthesis, in which the input parameters are estimated from real speech, are implemented. The formant trace or pitch contour can be separately modified

  13. Usage of the HMM-Based Speech Synthesis for Intelligent Arabic Voice

    NASA Astrophysics Data System (ADS)

    Fares, Tamer S.; Khalil, Awad H.; Hegazy, Abd El-Fatah A.

    2008-06-01

    The HMM, a suitable model for time-sequence modeling, is used for the estimation of speech synthesis parameters. A speech parameter sequence is generated from the HMMs themselves, whose observation vectors consist of a spectral parameter vector and its dynamic feature vectors. The HMMs generate cepstral coefficients and a pitch parameter, which are then fed to a speech synthesis filter named Mel Log Spectral Approximation (MLSA). This paper explains how this approach can be applied to the Arabic language to produce intelligent Arabic speech synthesis using HMM-based speech synthesis, and examines the influence of using dynamic features and of increasing the number of mixture components on the quality of the synthesized Arabic speech.

  14. Hybrid and concatenated coding applications.

    NASA Technical Reports Server (NTRS)

    Hofman, L. B.; Odenwalder, J. P.

    1972-01-01

    Results are presented of a study to evaluate the performance and implementation complexity of a concatenated and a hybrid coding system for moderate-speed deep-space applications. It is shown that with a total complexity of less than three times that of the basic Viterbi decoder, concatenated coding improves a constraint-length 8, rate 1/3 Viterbi decoding system by 1.1 and 2.6 dB at bit error probabilities of 10^(-4) and 10^(-8), respectively. With a somewhat greater total complexity, the hybrid coding system is shown to obtain a 0.9-dB computational performance improvement over the basic rate 1/3 sequential decoding system. Although substantial, these complexities are much less than those required to achieve the same performance with more complex Viterbi or sequential decoder systems.

  15. Vocabulary Synthesis Based on Line Spectrum Pairs

    DTIC Science & Technology

    1989-01-12

    Stephanie S. Everett, Human-Computer Interface Laboratory, Information Technology Division, January 12, 1989. Approved for public...Speech synthesis; human-computer interface; text-to-speech; LSP...sensitive rules, and modified if necessary. Pitch and amplitude curves are computed, and the concatenated segments are then output through the LSP

  16. The Effects on Children's Writing of Adding Speech Synthesis to a Word Processor.

    ERIC Educational Resources Information Center

    Borgh, Karin; Dickson, W. Patrick

    A study examined whether computers equipped with speech synthesis devices could facilitate children's writing. It was hypothesized that children using the devices would write longer stories, edit more, and produce higher quality stories than children not receiving feedback from a speech synthesizer. Subjects were 48 children, three girls and three…

  17. Radio Losses for Concatenated Codes

    NASA Astrophysics Data System (ADS)

    Shambayati, S.

    2002-07-01

    The advent of higher-powered spacecraft amplifiers and better ground receivers capable of tracking spacecraft carrier signals with narrower loop bandwidths requires better understanding of the carrier tracking loss (radio loss) mechanism of the concatenated codes used for deep-space missions. In this article, we present results of simulations performed for a (7,1/2), Reed-Solomon (255,223), interleaver depth-5 concatenated code in order to shed some light on this issue. Through these simulations, we obtained the performance of this code over an additive white Gaussian noise (AWGN) channel (the baseline performance) in terms of both its frame-error rate (FER) and its bit-error rate at the output of the Reed-Solomon decoder (RS-BER). After obtaining these results, we curve fitted the baseline performance curves for FER and RS-BER and calculated the high-rate radio losses for this code for an FER of 10^(-4) and its corresponding baseline RS-BER of 2.1 x 10^(-6) for a carrier loop signal-to-noise ratio (SNR) of 14.8 dB. This calculation revealed that even though over the AWGN channel the FER value and the RS-BER value correspond to each other (i.e., these values are obtained by the same bit SNR value), the RS-BER value has higher high-rate losses than does the FER value. Furthermore, this calculation contradicted the previous assumption that at high data rates concatenated codes have the same radio losses as their constituent convolutional codes. Our results showed much higher losses for the FER and the RS-BER (by as much as 2 dB) than for the corresponding baseline BER of the convolutional code. Further simulations were performed to investigate the effects of changes in the data rate on the code's radio losses. It was observed that as the data rate increased, the radio losses for both the FER and the RS-BER approached their respective calculated high-rate values. Furthermore, these simulations showed that a simple two-parameter function could model the increase in the

  18. Towards direct speech synthesis from ECoG: A pilot study.

    PubMed

    Herff, Christian; Johnson, Garett; Diener, Lorenz; Shih, Jerry; Krusienski, Dean; Schultz, Tanja

    2016-08-01

    Most current Brain-Computer Interfaces (BCIs) achieve high information transfer rates using spelling paradigms based on stimulus-evoked potentials. Despite the success of these interfaces, this mode of communication can be cumbersome and unnatural. Direct synthesis of speech from neural activity represents a more natural mode of communication that would enable users to convey verbal messages in real-time. In this pilot study with one participant, we demonstrate that electrocorticography (ECoG) intracranial activity from temporal areas can be used to resynthesize speech in real-time. This is accomplished by reconstructing the audio magnitude spectrogram from neural activity and subsequently creating the audio waveform from these reconstructed spectrograms. We show that significant correlations between the original and reconstructed spectrograms and temporal waveforms can be achieved. While this pilot study uses audibly spoken speech for the models, it represents a first step towards speech synthesis from speech imagery.

  19. Design and performance of an analysis-by-synthesis class of predictive speech coders

    NASA Technical Reports Server (NTRS)

    Rose, Richard C.; Barnwell, Thomas P., III

    1990-01-01

    The performance of a broad class of analysis-by-synthesis linear predictive speech coders is quantified experimentally. The class of coders includes a number of well-known techniques as well as a very large number of speech coders which have not been named or studied. A general formulation for deriving the parametric representation used in all of the coders in the class is presented. A new coder, named the self-excited vocoder, is discussed because of its good performance with low complexity, and because of the insight this coder gives to analysis-by-synthesis coders in general. The results of a study comparing the performances of different members of this class are presented. The study takes the form of a series of formal subjective and objective speech quality tests performed on selected coders. The results of this study lead to some interesting and important observations concerning the controlling parameters for analysis-by-synthesis speech coders.

  1. Design of Serially Concatenated Trellis Coded Modulation

    NASA Technical Reports Server (NTRS)

    Benedetto, S.; Divsalar, D.; Garello, R.; Montorsi, G.; Pollara, F.

    1998-01-01

    Serial concatenation of an outer binary convolutional code with an inner TCM code over a multidimensional Euclidean constellation, through an interleaver, makes it possible to extend the extremely good performance of turbo codes to the case of high spectral efficiency.

  2. New ARQ protocols using concatenated codes

    NASA Astrophysics Data System (ADS)

    Benelli, Giuliano

    1993-07-01

    Two automatic-repeat-request (ARQ) protocols using a concatenated coding scheme are described. The structure, introduced in a codeword of a concatenated coding scheme, is used to improve the performance of ARQ protocols, especially for high error rates in the communication channel. The performance of the scheme described herein is derived through theoretical analysis. The results show that the proposed schemes outperform other similar ARQ protocols.

  3. An exploratory study on the driving method of speech synthesis based on the human eye reading imaging data

    NASA Astrophysics Data System (ADS)

    Gao, Pei-pei; Liu, Feng

    2016-10-01

    With the development of information technology and artificial intelligence, speech synthesis plays a significant role in the field of human-computer interaction. However, one main problem of current speech synthesis techniques is a lack of naturalness and expressiveness, so synthetic speech is not yet close to the standard of natural language. Another problem is that human-computer interaction based on speech synthesis is too monotonous to realize a mechanism of subjective user drive. This paper introduces the historical development of speech synthesis and summarizes the general process of the technique, pointing out that the prosody generation module is an important part of speech synthesis. On the basis of further research, using the rules of eye activity during reading to control and drive prosody generation is introduced as a new human-computer interaction method that enriches the synthetic form. The present situation of speech synthesis technology is reviewed in detail. On the premise of eye-gaze data extraction, a speech synthesis method driven in real time by the eye-movement signal is proposed that can express the real speech rhythm of the speaker. That is, while the reader is silently reading a corpus, the system captures reading information such as the eye-gaze duration per prosodic unit, and establishes a hierarchical prosodic-pattern duration model to determine the duration parameters of the synthesized speech. Finally, the feasibility of the above method is verified by analysis.
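
    A minimal sketch of the driving idea (ours; the paper's hierarchical duration model is more elaborate, and all names and numbers below are hypothetical) maps the reader's gaze duration on each prosodic unit to a scaling of that unit's baseline synthesis duration:

      def gaze_driven_durations(units, gaze_ms, baseline_ms):
          """Scale each prosodic unit's synthesis duration by the ratio of the
          reader's gaze time on it to the unit's baseline duration (all in ms).
          The clamp keeps extreme fixations from producing unnatural timing."""
          durations = {}
          for u in units:
              ratio = gaze_ms[u] / baseline_ms[u]
              ratio = min(max(ratio, 0.5), 2.0)
              durations[u] = baseline_ms[u] * ratio
          return durations

      # Hypothetical prosodic units and timings, for illustration only:
      print(gaze_driven_durations(["unit1", "unit2"],
                                  {"unit1": 420, "unit2": 610},
                                  {"unit1": 380, "unit2": 650}))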

  4. Coding Method of LSP Residual Signals Using Wavelets for Speech Synthesis

    NASA Astrophysics Data System (ADS)

    Shimizu, Tadaaki; Kimoto, Masaya; Yoshimura, Hiroki; Isu, Naoki; Sugata, Kazuhiro

    This paper presents a method that uses wavelet analysis for speech coding and synthesis by rule. It is a coding system in which the LSP residual signal is transformed into wavelet coefficients. As wavelet analysis is implemented efficiently by filter banks, our method requires less computation than multipulse coding and other methods in which complicated prediction procedures are essential. To achieve good speech quality at low bit rates, we verified an allocation of different numbers of bits to the wavelet coefficients, with more bits at lower frequencies and fewer at higher frequencies. Speech synthesized with the Haar wavelet at 16.538 kbit/s has nearly the same perceptual quality as 6-bit μ-log PCM (66.15 kbit/s). We are convinced that coding LSP residual signals using wavelet analysis is an effective approach to speech synthesis.
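
    The band-weighted bit allocation can be sketched as follows (ours, not the authors' coder): a Haar filter bank splits an LSP-residual frame into subbands, and a uniform quantizer spends more bits on the approximation and low-frequency detail bands than on the fine details.

      import numpy as np

      def haar_analysis(x, levels):
          """Haar filter bank; returns [approx, detail_coarsest, ..., detail_finest]."""
          a, details = x.astype(float), []
          for _ in range(levels):
              pairs = a[: len(a) - len(a) % 2].reshape(-1, 2)
              details.append((pairs[:, 0] - pairs[:, 1]) / np.sqrt(2.0))
              a = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2.0)
          return [a] + details[::-1]

      def quantize(band, bits):
          """Uniform quantizer with the given number of bits over the band's range."""
          if bits == 0:
              return np.zeros_like(band)
          peak = float(np.max(np.abs(band))) or 1.0
          half_levels = 2.0 ** bits / 2.0 - 0.5
          return np.round(band / peak * half_levels) / half_levels * peak

      frame = np.random.randn(256)             # stand-in for an LSP residual frame
      bands = haar_analysis(frame, levels=4)   # [approx, d4, d3, d2, d1]
      alloc = [8, 6, 5, 3, 2]                  # more bits for lower frequencies
      coded = [quantize(b, n) for b, n in zip(bands, alloc)]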

  5. New Bandwidth Efficient Parallel Concatenated Coding Schemes

    NASA Technical Reports Server (NTRS)

    Benedetto, S.; Divsalar, D.; Montorsi, G.; Pollara, F.

    1996-01-01

    We propose a new solution for the parallel concatenation of trellis codes with multilevel amplitude/phase modulations and a suitable iterative decoding structure. Examples are given for throughputs of 2 bits/s/Hz with 8PSK and 16QAM signal constellations.

  6. Application of speech recognition and synthesis in the general aviation cockpit

    NASA Technical Reports Server (NTRS)

    North, R. A.; Mountford, S. J.; Bergeron, H.

    1984-01-01

    Interactive speech recognition/synthesis technology is assessed as a method for the alleviation of single-pilot IFR flight workloads. Attention was given during this series of evaluations to the conditions typical of general aviation twin-engine aircraft cockpits, covering several commonly encountered IFR flight condition scenarios. The most beneficial speech command tasks are noted to be in the data retrieval domain, which would allow the pilot access to uplinked data, checklists, and performance charts. Data entry tasks also appear to benefit from this technology.

  7. Iterative Decoding of Concatenated Codes: A Tutorial

    NASA Astrophysics Data System (ADS)

    Regalia, Phillip A.

    2005-12-01

    The turbo decoding algorithm of a decade ago constituted a milestone in error-correction coding for digital communications, and has inspired extensions to generalized receiver topologies, including turbo equalization, turbo synchronization, and turbo CDMA, among others. Despite an accrued understanding of iterative decoding over the years, the "turbo principle" remains elusive to master analytically, thereby inciting interest from researchers outside the communications domain. In this spirit, we develop a tutorial presentation of iterative decoding for parallel and serial concatenated codes, in terms hopefully accessible to a broader audience. We motivate iterative decoding as a computationally tractable attempt to approach maximum-likelihood decoding, and characterize fixed points in terms of a "consensus" property between constituent decoders. We review how the decoding algorithm for both parallel and serial concatenated codes coincides with an alternating projection algorithm, which allows one to identify conditions under which the algorithm indeed converges to a maximum-likelihood solution, in terms of particular likelihood functions factoring into the product of their marginals. The presentation emphasizes a common framework applicable to both parallel and serial concatenated codes.

  8. Overhead analysis of universal concatenated quantum codes

    NASA Astrophysics Data System (ADS)

    Chamberland, Christopher; Jochym-O'Connor, Tomas; Laflamme, Raymond

    2017-02-01

    We analyze the resource overhead of recently proposed methods for universal fault-tolerant quantum computation using concatenated codes. Namely, we examine the concatenation of the 7-qubit Steane code with the 15-qubit Reed-Muller code, which allows for the construction of the 49- and 105-qubit codes that do not require magic state distillation for universality. We compute a lower bound for the adversarial noise threshold of the 105-qubit code and find it to be 8.33 x 10^(-6). We obtain a depolarizing noise threshold for the 49-qubit code of 9.69 x 10^(-4), which is competitive with the 105-qubit threshold result of 1.28 x 10^(-3). We then provide lower bounds on the resource requirements of the 49- and 105-qubit codes and compare them with the surface code implementation of a logical T gate using magic state distillation. For the sampled input error rates and noise model, we find that the surface code achieves a smaller overhead compared to our concatenated schemes.

  9. Concatenated Coding Using Trellis-Coded Modulation

    NASA Technical Reports Server (NTRS)

    Thompson, Michael W.

    1997-01-01

    In the late seventies and early eighties a technique known as Trellis Coded Modulation (TCM) was developed for providing spectrally efficient error correction coding. Instead of adding redundant information in the form of parity bits, redundancy is added at the modulation stage, thereby increasing bandwidth efficiency. A digital communications system can be designed to use bandwidth-efficient multilevel/phase modulation such as Amplitude Shift Keying (ASK), Phase Shift Keying (PSK), Differential Phase Shift Keying (DPSK) or Quadrature Amplitude Modulation (QAM). Performance gain can be achieved by increasing the number of signals over the corresponding uncoded system to compensate for the redundancy introduced by the code. A considerable amount of research and development has been devoted toward developing good TCM codes for severely bandlimited applications. More recently, the use of TCM for satellite and deep space communications applications has received increased attention. This report describes the general approach of using a concatenated coding scheme that features TCM and RS coding. Results have indicated that substantial (6-10 dB) performance gains can be achieved with this approach with comparatively little bandwidth expansion. Since all of the bandwidth expansion is due to the RS code, we see that TCM-based concatenated coding results in roughly 10-50% bandwidth expansion, compared to 70-150% expansion for similar concatenated schemes which use convolutional codes. We stress that combined coding and modulation optimization is important for achieving performance gains while maintaining spectral efficiency.

  10. Integration of a laser system with a speech synthesis apparatus: a feasibility study

    NASA Astrophysics Data System (ADS)

    Daurelio, Giuseppe; Ludovico, Antonio D.; Giorleo, G.; Esposito, U.

    1993-05-01

    This work concerns a study of the integration of a laser system with a speech synthesis facility. The speech synthesis system uses a random access memory (RAM) and electrical contacts (NO and/or NC), controlled by an electronic circuit provided with a microprocessor, in order to pick out the 'spoken' information corresponding to the actual failure. Hence the laser system is able to 'speak' to the operator and keep him informed of the process conditions, giving him simple step-by-step ON and OFF instructions (replacing the operator instruction manual), keeping him informed about the actual state of the technological plant, and giving him 'spoken messages' for maintenance with scheduled expiry (replacing the maintenance instruction manual). In other words, the future 'speaking laser system' will be able to perform a complete auto-diagnosis and report the results to the operator in real time.

  11. Synthesis of Speaker Facial Movement to Match Selected Speech Sequences

    NASA Technical Reports Server (NTRS)

    Scott, K. C.; Kagels, D. S.; Watson, S. H.; Rom, H.; Wright, J. R.; Lee, M.; Hussey, K. J.

    1994-01-01

    A system is described which allows for the synthesis of a video sequence of a realistic-appearing talking human head. A phonic based approach is used to describe facial motion; image processing rather than physical modeling techniques are used to create video frames.

  12. A concatenated coding scheme for error control

    NASA Technical Reports Server (NTRS)

    Lin, S.

    1985-01-01

    A concatenated coding scheme for error control in data communications was analyzed. The inner code is used for both error correction and detection; the outer code is used only for error detection. A retransmission is requested if either the inner code decoder fails to make a successful decoding or the outer code decoder detects the presence of errors after the inner code decoding. The probability of undetected error of the proposed scheme is derived, and an efficient method for computing this probability is presented. The throughput efficiency of the proposed error control scheme, incorporated with a selective-repeat ARQ retransmission strategy, is analyzed.
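
    The retransmission rule is easy to state in code. The toy below is ours (the paper analyzes real convolutional/block constructions, not these toy codes): a 3x repetition code serves as an inner code that both corrects and flags failures, and a CRC-32 serves as a detection-only outer code; a frame is accepted only if the inner decoder succeeds and the outer check passes.

      import zlib, random

      def inner_decode(rx):
          """Majority-vote decode of a 3x-repeated frame; flag failure when
          all three copies of some byte disagree."""
          n = len(rx) // 3
          a, b, c = rx[:n], rx[n:2 * n], rx[2 * n:]
          out, ok = bytearray(), True
          for x, y, z in zip(a, b, c):
              if x == y or x == z:
                  out.append(x)
              elif y == z:
                  out.append(y)
              else:
                  out.append(x)
                  ok = False                     # inner decoding failure
          return bytes(out), ok

      def send_with_arq(payload, channel, max_tries=10):
          frame = payload + zlib.crc32(payload).to_bytes(4, "big")
          for attempt in range(1, max_tries + 1):
              decoded, inner_ok = inner_decode(channel(frame * 3))
              outer_ok = zlib.crc32(decoded[:-4]).to_bytes(4, "big") == decoded[-4:]
              if inner_ok and outer_ok:
                  return decoded[:-4], attempt   # accept frame
              # otherwise request a (selective-repeat) retransmission
          raise RuntimeError("retransmission limit reached")

      # A toy channel that flips the low bit of ~1% of bytes:
      flip = lambda b: bytes(x ^ (random.random() < 0.01) for x in b)
      print(send_with_arq(b"data frame", flip))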

  13. HMM-Based Style Control for Expressive Speech Synthesis with Arbitrary Speaker's Voice Using Model Adaptation

    NASA Astrophysics Data System (ADS)

    Nose, Takashi; Tachibana, Makoto; Kobayashi, Takao

    This paper presents methods for controlling the intensity of emotional expressions and speaking styles of an arbitrary speaker's synthetic speech by using a small amount of his/her speech data in HMM-based speech synthesis. Model adaptation approaches are introduced into the style control technique based on the multiple-regression hidden semi-Markov model (MRHSMM). Two different approaches are proposed for training a target speaker's MRHSMMs. The first one is MRHSMM-based model adaptation in which the pretrained MRHSMM is adapted to the target speaker's model. For this purpose, we formulate the MLLR adaptation algorithm for the MRHSMM. The second method utilizes simultaneous adaptation of speaker and style from an average voice model to obtain the target speaker's style-dependent HSMMs which are used for the initialization of the MRHSMM. From the result of subjective evaluation using adaptation data of 50 sentences of each style, we show that the proposed methods outperform the conventional speaker-dependent model training when using the same size of speech data of the target speaker.

  14. Stereotaxy, navigation and the temporal concatenation.

    PubMed

    Apuzzo, M L; Chen, J C

    1999-01-01

    Nautical and cerebral navigation share similar elements of functional need and similar developmental pathways. The need for orientation necessitates the development of appropriate concepts, and such concepts are dependent on technology for practical realization. Occasionally, a concept precedes technology in time and requires a period of delay for appropriate development. A temporal concatenation exists in which time allows need, concept, and technology to combine and ultimately provide an elegant solution. Nautical navigation has proceeded through periods of dead reckoning and celestial navigation to satellite orientation, with associated refinements of instrumentation and charts for guidance. Cerebral navigation has progressed from craniometric orientation and burr-hole-mounted guidance systems, to simple rectolinear and arc-centered devices based on radiographs, to guidance by complex anatomical and functional maps provided as an amalgam of modern imaging modes. These maps are now augmented by complex frame and frameless systems which allow not only precise orientation but also point and volumetric action. These complex technical modalities derive in part from elements of maritime navigation that have been translated to cerebral navigation in a temporal concatenation.

  16. A concatenational graph evolution aging model.

    PubMed

    Suo, Jinli; Chen, Xilin; Shan, Shiguang; Gao, Wen; Dai, Qionghai

    2012-11-01

    Modeling the long-term face aging process is of great importance for face recognition and animation, but there is a lack of sufficient long-term face aging sequences for model learning. To address this problem, we propose a CONcatenational GRaph Evolution (CONGRE) aging model, which adopts a decomposition strategy in both the spatial and temporal aspects to learn long-term aging patterns from partially dense aging databases. In the spatial aspect, we build a graphical face representation in which a human face is decomposed into mutually interrelated subregions under anatomical guidance. In the temporal aspect, the long-term evolution of the above graphical representation is then modeled by connecting sequential short-term patterns, following the Markov property of the aging process, under smoothness constraints between neighboring short-term patterns and consistency constraints among subregions. The proposed model also accounts for the diversity of face aging by introducing a probabilistic concatenation strategy between short-term patterns and applying stochastic sampling in aging prediction. In experiments, the aging prediction results generated by the learned aging models are evaluated both subjectively and objectively to validate the proposed model.

  17. An Interactive Concatenated Turbo Coding System

    NASA Technical Reports Server (NTRS)

    Liu, Ye; Tang, Heng; Lin, Shu; Fossorier, Marc

    1999-01-01

    This paper presents a concatenated turbo coding system in which a Reed-Solomon outer code is concatenated with a binary turbo inner code. In the proposed system, the outer code decoder and the inner turbo code decoder interact to achieve both good bit error and frame error performances. The outer code decoder helps the inner turbo code decoder to terminate its decoding iteration while the inner turbo code decoder provides soft-output information to the outer code decoder to carry out a reliability-based soft-decision decoding. In the case that the outer code decoding fails, the outer code decoder instructs the inner code decoder to continue its decoding iterations until the outer code decoding is successful or a preset maximum number of decoding iterations is reached. This interaction between outer and inner code decoders reduces decoding delay. Also presented in the paper are an effective criterion for stopping the iteration process of the inner code decoder and a new reliability-based decoding algorithm for nonbinary codes.
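
    A minimal sketch of that decoder interaction, with hypothetical turbo_iteration and rs_decode interfaces standing in for the paper's components:

        def decode(llrs, turbo_iteration, rs_decode, max_iters=20):
            state = None
            for _ in range(max_iters):
                state, soft_out = turbo_iteration(llrs, state)  # one inner iteration
                ok, message = rs_decode(soft_out)   # reliability-based soft decoding
                if ok:
                    return message                  # outer decode succeeded: stop early
            return None                             # frame error after max iterations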

  18. Soft context clustering for F0 modeling in HMM-based speech synthesis

    NASA Astrophysics Data System (ADS)

    Khorram, Soheil; Sameti, Hossein; King, Simon

    2015-12-01

    This paper proposes the use of a new binary decision tree, which we call a soft decision tree, to improve generalization performance compared to the conventional `hard' decision tree method that is used to cluster context-dependent model parameters in statistical parametric speech synthesis. We apply the method to improve the modeling of fundamental frequency, which is an important factor in synthesizing natural-sounding high-quality speech. Conventionally, hard decision tree-clustered hidden Markov models (HMMs) are used, in which each model parameter is assigned to a single leaf node. However, this `divide-and-conquer' approach leads to data sparsity, with the consequence that it suffers from poor generalization, meaning that it is unable to accurately predict parameters for models of unseen contexts: the hard decision tree is a weak function approximator. To alleviate this, we propose the soft decision tree, which is a binary decision tree with soft decisions at the internal nodes. In this soft clustering method, internal nodes select both their children with certain membership degrees; therefore, each node can be viewed as a fuzzy set with a context-dependent membership function. The soft decision tree improves model generalization and provides a superior function approximator because it is able to assign each context to several overlapped leaves. In order to use such a soft decision tree to predict the parameters of the HMM output probability distribution, we derive the smoothest (maximum entropy) distribution which captures all partial first-order moments and a global second-order moment of the training samples. Employing such a soft decision tree architecture with maximum entropy distributions, a novel speech synthesis system is trained using maximum likelihood (ML) parameter re-estimation and synthesis is achieved via maximum output probability parameter generation. In addition, a soft decision tree construction algorithm optimizing a log-likelihood measure
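
    The following sketch (invented for illustration, not the authors' code) shows the key mechanism: each internal node routes a context feature vector to both children with a sigmoid membership degree, so the prediction is a membership-weighted sum over all leaves.

        import numpy as np

        def sigmoid(z):
            return 1.0 / (1.0 + np.exp(-z))

        def predict(node, x, membership=1.0):
            if "leaf" in node:                        # leaf: weighted contribution
                return membership * node["leaf"]
            g = sigmoid(node["w"] @ x + node["b"])    # soft gate in (0, 1)
            return (predict(node["left"],  x, membership * g) +
                    predict(node["right"], x, membership * (1 - g)))

        tree = {"w": np.array([1.0, -1.0]), "b": 0.0,   # toy parameters
                "left":  {"leaf": 120.0},               # e.g., F0 values in Hz
                "right": {"leaf": 180.0}}
        print(predict(tree, np.array([0.3, 0.7])))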

  19. Performance of concatenated Reed-Solomon trellis-coded modulation over Rician fading channels

    NASA Technical Reports Server (NTRS)

    Moher, Michael L.; Lodge, John H.

    1990-01-01

    A concatenated coding scheme for providing very reliable data over mobile-satellite channels at power levels similar to those used for vocoded speech is described. The outer code is a shortened Reed-Solomon code which provides error detection as well as error correction capabilities. The inner code is a 1-D 8-state trellis code applied independently to both the in-phase and quadrature channels. To achieve the full error correction potential of this inner code, the code symbols are multiplexed with a pilot sequence which is used to provide dynamic channel estimation and coherent detection. The implementation structure of this scheme is discussed and its performance is estimated.

  20. A concatenated coding scheme for error control

    NASA Technical Reports Server (NTRS)

    Kasami, T.; Fujiwara, T.; Lin, S.

    1986-01-01

    In this paper, a concatenated coding scheme for error control in data communications is presented and analyzed. In this scheme, the inner code is used for both error correction and detection; however, the outer code is used only for error detection. A retransmission is requested if either the inner code decoder fails to make a successful decoding or the outer code decoder detects the presence of errors after the inner code decoding. Probability of undetected error (or decoding error) of the proposed scheme is derived. An efficient method for computing this probability is presented. Throughput efficiency of the proposed error control scheme incorporated with a selective-repeat ARQ retransmission strategy is also analyzed. Three specific examples are presented. One of the examples is proposed for error control in the NASA Telecommand System.

  2. Concatenated coding in the presence of dephasing

    NASA Astrophysics Data System (ADS)

    Gourlay, Iain; Snowdon, John F.

    2000-08-01

    We investigate the use of concatenated coding to protect against dephasing in the absence of other types of error in order to carry out large quantum computations. This analysis is based on a well-known three-bit quantum code. Fault tolerant methods for carrying out gate operations, ancilla preparation, and syndrome identification are discussed and the maximum (or threshold) error rate which can be tolerated (if quantum coherence is to be maintained for arbitrarily long computations) is estimated. The methods for performing fault tolerant gate operations are compared to the methods appropriate for the seven-bit code and it is concluded that the three-bit code is not likely to be useful for large-scale quantum computation.
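
    A classical Monte Carlo sketch of why the three-bit code suppresses dephasing: a phase flip on at most one of the three qubits is corrected by majority vote, so the logical flip rate per round falls from p to 3p^2(1-p) + p^3. This toy simulation ignores the gate, ancilla and measurement errors analyzed in the paper.

        import random

        def logical_flip_rate(p, trials=100_000):
            fails = 0
            for _ in range(trials):
                flips = sum(random.random() < p for _ in range(3))  # independent Z errors
                fails += flips >= 2          # majority dephased -> logical error
            return fails / trials

        p = 0.01
        print(logical_flip_rate(p), 3 * p**2 * (1 - p) + p**3)  # simulation vs formula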

  3. The Compensatory Effectiveness of Optical Character Recognition/Speech Synthesis on Reading Comprehension of Postsecondary Students with Learning Disabilities.

    ERIC Educational Resources Information Center

    Higgins, Eleanor L.; Raskind, Marshall H.

    1997-01-01

    Thirty-seven college students with learning disabilities were given a reading comprehension task under the following conditions: (1) using an optical character recognition/speech synthesis system; (2) having the text read aloud by a human reader; or (3) reading silently without assistance. Findings indicated that the greater the disability, the…

  4. A concatenated coding scheme for error control

    NASA Technical Reports Server (NTRS)

    Lin, S.

    1985-01-01

    A concatenated coding scheme for error control in data communications is analyzed. The inner code is used for both error correction and detection; however, the outer code is used only for error detection. A retransmission is requested if the outer code detects the presence of errors after the inner code decoding. The probability of undetected error of the above error control scheme is derived and upper bounded. Two specific examples are analyzed. In the first example, the inner code is a distance-4 shortened Hamming code with generator polynomial (X+1)(X^6+X+1) = X^7+X^6+X^2+1, and the outer code is a distance-4 shortened Hamming code with generator polynomial (X+1)(X^15+X^14+X^13+X^12+X^4+X^3+X^2+X+1) = X^16+X^12+X^5+1, which is the X.25 standard for packet-switched data networks. This example is proposed for error control on NASA telecommand links. In the second example, the inner code is the same as that in the first example, but the outer code is a shortened Reed-Solomon code with symbols from GF(2^8) and generator polynomial (X+1)(X+alpha), where alpha is a primitive element in GF(2^8).
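
    For concreteness, the sketch below implements generic CRC error detection with the outer generator polynomial quoted above, g(X) = X^16 + X^12 + X^5 + 1 (the X.25 CRC); it illustrates how such an outer code detects errors and is not the report's implementation.

        G = (1 << 16) | (1 << 12) | (1 << 5) | 1   # g(X) as a bit mask

        def crc16(bits):
            """Remainder of m(X) * X^16 divided by g(X) over GF(2)."""
            reg = 0
            for b in bits + [0] * 16:              # shift in message, then 16 zeros
                reg = (reg << 1) | b
                if reg & (1 << 16):                # degree-16 term set: subtract g(X)
                    reg ^= G
            return reg                             # 16-bit remainder

        msg = [1, 0, 1, 1, 0, 0, 1]
        r = crc16(msg)
        # Appending the remainder makes the whole word divisible by g(X), so the
        # receiver recomputes the CRC and requests a retransmission if it is nonzero.
        assert crc16(msg + [(r >> i) & 1 for i in range(15, -1, -1)]) == 0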

  5. Concatenated coding for low data rate space communications.

    NASA Technical Reports Server (NTRS)

    Chen, C. H.

    1972-01-01

    In deep space communications with distant planets, the data rate as well as the operating SNR may be very low. To maintain the error rate at a very low level as well, it is necessary to use a sophisticated coding system (a longer code) without excessive decoding complexity. Concatenated coding has been shown to meet such requirements, in that the error rate decreases exponentially with the overall length of the code while the decoder complexity increases only algebraically. Three methods of concatenating an inner code with an outer code are considered, and the performance of the three concatenated codes is compared.

  7. (abstract) Synthesis of Speaker Facial Movements to Match Selected Speech Sequences

    NASA Technical Reports Server (NTRS)

    Scott, Kenneth C.

    1994-01-01

    We are developing a system for synthesizing image sequences that simulate the facial motion of a speaker. To perform this synthesis, we are pursuing two major areas of effort. We are developing the necessary computer graphics technology to synthesize a realistic image sequence of a person speaking selected speech sequences. Next, we are developing a model that expresses the relation between spoken phonemes and face/mouth shape. A subject is videotaped speaking an arbitrary text that contains expression of the full list of desired database phonemes. The subject is videotaped from the front speaking normally, recording both audio and video detail simultaneously. Using the audio track, we identify the specific video frames on the tape relating to each spoken phoneme. From this range we digitize the video frame which represents the extreme of mouth motion/shape. Thus, we construct a database of images of face/mouth shape related to spoken phonemes. A selected audio speech sequence is recorded which is the basis for synthesizing a matching video sequence; the speaker need not be the same as used for constructing the database. The audio sequence is analyzed to determine the spoken phoneme sequence and the relative timing of the enunciation of those phonemes. Synthesizing an image sequence corresponding to the spoken phoneme sequence is accomplished using a graphics technique known as morphing. Image sequence keyframes necessary for this processing are based on the spoken phoneme sequence and timing. We have been successful in synthesizing the facial motion of a native English speaker for a small set of arbitrary speech segments. Our future work will focus on advancement of the face shape/phoneme model and independent control of facial features.

  9. Performance Bounds on Two Concatenated, Interleaved Codes

    NASA Technical Reports Server (NTRS)

    Moision, Bruce; Dolinar, Samuel

    2010-01-01

    A method has been developed of computing bounds on the performance of a code comprising two linear binary codes generated by two encoders serially concatenated through an interleaver. Originally intended for use in evaluating the performances of some codes proposed for deep-space communication links, the method can also be used in evaluating the performances of short-block-length codes in other applications. The method applies, more specifically, to a communication system in which the following processes take place: At the transmitter, the original binary information that one seeks to transmit is first processed by an encoder into an outer code (Co) characterized by, among other things, a pair of numbers (n,k), where n (n > k) is the total number of code bits associated with k information bits and n - k bits are used for correcting or at least detecting errors. Next, the outer code is processed through either a block or a convolutional interleaver. In the block interleaver, the words of the outer code are processed in blocks of I words. In the convolutional interleaver, the interleaving operation is performed bit-wise in N rows with delays that are multiples of B bits. The output of the interleaver is processed through a second encoder to obtain an inner code (Ci) characterized by (ni,ki). The output of the inner code is transmitted over an additive-white-Gaussian-noise channel characterized by a symbol signal-to-noise ratio (SNR) Es/No and a bit SNR Eb/No. At the receiver, an inner decoder generates estimates of bits. Depending on whether a block or a convolutional interleaver is used at the transmitter, the sequence of estimated bits is processed through a block or a convolutional de-interleaver, respectively, to obtain estimates of code words. Then the estimates of the code words are processed through an outer decoder, which generates estimates of the original information along with flags indicating which estimates are presumed to be correct and which are found to
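
    As a small aside, the block interleaver described here (write I outer code words row-wise, read column-wise, so channel bursts are spread across many words) can be sketched in a few lines; the helper names are hypothetical.

        def block_interleave(words):        # words: list of equal-length rows
            return [row[j] for j in range(len(words[0])) for row in words]

        def block_deinterleave(flat, n_rows):
            n_cols = len(flat) // n_rows
            return [[flat[j * n_rows + i] for j in range(n_cols)]
                    for i in range(n_rows)]

        rows = [[1, 2, 3], [4, 5, 6]]       # two toy code words
        assert block_deinterleave(block_interleave(rows), 2) == rows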

  10. Advancements in text-to-speech technology and implications for AAC applications

    NASA Astrophysics Data System (ADS)

    Syrdal, Ann K.

    2003-10-01

    Intelligibility was the initial focus in text-to-speech (TTS) research, since it is clearly a necessary condition for the application of the technology. Sufficiently high intelligibility (approximating human speech) has been achieved in the last decade by the better formant-based and concatenative TTS systems. This led to commercially available TTS systems for highly motivated users, particularly the blind and vocally impaired. Some unnatural qualities of TTS were exploited by these users, such as very fast speaking rates and altered pitch ranges for flagging relevant information. Recently, the focus in TTS research has turned to improving naturalness, so that synthetic speech sounds more human and less robotic. Unit selection approaches to concatenative synthesis have dramatically improved TTS quality, although at the cost of larger and more complex systems. This advancement in naturalness has made TTS technology more acceptable to the general public. The vocally impaired appreciate a more natural voice with which to represent themselves when communicating with others. Unit selection TTS does not achieve such high speaking rates as the earlier TTS systems, however, which is a disadvantage to some AAC device users. An important new research emphasis is to improve and increase the range of emotional expressiveness of TTS.

  11. Hardware Implementation of Serially Concatenated PPM Decoder

    NASA Technical Reports Server (NTRS)

    Moision, Bruce; Hamkins, Jon; Barsoum, Maged; Cheng, Michael; Nakashima, Michael

    2009-01-01

    A prototype decoder for a serially concatenated pulse position modulation (SCPPM) code has been implemented in a field-programmable gate array (FPGA). At the time of this reporting, this is the first known hardware SCPPM decoder. The SCPPM coding scheme, conceived for free-space optical communications with both deep-space and terrestrial applications in mind, is an improvement of several dB over the conventional Reed-Solomon PPM scheme. The design of the FPGA SCPPM decoder is based on a turbo decoding algorithm that requires relatively low computational complexity while delivering error-rate performance within approximately 1 dB of channel capacity. The SCPPM encoder consists of an outer convolutional encoder, an interleaver, an accumulator, and an inner modulation encoder (more precisely, a mapping of bits to PPM symbols). Each code is describable by a trellis (a finite directed graph). The SCPPM decoder consists of an inner soft-in-soft-out (SISO) module, a de-interleaver, an outer SISO module, and an interleaver connected in a loop (see figure). Each SISO module applies the Bahl-Cocke-Jelinek-Raviv (BCJR) algorithm to compute a-posteriori bit log-likelihood ratios (LLRs) from a-priori LLRs by traversing the code trellis in forward and backward directions. The SISO modules iteratively refine the LLRs by passing the estimates between one another, much like the working of a turbine engine. Extrinsic information (the difference between the a-posteriori and a-priori LLRs) is exchanged rather than the a-posteriori LLRs to minimize undesired feedback. All computations are performed in the logarithmic domain, wherein multiplications are translated into additions, thereby reducing complexity and sensitivity to fixed-point implementation roundoff errors. To lower the required memory for storing channel likelihood data and the amounts of data transfer between the decoder and the receiver, one can discard the majority of channel likelihoods, using only the remainder in
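
    The log-domain trick mentioned at the end can be made concrete: once multiplications become additions, the only nontrivial operation left in the BCJR recursions is the "max-star" combination of two log-likelihoods, sketched here.

        import math

        def max_star(a, b):
            """log(e^a + e^b) = max(a, b) + log(1 + e^-|a - b|)."""
            return max(a, b) + math.log1p(math.exp(-abs(a - b)))

        # e.g., combining two path metrics in the forward recursion
        print(max_star(-1.2, -3.4), math.log(math.exp(-1.2) + math.exp(-3.4)))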

  12. Protein knotting through concatenation significantly reduces folding stability

    PubMed Central

    Hsu, Shang-Te Danny

    2016-01-01

    Concatenation by covalent linkage of two protomers of an intertwined all-helical HP0242 homodimer from Helicobacter pylori results in the first example of an engineered knotted protein. While concatenation does not affect the native structure according to X-ray crystallography, the folding kinetics is substantially slower compared to the parent homodimer. Using NMR hydrogen-deuterium exchange analysis, we showed here that concatenation significantly destabilises the knotted structure in solution, with some regions close to the covalent linkage being destabilised by as much as 5 kcal mol−1. Structural mapping of chemical shift perturbations induced by concatenation revealed a pattern that is similar to the effect induced by concentrated chaotropic agent. Our results suggest that the design strategy of protein knotting by concatenation may be thermodynamically unfavourable due to covalent constraints imposed on the flexible fraying ends of the template structure, leading to a rugged free energy landscape with an increased propensity to form off-pathway folding intermediates. PMID:27982106

  13. Generality of the concatenated five-qubit code

    NASA Astrophysics Data System (ADS)

    Huang, Long; You, Bo; Wu, Xiaohua; Zhou, Tao

    2015-11-01

    In this work, a quantum error correction (QEC) procedure with the concatenated five-qubit code is used to construct a near-perfect effective qubit channel (with an error below 10^-5) from arbitrary noise channels. The exact performance of the QEC is characterized by a Choi matrix, which can be obtained via a simple and explicit protocol. In a noise model with five free parameters, our numerical results indicate that the concatenated five-qubit code is general: to construct a near-perfect effective channel from the noise channels, the necessary size of the concatenated five-qubit code depends only on the entanglement fidelity of the initial noise channels.

  14. Performance of concatenated Reed-Solomon/Viterbi channel coding

    NASA Technical Reports Server (NTRS)

    Divsalar, D.; Yuen, J. H.

    1982-01-01

    The concatenated Reed-Solomon (RS)/Viterbi coding system is reviewed. The performance of the system is analyzed and results are derived with a new simple approach. A functional model for the input RS symbol error probability is presented. Based on this new functional model, we compute the performance of a concatenated system in terms of RS word error probability, output RS symbol error probability, bit error probability due to decoding failure, and bit error probability due to decoding error. Finally we analyze the effects of the noisy carrier reference and the slow fading on the system performance.

  15. Speech research directions

    SciTech Connect

    Atal, B.S.; Rabiner, L.R.

    1986-09-01

    This paper presents an overview of the current activities in speech research. The authors discuss the state of the art in speech coding, text-to-speech synthesis, speech recognition, and speaker recognition. In the speech coding area, current algorithms perform well at bit rates down to 9.6 kb/s, and the research is directed at bringing the rate for high-quality speech coding down to 2.4 kb/s. In text-to-speech synthesis, what we currently are able to produce is very intelligible but not yet completely natural. Current research aims at providing higher quality and intelligibility to the synthetic speech that these systems produce. Finally, today's systems for speech and speaker recognition provide excellent performance on limited tasks; i.e., limited vocabulary, modest syntax, small talker populations, constrained inputs, etc.

  16. The Neural Basis of Speech Parsing in Children and Adults

    ERIC Educational Resources Information Center

    McNealy, Kristin; Mazziotta, John C.; Dapretto, Mirella

    2010-01-01

    Word segmentation, detecting word boundaries in continuous speech, is a fundamental aspect of language learning that can occur solely by the computation of statistical and speech cues. Fifty-four children underwent functional magnetic resonance imaging (fMRI) while listening to three streams of concatenated syllables that contained either high…

  18. Prosody Production and Perception with Conversational Speech

    ERIC Educational Resources Information Center

    Mo, Yoonsook

    2010-01-01

    Speech utterances are more than the linear concatenation of individual phonemes or words. They are organized by prosodic structures comprising phonological units of different sizes (e.g., syllable, foot, word, and phrase) and the prominence relations among them. As the linguistic structure of spoken languages, prosody serves an important function…

  20. Speech processing using maximum likelihood continuity mapping

    DOEpatents

    Hogden, John E.

    2000-01-01

    Speech processing is obtained that, given a probabilistic mapping between static speech sounds and pseudo-articulator positions, allows sequences of speech sounds to be mapped to smooth sequences of pseudo-articulator positions. In addition, a method for learning a probabilistic mapping between static speech sounds and pseudo-articulator position is described. The method for learning the mapping between static speech sounds and pseudo-articulator position uses a set of training data composed only of speech sounds. The said speech processing can be applied to various speech analysis tasks, including speech recognition, speaker recognition, speech coding, speech synthesis, and voice mimicry.

  2. Reliability and throughput analysis of a concatenated coding scheme

    NASA Technical Reports Server (NTRS)

    Deng, Robert H.; Costello, Daniel J., Jr.

    1987-01-01

    The performance of a concatenated coding scheme for error control in ARQ systems is analyzed for both random-noise and burst-noise channels. In particular, the probability of undetected error and the system throughput are calculated. In this scheme, the inner code is used for both error correction and error detection, and the outer code is used for error detection only. Interleaving/deinterleaving is assumed within the outer code. A retransmission is requested if either the inner code or the outer code detects the presence of errors. Various coding examples are considered. The results show that concatenated coding can provide extremely high system reliability (i.e., low probability of undetected error) and high system throughput.
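
    A back-of-envelope sketch of the throughput side of such an analysis: only accepted blocks deliver payload, and under selective-repeat ARQ each block needs on average 1/P_acc transmissions. The rates and acceptance probability below are illustrative assumptions, not the paper's numbers.

        def sr_arq_throughput(rate_inner, rate_outer, p_accept):
            """Information bits delivered per transmitted channel bit."""
            return rate_inner * rate_outer * p_accept

        print(sr_arq_throughput(rate_inner=0.5, rate_outer=0.937, p_accept=0.99))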

  3. A Wireless Brain-Machine Interface for Real-Time Speech Synthesis

    PubMed Central

    Guenther, Frank H.; Brumberg, Jonathan S.; Wright, E. Joseph; Nieto-Castanon, Alfonso; Tourville, Jason A.; Panko, Mikhail; Law, Robert; Siebert, Steven A.; Bartels, Jess L.; Andreasen, Dinal S.; Ehirim, Princewill; Mao, Hui; Kennedy, Philip R.

    2009-01-01

    Background Brain-machine interfaces (BMIs) involving electrodes implanted into the human cerebral cortex have recently been developed in an attempt to restore function to profoundly paralyzed individuals. Current BMIs for restoring communication can provide important capabilities via a typing process, but unfortunately they are only capable of slow communication rates. In the current study we use a novel approach to speech restoration in which we decode continuous auditory parameters for a real-time speech synthesizer from neuronal activity in motor cortex during attempted speech. Methodology/Principal Findings Neural signals recorded by a Neurotrophic Electrode implanted in a speech-related region of the left precentral gyrus of a human volunteer suffering from locked-in syndrome, characterized by near-total paralysis with spared cognition, were transmitted wirelessly across the scalp and used to drive a speech synthesizer. A Kalman filter-based decoder translated the neural signals generated during attempted speech into continuous parameters for controlling a synthesizer that provided immediate (within 50 ms) auditory feedback of the decoded sound. Accuracy of the volunteer's vowel productions with the synthesizer improved quickly with practice, with a 25% improvement in average hit rate (from 45% to 70%) and 46% decrease in average endpoint error from the first to the last block of a three-vowel task. Conclusions/Significance Our results support the feasibility of neural prostheses that may have the potential to provide near-conversational synthetic speech output for individuals with severely impaired speech motor control. They also provide an initial glimpse into the functional properties of neurons in speech motor cortical areas. PMID:20011034

  4. Performance analysis of a concatenated coding scheme for error control

    NASA Technical Reports Server (NTRS)

    Costello, D. J., Jr.; Lin, S.; Kasami, T.

    1983-01-01

    A concatenated coding scheme for error control in data communications is analyzed. In this scheme, the inner code is used for both error correction and detection, however, the outer code is used only for error detection. A retransmission is requested if the outer code detects the presence of errors after the inner code decoding. Probability of undetected error is derived and bounded. A particular example, proposed for the planetary program, is analyzed.

  5. Multilevel Analysis in Analyzing Speech Data

    ERIC Educational Resources Information Center

    Guddattu, Vasudeva; Krishna, Y.

    2011-01-01

    The speech produced by human vocal tract is a complex acoustic signal, with diverse applications in phonetics, speech synthesis, automatic speech recognition, speaker identification, communication aids, speech pathology, speech perception, machine translation, hearing research, rehabilitation and assessment of communication disorders and many…

  7. Bounds on Block Error Probability for Multilevel Concatenated Codes

    NASA Technical Reports Server (NTRS)

    Lin, Shu; Moorthy, Hari T.; Stojanovic, Diana

    1996-01-01

    Maximum likelihood decoding of long block codes is not feasible due to large complexity. Some classes of codes are shown to be decomposable into multilevel concatenated codes (MLCC). For these codes, multistage decoding provides a good trade-off between performance and complexity. In this paper, we derive an upper bound on the probability of block error for MLCC. We use this bound to evaluate the difference in performance for different decompositions of some codes. The examples given show that a significant reduction in complexity can be achieved by increasing the number of stages of decoding. The resulting performance degradation varies for different decompositions. A guideline is given for finding good m-level decompositions.
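
    Although the paper's bound is tailored to multistage decoding of MLCC, the flavor of such bounds can be shown with the standard union bound for soft-decision ML decoding of a binary code with BPSK over AWGN; the toy weight enumerator below is an assumption for illustration.

        import math

        def q_func(x):                      # Gaussian tail probability
            return 0.5 * math.erfc(x / math.sqrt(2.0))

        def union_bound(weight_enum, rate, ebno_db):
            """P_e <= sum_d A_d * Q(sqrt(2 * d * R * Eb/N0))."""
            ebno = 10 ** (ebno_db / 10.0)
            return sum(a_d * q_func(math.sqrt(2.0 * d * rate * ebno))
                       for d, a_d in weight_enum.items())

        weight_enum = {4: 14, 6: 42, 8: 7}  # toy A_d values
        print(union_bound(weight_enum, rate=0.5, ebno_db=3.0))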

  8. The Effects on Children's Writing of Adding Speech Synthesis to a Word Processor.

    ERIC Educational Resources Information Center

    Borgh, Karin; Dickson, W. Patrick

    1992-01-01

    Synthesized speech was added to a word processor used by children in grades two and five. Each student wrote four stories on a microcomputer and received spoken feedback on two of them. It was found that students wrote longer stories, edited more, and enjoyed writing more with the spoken feedback. (24 references) (LAE)

  9. A concatenated coded modulation scheme for error control

    NASA Technical Reports Server (NTRS)

    Kasami, Tadao; Takata, Toyoo; Fujiwara, Toru; Lin, Shu

    1990-01-01

    A concatenated coded modulation scheme for error control in data communications is presented. The scheme is achieved by concatenating a Reed-Solomon outer code and a bandwidth efficient block inner code for M-ary PSK modulation. Error performance of the scheme is analyzed for an AWGN channel. It is shown that extremely high reliability can be attained by using a simple M-ary PSK modulation inner code and relatively powerful Reed-Solomon outer code. Furthermore, if an inner code of high effective rate is used, the bandwidth expansion required by the scheme due to coding will be greatly reduced. The proposed scheme is particularly effective for high speed satellite communications for large file transfer where high reliability is required. Also presented is a simple method for constructing block codes for M-ary PSK modulation. Some short M-ary PSK codes with good minimum squared Euclidean distance are constructed. These codes have trellis structure and hence can be decoded with a soft-decision Viterbi decoding algorithm.

  10. Concatenation of 'alert' and 'identity' segments in dingoes' alarm calls.

    PubMed

    Déaux, Eloïse C; Allen, Andrew P; Clarke, Jennifer A; Charrier, Isabelle

    2016-07-27

    Multicomponent signals can be formed by the uninterrupted concatenation of multiple call types. One such signal is found in dingoes, Canis familiaris dingo. This stereotyped, multicomponent 'bark-howl' vocalisation is formed by the concatenation of a noisy bark segment and a tonal howl segment. Both segments are structurally similar to bark and howl vocalisations produced independently in other contexts (e.g. intra- and inter-pack communication). Bark-howls are mainly uttered in response to human presence and were hypothesized to serve as alarm calls. We investigated the function of bark-howls and the respective roles of the bark and howl segments. We found that dingoes could discriminate between familiar and unfamiliar howl segments, after having only heard familiar howl vocalisations (i.e. different calls). We propose that howl segments could function as 'identity signals' and allow receivers to modulate their responses according to the caller's characteristics. The bark segment increased receivers' attention levels, providing support for earlier observational claims that barks have an 'alerting' function. Lastly, dingoes were more likely to display vigilance behaviours upon hearing bark-howl vocalisations, lending support to the alarm function hypothesis. Canid vocalisations, such as the dingo bark-howl, may provide a model system to investigate the selective pressures shaping complex communication systems.

  11. A concatenated coded modulation scheme for error control

    NASA Technical Reports Server (NTRS)

    Kasami, Tadao; Lin, Shu

    1988-01-01

    A concatenated coded modulation scheme for error control in data communications is presented. The scheme is achieved by concatenating a Reed-Solomon outer code and a bandwidth efficient block inner code for M-ary PSK modulation. Error performance of the scheme is analyzed for an AWGN channel. It is shown that extremely high reliability can be attained by using a simple M-ary PSK modulation inner code and relatively powerful Reed-Solomon outer code. Furthermore, if an inner code of high effective rate is used, the bandwidth expansion required by the scheme due to coding will be greatly reduced. The proposed scheme is particularly effective for high speed satellite communication for large file transfer where high reliability is required. Also presented is a simple method for constructing block codes for M-ary PSK modulation. Some short M-ary PSK codes with good minimum squared Euclidean distance are constructed. These codes have trellis structure and hence can be decoded with a soft decision Viterbi decoding algorithm.

  12. A concatenated coded modulation scheme for error control (addition 2)

    NASA Technical Reports Server (NTRS)

    Lin, Shu

    1988-01-01

    A concatenated coded modulation scheme for error control in data communications is described. The scheme is achieved by concatenating a Reed-Solomon outer code and a bandwidth efficient block inner code for M-ary PSK modulation. Error performance of the scheme is analyzed for an AWGN channel. It is shown that extremely high reliability can be attained by using a simple M-ary PSK modulation inner code and a relatively powerful Reed-Solomon outer code. Furthermore, if an inner code of high effective rate is used, the bandwidth expansion required by the scheme due to coding will be greatly reduced. The proposed scheme is particularly effective for high-speed satellite communications for large file transfer where high reliability is required. This paper also presents a simple method for constructing block codes for M-ary PSK modulation. Some short M-ary PSK codes with good minimum squared Euclidean distance are constructed. These codes have trellis structure and hence can be decoded with a soft-decision Viterbi decoding algorithm. Furthermore, some of these codes are phase invariant under multiples of 45 deg rotation.

  13. Serial turbo trellis coded modulation using a serially concatenated coder

    NASA Technical Reports Server (NTRS)

    Divsalar, Dariush (Inventor); Dolinar, Samuel J. (Inventor); Pollara, Fabrizio (Inventor)

    2010-01-01

    Serial concatenated trellis coded modulation (SCTCM) includes an outer coder, an interleaver, a recursive inner coder and a mapping element. The outer coder receives data to be coded and produces outer coded data. The interleaver permutes the outer coded data to produce interleaved data. The recursive inner coder codes the interleaved data to produce inner coded data. The mapping element maps the inner coded data to a symbol. The recursive inner coder has a structure which facilitates iterative decoding of the symbols at a decoder system. The recursive inner coder and the mapping element are selected to maximize the effective free Euclidean distance of a trellis coded modulator formed from the recursive inner coder and the mapping element. The decoder system includes a demodulation unit, an inner SISO (soft-input soft-output) decoder, a deinterleaver, an outer SISO decoder, and an interleaver.

  15. Multilevel Concatenated Block Modulation Codes for the Frequency Non-selective Rayleigh Fading Channel

    NASA Technical Reports Server (NTRS)

    Lin, Shu; Rhee, Dojun

    1996-01-01

    This paper is concerned with the construction of multilevel concatenated block modulation codes using a multilevel concatenation scheme for the frequency non-selective Rayleigh fading channel. In the construction of a multilevel concatenated modulation code, block modulation codes are used as the inner codes. Various types of codes (block or convolutional, binary or nonbinary) are considered as the outer codes. In particular, we focus on the special case in which Reed-Solomon (RS) codes are used as the outer codes. For this special case, a systematic algebraic technique for constructing q-level concatenated block modulation codes is proposed. Codes have been constructed for certain specific values of q and compared with single-level concatenated block modulation codes using the same inner codes. A multilevel closest-coset decoding scheme for these codes is proposed.

  16. Digression and Value Concatenation to Enable Privacy-Preserving Regression

    PubMed Central

    Li, Xiao-Bai; Sarkar, Sumit

    2015-01-01

    Regression techniques can be used not only for legitimate data analysis, but also to infer private information about individuals. In this paper, we demonstrate that regression trees, a popular data-analysis and data-mining technique, can be used to effectively reveal individuals’ sensitive data. This problem, which we call a “regression attack,” has not been addressed in the data privacy literature, and existing privacy-preserving techniques are not appropriate in coping with this problem. We propose a new approach to counter regression attacks. To protect against privacy disclosure, our approach introduces a novel measure, called digression, which assesses the sensitive value disclosure risk in the process of building a regression tree model. Specifically, we develop an algorithm that uses the measure for pruning the tree to limit disclosure of sensitive data. We also propose a dynamic value-concatenation method for anonymizing data, which better preserves data utility than a user-defined generalization scheme commonly used in existing approaches. Our approach can be used for anonymizing both numeric and categorical data. An experimental study is conducted using real-world financial, economic and healthcare data. The results of the experiments demonstrate that the proposed approach is very effective in protecting data privacy while preserving data quality for research and analysis. PMID:26752802

  17. Construction of optimal resources for concatenated quantum protocols

    NASA Astrophysics Data System (ADS)

    Pirker, A.; Wallnöfer, J.; Briegel, H. J.; Dür, W.

    2017-06-01

    We consider the explicit construction of resource states for measurement-based quantum information processing. We concentrate on special-purpose resource states that are capable of performing a certain operation or task, where we consider unitary Clifford circuits as well as non-trace-preserving completely positive maps, more specifically probabilistic operations including Clifford operations and Pauli measurements. We concentrate on 1 → m and m → 1 operations, i.e., operations that map one input qubit to m output qubits or vice versa. Examples of such operations include encoding and decoding in quantum error correction, entanglement purification, or entanglement swapping. We provide a general framework to construct optimal resource states for complex tasks that are combinations of these elementary building blocks. All resource states contain only input and output qubits, and are hence of minimal size. We obtain a stabilizer description of the resulting resource states, which we also translate into a circuit pattern to experimentally generate these states. In particular, we derive recurrence relations at the level of stabilizers as a key analytical tool to generate explicit (graph) descriptions of families of resource states. This allows us to explicitly construct resource states for encoding, decoding, and syndrome readout for concatenated quantum error correction codes, code switchers, multiple rounds of entanglement purification, quantum repeaters, and combinations thereof (such as resource states for entanglement purification of encoded states).

  18. Polarization entanglement purification for concatenated Greenberger-Horne-Zeilinger state

    NASA Astrophysics Data System (ADS)

    Zhou, Lan; Sheng, Yu-Bo

    2017-10-01

    Entanglement purification plays a fundamental role in long-distance quantum communication. In this paper, we put forward the first polarization entanglement purification protocol (EPP) for one type of nonlocal logic-qubit entanglement, the concatenated Greenberger-Horne-Zeilinger (C-GHZ) state, using the photon-atom interaction in a low-quality (Q) cavity. In contrast to existing EPPs, this protocol can purify bit-flip and phase-flip errors at both the physical and the logical level. Instead of measuring the photons directly, the protocol only requires measuring the atomic states to judge whether it has succeeded. In this way, the purified logic entangled states can be preserved for further applications. Moreover, this makes the EPP repeatable, so that a higher fidelity of the logic entangled states can be obtained. As logic-qubit entanglement makes use of quantum error correction (QEC) codes, which have an inherent stability against noise and decoherence, this EPP combined with QEC codes may provide double protection of the entanglement from channel noise and may have potential applications in long-distance quantum communication.

  19. Medical reliable network using concatenated channel codes through GSM network.

    PubMed

    Ahmed, Emtithal; Kohno, Ryuji

    2013-01-01

    Although the 4th generation (4G) of the global mobile communication network, i.e. Long Term Evolution (LTE), coexisting with the 3rd generation (3G), has successfully started, the 2nd generation (2G), i.e. the Global System for Mobile communication (GSM), is still playing an important role in many developing countries. Without any other reliable network infrastructure, GSM can be applied to tele-monitoring applications where high mobility and low cost are necessary. A core objective of this paper is to introduce the design of a more reliable and dependable Medical Network Channel Code (MNCC) system over the GSM network. The MNCC design is based on a simple concatenated channel code: the cascade of an inner code (the existing GSM channel code) and an extra outer code (a convolutional code), in order to protect medical data more robustly against channel errors than other data carried over the GSM network. In this paper, the MNCC system provides a Bit Error Rate (BER) suitable for medical tele-monitoring of physiological signals, namely 10^-5 or less. The performance of the MNCC has been proven and investigated using computer simulations under different channel conditions such as Additive White Gaussian Noise (AWGN), Rayleigh noise and burst noise. In general, the MNCC system provides better performance than GSM alone.

  20. Cyanuric acid hydrolase: evolutionary innovation by structural concatenation

    PubMed Central

    Peat, Thomas S; Balotra, Sahil; Wilding, Matthew; French, Nigel G; Briggs, Lyndall J; Panjikar, Santosh; Cowieson, Nathan; Newman, Janet; Scott, Colin

    2013-01-01

    The cyanuric acid hydrolase, AtzD, is the founding member of a newly identified family of ring-opening amidases. We report the first X-ray structure for this family, which is a novel fold (termed the ‘Toblerone’ fold) that likely evolved via the concatenation of monomers of the trimeric YjgF superfamily and the acquisition of a metal binding site. Structures of AtzD with bound substrate (cyanuric acid) and inhibitors (phosphate, barbituric acid and melamine), along with mutagenesis studies, allowed the identification of the active site. The AtzD monomer, active site and substrate all possess threefold rotational symmetry, to the extent that the active site possesses three potential Ser–Lys catalytic dyads. A single catalytic dyad (Ser85–Lys42) is hypothesized, based on biochemical evidence and crystallographic data. A plausible catalytic mechanism based on these observations is also presented. A comparison with a homology model of the related barbiturase, Bar, was used to infer the active-site residues responsible for substrate specificity, and the phylogeny of the 68 AtzD-like enzymes in the database were analysed in light of this structure–function relationship. PMID:23651355

  1. Campbell's monkeys concatenate vocalizations into context-specific call sequences

    PubMed Central

    Ouattara, Karim; Lemasson, Alban; Zuberbühler, Klaus

    2009-01-01

    Primate vocal behavior is often considered irrelevant in modeling human language evolution, mainly because of the caller's limited vocal control and apparent lack of intentional signaling. Here, we present the results of a long-term study on Campbell's monkeys, which has revealed an unrivaled degree of vocal complexity. Adult males produced six different loud call types, which they combined into various sequences in highly context-specific ways. We found stereotyped sequences that were strongly associated with cohesion and travel, falling trees, neighboring groups, nonpredatory animals, unspecific predatory threat, and specific predator classes. Within the responses to predators, we found that crowned eagles triggered four and leopards three different sequences, depending on how the caller learned about their presence. Callers followed a number of principles when concatenating sequences, such as nonrandom transition probabilities of call types, addition of specific calls into an existing sequence to form a different one, or recombination of two sequences to form a third one. We conclude that these primates have overcome some of the constraints of limited vocal control by combinatorial organization. As the different sequences were so tightly linked to specific external events, the Campbell's monkey call system may be the most complex example of ‘proto-syntax’ in animal communication known to date. PMID:20007377

  2. Hamming and Accumulator Codes Concatenated with MPSK or QAM

    NASA Technical Reports Server (NTRS)

    Divsalar, Dariush; Dolinar, Samuel

    2009-01-01

    In a proposed coding-and-modulation scheme, a high-rate binary data stream would be processed as follows: 1. The input bit stream would be demultiplexed into multiple bit streams. 2. The multiple bit streams would be processed simultaneously into a high-rate outer Hamming code that would comprise multiple short constituent Hamming codes - a distinct constituent Hamming code for each stream. 3. The streams would be interleaved. The interleaver would have a block structure that would facilitate parallelization for high-speed decoding. 4. The interleaved streams would be further processed simultaneously into an inner two-state, rate-1 accumulator code that would comprise multiple constituent accumulator codes - a distinct accumulator code for each stream. 5. The resulting bit streams would be mapped into symbols to be transmitted by use of a higher-order modulation - for example, M-ary phase-shift keying (MPSK) or quadrature amplitude modulation (QAM). The novelty of the scheme lies in the concatenation of the multiple-constituent Hamming and accumulator codes and the corresponding parallel architectures of the encoder and decoder circuitry (see figure) needed to process the multiple bit streams simultaneously. As in the cases of other parallel-processing schemes, one advantage of this scheme is that the overall data rate could be much greater than the data rate of each encoder and decoder stream and, hence, the encoder and decoder could handle data at an overall rate beyond the capability of the individual encoder and decoder circuits.
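
    The rate-1 accumulator of step 4 is simple enough to sketch directly: it is the two-state recursion y[i] = x[i] XOR y[i-1], applied to each stream after interleaving.

        def accumulate(bits):
            out, state = [], 0
            for b in bits:
                state ^= b               # 1/(1+D) recursion over GF(2)
                out.append(state)
            return out

        print(accumulate([1, 0, 1, 1, 0]))   # -> [1, 1, 0, 1, 1]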

  3. Optimal and efficient decoding of concatenated quantum block codes

    SciTech Connect

    Poulin, David

    2006-11-15

    We consider the problem of optimally decoding a quantum error correction code - that is, to find the optimal recovery procedure given the outcomes of partial "check" measurements on the system. In general, this problem is NP-hard. However, we demonstrate that for concatenated block codes, the optimal decoding can be efficiently computed using a message-passing algorithm. We compare the performance of the message-passing algorithm to that of the widespread blockwise hard decoding technique. Our Monte Carlo results using the five-qubit and Steane's code on a depolarizing channel demonstrate significant advantages of the message-passing algorithms in two respects: (i) optimal decoding increases the error threshold, below which the error correction procedure can be used to reliably send information over a noisy channel, by as much as 94%; and (ii) for noise levels below these thresholds, the probability of error after optimal decoding is suppressed at a significantly higher rate, leading to a substantial reduction of the error correction overhead.

  4. Minimized Database of Unit Selection in Visual Speech Synthesis without Loss of Naturalness

    NASA Astrophysics Data System (ADS)

    Liu, Kang; Ostermann, Joern

    Image-based modeling is very successful in the creation of realistic facial animations. Applications with dialog systems, such as e-Learning and customer information services, can integrate facial animations with synthesized speech in websites to improve human-machine communication. However, downloading a database with 11,594 mouth images (about 120 MB in JPEG format) used by a talking head takes about 15 minutes at 150 kBps. This paper presents a prototype framework for two-step database minimization. First, the key mouth images are identified by clustering algorithms and similar mouth images are discarded. Second, the clustered key mouth images are further compressed by JPEG. MST (Minimum Spanning Tree), RSST (Recursive Shortest Spanning Tree) and LBG-based clustering algorithms are developed and evaluated. Our experiments demonstrate that the number of mouth images is reduced most by the LBG-based clustering algorithm, and the database is further compressed to 8 MB by JPEG, which generates facial animations in CIF format without loss of naturalness and fulfills the needs of talking heads for Internet applications.

  5. A Statistical Approach to Automatic Speech Summarization

    NASA Astrophysics Data System (ADS)

    Hori, Chiori; Furui, Sadaoki; Malkin, Rob; Yu, Hua; Waibel, Alex

    2003-12-01

    This paper proposes a statistical approach to automatic speech summarization. In our method, a set of words maximizing a summarization score indicating the appropriateness of summarization is extracted from automatically transcribed speech and then concatenated to create a summary. The extraction process is performed using a dynamic programming (DP) technique based on a target compression ratio. In this paper, we demonstrate how an English news broadcast transcribed by a speech recognizer is automatically summarized. We adapted our method, which was originally proposed for Japanese, to English by modifying the model for estimating word concatenation probabilities based on a dependency structure in the original speech given by a stochastic dependency context free grammar (SDCFG). We also propose a method of summarizing multiple utterances using a two-level DP technique. The automatically summarized sentences are evaluated by summarization accuracy based on a comparison with a manual summary of speech that has been correctly transcribed by human subjects. Our experimental results indicate that the method we propose can effectively extract relatively important information and remove redundant and irrelevant information from English news broadcasts.
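
    The word-extraction step can be pictured as the dynamic-programming sketch below. The hypothetical `score` and `concat` functions stand in for the word significance and SDCFG-based concatenation scores; the real method additionally drives the target compression ratio and a two-level DP over multiple utterances:

```python
import numpy as np

def summarize(words, score, concat, m):
    """Extract an m-word summary maximizing the sum of per-word scores plus
    pairwise concatenation scores, preserving word order, via DP.
    S[i][j] = best score of a j-word summary whose last word is words[i]."""
    n = len(words)
    S = np.full((n, m + 1), -np.inf)
    back = np.zeros((n, m + 1), dtype=int)
    for i in range(n):
        S[i][1] = score(words[i])
    for j in range(2, m + 1):
        for i in range(j - 1, n):
            for k in range(j - 2, i):
                cand = S[k][j - 1] + score(words[i]) + concat(words[k], words[i])
                if cand > S[i][j]:
                    S[i][j], back[i][j] = cand, k
    i = int(np.argmax(S[:, m]))          # best final word
    picked = [i]
    for j in range(m, 1, -1):            # follow back-pointers
        i = int(back[i][j])
        picked.append(i)
    return [words[p] for p in reversed(picked)]

words = "the cat sat on the mat and looked at the dog".split()
score = lambda w: len(w) / 3.0           # toy significance score
concat = lambda a, b: 0.5                # toy concatenation score
print(summarize(words, score, concat, m=4))
```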

  6. THE COMPREHENSION OF RAPID SPEECH BY THE BLIND, PART III.

    ERIC Educational Resources Information Center

    FOULKE, EMERSON

    A REVIEW OF THE RESEARCH ON THE COMPREHENSION OF RAPID SPEECH BY THE BLIND IDENTIFIES FIVE METHODS OF SPEECH COMPRESSION--SPEECH CHANGING, ELECTROMECHANICAL SAMPLING, COMPUTER SAMPLING, SPEECH SYNTHESIS, AND FREQUENCY DIVIDING WITH THE HARMONIC COMPRESSOR. THE SPEECH CHANGING AND ELECTROMECHANICAL SAMPLING METHODS AND THE NECESSARY APPARATUS HAVE…

  7. Serial Concatenated Trellis Coded Modulation with Iterative Decoding: Design and Performance

    NASA Technical Reports Server (NTRS)

    Benedetto, S.; Divsalar, D.; Montorsi, G.; Pollara, F.

    1997-01-01

    In this paper, we propose a novel method to design serial concatenation of an outer convolutional code with an inner trellis code with multi-level amplitude/phase modulations and a suitable bit-by-bit iterative decoding structure.

  8. Performance Analysis of the Link-16/JTIDS Waveform With Concatenated Coding

    DTIC Science & Technology

    2009-09-01

    The communication terminal of Link-16 is called the Joint Tactical Information Distribution System (JTIDS) and features Reed-Solomon (RS) coding.

  9. Speech coding, reconstruction and recognition using acoustics and electromagnetic waves

    DOEpatents

    Holzrichter, J.F.; Ng, L.C.

    1998-03-17

    The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced, as well as for combined voiced and unvoiced, speech. The methods include how to deconvolve the speech excitation function from the acoustic speech output to describe the transfer function for each time frame. The formation of feature vectors defining all acoustic speech units over well-defined time frames can be used for purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching. 35 figs.

  11. Research in speech communication.

    PubMed Central

    Flanagan, J

    1995-01-01

    Advances in digital speech processing are now supporting application and deployment of a variety of speech technologies for human/machine communication. In fact, new businesses are rapidly forming about these technologies. But these capabilities are of little use unless society can afford them. Happily, explosive advances in microelectronics over the past two decades have assured affordable access to this sophistication as well as to the underlying computing technology. The research challenges in speech processing remain in the traditionally identified areas of recognition, synthesis, and coding. These three areas have typically been addressed individually, often with significant isolation among the efforts. But they are all facets of the same fundamental issue--how to represent and quantify the information in the speech signal. This implies deeper understanding of the physics of speech production, the constraints that the conventions of language impose, and the mechanism for information processing in the auditory system. In ongoing research, therefore, we seek more accurate models of speech generation, better computational formulations of language, and realistic perceptual guides for speech processing--along with ways to coalesce the fundamental issues of recognition, synthesis, and coding. Successful solution will yield the long-sought dictation machine, high-quality synthesis from text, and the ultimate in low bit-rate transmission of speech. It will also open the door to language-translating telephony, where the synthetic foreign translation can be in the voice of the originating talker. PMID:7479806

  12. Methods of Teaching Speech Recognition

    ERIC Educational Resources Information Center

    Rader, Martha H.; Bailey, Glenn A.

    2010-01-01

    Objective: This article introduces the history and development of speech recognition, addresses its role in the business curriculum, outlines related national and state standards, describes instructional strategies, and discusses the assessment of student achievement in speech recognition classes. Methods: Research methods included a synthesis of…

  13. High pH reversed-phase chromatography with fraction concatenation for 2D proteomic analysis

    SciTech Connect

    Yang, Feng; Shen, Yufeng; Camp, David G.; Smith, Richard D.

    2012-04-01

    Orthogonal high-resolution separations are critical for attaining improved analytical dynamic ranges of proteome measurements. Concatenated high pH reversed phase liquid chromatography affords better separations than the strong cation exchange conventionally applied for two-dimensional shotgun proteomic analysis. For example, concatenated high pH reversed phase liquid chromatography increased identification coverage for peptides (e.g., by 1.8-fold) and proteins (e.g., by 1.6-fold) in shotgun proteomics analyses of a digested human protein sample. Additional advantages of concatenated high pH RPLC include improved protein sequence coverage, simplified sample processing, and reduced sample losses, making this an attractive first dimension separation strategy for two-dimensional proteomics analyses.

  14. 2matrix: A utility for indel coding and phylogenetic matrix concatenation.

    PubMed

    Salinas, Nelson R; Little, Damon P

    2014-01-01

    Phylogenetic analysis of DNA and amino acid sequences requires the creation of files formatted specifically for each analysis package. Programs currently available cannot simultaneously code inferred insertion/deletion (indel) events in sequence alignments and concatenate data sets. • A novel Perl script, 2matrix, was created to concatenate matrices of non-molecular characters and/or aligned sequences and to code indels. 2matrix outputs a variety of formats compatible with popular phylogenetic programs. • 2matrix efficiently codes indels and concatenates matrices of sequences and non-molecular data. It is available for free download under a GPL (General Public License) open source license (https://github.com/nrsalinas/2matrix/archive/master.zip).
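
    In spirit, the two core operations reduce to something like the following sketch (simple indel coding in the style of Simmons and Ochoterena, without 2matrix's handling of overlapping gaps, missing data, or output formats):

```python
import re

def concatenate(matrices):
    """Concatenate aligned matrices (dicts of taxon -> sequence),
    padding taxa absent from a matrix with '?' characters."""
    taxa = sorted({t for m in matrices for t in m})
    out = {t: "" for t in taxa}
    for m in matrices:
        length = len(next(iter(m.values())))
        for t in taxa:
            out[t] += m.get(t, "?" * length)
    return out

def code_indels(matrix):
    """Score each distinct gap extent as one binary character
    (1 = that exact gap present, 0 = absent)."""
    extents = sorted({(g.start(), g.end())
                      for seq in matrix.values()
                      for g in re.finditer(r"-+", seq)})
    return {t: "".join("1" if seq[s:e] == "-" * (e - s) and
                       (s == 0 or seq[s - 1] != "-") and
                       (e == len(seq) or seq[e] != "-") else "0"
                       for s, e in extents)
            for t, seq in matrix.items()}

aln = concatenate([{"A": "ACG-T", "B": "ACGGT"}, {"A": "TT--A", "C": "TTGGA"}])
indels = code_indels(aln)   # {'A': '11', 'B': '00', 'C': '00'}
```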

  15. Punctured Parallel and Serial Concatenated Convolutional Codes for BPSK/QPSK Channels

    NASA Technical Reports Server (NTRS)

    Acikel, Omer Fatih

    1999-01-01

    As available bandwidth for communication applications becomes scarce, bandwidth-efficient modulation and coding schemes become ever more important. Since their discovery in 1993, turbo codes (parallel concatenated convolutional codes) have been the center of attention in the coding community because of their bit error rate performance near the Shannon limit. Serial concatenated convolutional codes have also been shown to be as powerful as turbo codes. In this dissertation, we introduce algorithms for designing bandwidth-efficient rate r = k/(k+1), k = 2, 3, ..., 16, parallel and rate 3/4, 7/8, and 15/16 serial concatenated convolutional codes via puncturing for BPSK/QPSK (Binary Phase Shift Keying/Quadrature Phase Shift Keying) channels. Both parallel and serial concatenated convolutional codes initially have a steep bit error rate versus signal-to-noise ratio slope (called the "cliff region"). However, this steep slope changes to a moderate slope with increasing signal-to-noise ratio, where the slope is characterized by the weight spectrum of the code. The region after the cliff region is called the "error rate floor," which dominates the behavior of these codes at moderate to high signal-to-noise ratios. Our goal is to design high-rate parallel and serial concatenated convolutional codes while minimizing the error rate floor effect. The design algorithm includes an interleaver enhancement procedure and finds the polynomial sets (only for parallel concatenated convolutional codes) and the puncturing schemes that achieve the lowest bit error rate performance around the floor for the code rates of interest.
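
    Puncturing itself is simple to sketch; the hard part the dissertation addresses is choosing the patterns and interleavers that keep the error floor low. Below is a toy rate-3/4 puncturing of a rate-1/2 mother code (the pattern is illustrative, not one of the optimized patterns from the work):

```python
import numpy as np

def puncture(coded_bits, pattern):
    """Transmit only the coded bits where the (tiled) pattern is 1."""
    mask = np.resize(np.asarray(pattern), coded_bits.shape)
    return coded_bits[mask == 1]

# Rate-1/2 mother code: every information bit yields 2 coded bits. For a
# target rate k/(k+1), keep k+1 of every 2k coded bits. With k = 3 the
# toy pattern below keeps 4 of every 6 bits, giving rate 3/4.
k = 3
pattern = [1, 1] + [1, 0] * (k - 1)
coded = np.random.default_rng(1).integers(0, 2, 24)   # 12 info bits encoded
sent = puncture(coded, pattern)        # 16 bits sent -> rate 12/16 = 3/4
```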

  16. On the error statistics of Viterbi decoding and the performance of concatenated codes

    NASA Technical Reports Server (NTRS)

    Miller, R. L.; Deutsch, L. J.; Butman, S. A.

    1981-01-01

    Computer simulation results are presented on the performance of convolutional codes of constraint lengths 7 and 10 concatenated with the (255, 223) Reed-Solomon code (a proposed NASA standard). These results indicate that as much as 0.8 dB can be gained by concatenating this Reed-Solomon code with a (10, 1/3) convolutional code, instead of the (7, 1/2) code currently used by the DSN. A mathematical model of Viterbi decoder burst-error statistics is developed and is validated through additional computer simulations.

  17. Programmable concatenation of conductively linked gold nanorods using molecular assembly and femtosecond irradiation

    NASA Astrophysics Data System (ADS)

    Fontana, Jake; Flom, Steve; Naciri, Jawad; Ratna, Banahalli

    The ability to tune the resonant frequency in plasmonic nanostructures is fundamental to developing novel optical properties and ensuing materials. Recent theoretical insights show that the plasmon resonance can be exquisitely controlled through the conductive concatenation of plasmonic nanoparticles. Furthermore, these charge-transfer systems may mimic complex and hard-to-build nanostructures. Here we experimentally demonstrate a directed molecular assembly approach to controllably concatenate gold nanorods end to end into discrete linear structures, bridged with gold nanojunctions, using femtosecond laser light. By utilizing high throughput and nanometer resolution, this approach offers a pragmatic assembly strategy for charge-transfer plasmonic systems.

  18. Speech enhancement via two-stage dual tree complex wavelet packet transform with a speech presence probability estimator

    NASA Astrophysics Data System (ADS)

    Sun, Pengfei; Qin, Jun

    2017-02-01

    In this paper, a two-stage dual tree complex wavelet packet transform (DTCWPT) based speech enhancement algorithm is proposed, in which a speech presence probability (SPP) estimator and a generalized minimum mean squared error (MMSE) estimator are developed. To overcome the drawback of signal distortions caused by downsampling of the WPT, a two-stage analytic decomposition concatenating undecimated WPT (UWPT) and decimated WPT is employed. An SPP estimator in the DTCWPT domain is derived based on a generalized Gamma distribution of speech and a Gaussian noise assumption. The validation results show that the proposed algorithm obtains improved perceptual evaluation of speech quality (PESQ) and segmental signal-to-noise ratio (SegSNR) scores in low-SNR nonstationary noise, compared with four other state-of-the-art speech enhancement algorithms, including optimally modified LSA (OM-LSA), soft masking using a posteriori SNR uncertainty (SMPO), a posteriori SPP based MMSE estimation (MMSE-SPP), and adaptive Bayesian wavelet thresholding (BWT).

  19. Computer-generated speech

    SciTech Connect

    Aimthikul, Y.

    1981-12-01

    This thesis reviews the essential aspects of speech synthesis and distinguishes between the two prevailing techniques: compressed digital speech and phonemic synthesis. It then presents the hardware details of the five speech modules evaluated. FORTRAN programs were written to facilitate message creation and retrieval with four of the modules driven by a PDP-11 minicomputer. The fifth module was driven directly by a computer terminal. The compressed digital speech modules (T.I. 990/306, T.S.I. Series 3D and N.S. Digitalker) each contain a limited vocabulary produced by the manufacturers while both the phonemic synthesizers made by Votrax permit an almost unlimited set of sounds and words. A text-to-phoneme rules program was adapted for the PDP-11 (running under the RSX-11M operating system) to drive the Votrax Speech Pac module. However, the Votrax Type'N Talk unit has its own built-in translator. Comparison of these modules revealed that the compressed digital speech modules were superior in pronouncing words on an individual basis but lacked the inflection capability that permitted the phonemic synthesizers to generate more coherent phrases. These findings were necessarily highly subjective and dependent on the specific words and phrases studied. In addition, the rapid introduction of new modules by manufacturers will necessitate new comparisons. However, the results of this research verified that all of the modules studied do possess reasonable quality of speech that is suitable for man-machine applications. Furthermore, the development tools are now in place to permit the addition of computer speech output in such applications.

  20. Speech Compression and Synthesis

    DTIC Science & Technology

    1979-01-05

    chosen, because the vowel [*] in combination with a voiced plosive results in a higher amplitude than most syllables. After the data has been... correlation coefficient of 0.554 is significant at P<0.001. 5.4.2 Quality Judgments Analysis of variance of the quality ratings showed that the...one was the manipulation of the MIX source model, and the other was the varying amount of breathiness in the speakers' voices. One of the female

  1. Speech Compression and Synthesis

    DTIC Science & Technology

    1980-10-01

    phonetic vocoder. 3.1 Diphone Template Recognition A great deal of thought has gone into the design of our initial phonetic recognition system, using...algorithm to compute the DCT of a signal with N a power of 2, resulting in a saving of 1/2 over the previous method when the latter uses the fast...(N assumed to be even), resulting in a similar saving of 1/2. When N is a power of 2, use of the FFT results in a saving comparable to

  2. Speech Compression and Synthesis

    DTIC Science & Technology

    1978-07-01

    AH AO AA UW R LI, is used to specify a SI-K sequence preceded by S and followed by back vowels or glides. These context effects are relatively...can depend on the two phonemes involved. For instance, the interpolation regions for consonant pairs probably need to be wider than for vowel pairs...exceptions were the one or two new diphone templates per sentence (necessitated by the new speaker's pronunciation), which had to be extracted from

  3. Genomic representations using concatenates of Type IIB restriction endonuclease digestion fragments

    PubMed Central

    Tengs, Torstein; LaFramboise, Thomas; Den, Robert B.; Hayes, David N.; Zhang, Jianhua; DebRoy, Saikat; Gentleman, Robert C.; O'Neill, Keith; Birren, Bruce; Meyerson, Matthew

    2004-01-01

    We have developed a method for genomic representation using Type IIB restriction endonucleases. Representation by concatenation of restriction digests, or RECORD, is an approach to sample the fragments generated by cleavage with these enzymes. Here, we show that the RECORD libraries may be used for digital karyotyping and for pathogen identification by computational subtraction. PMID:15329383

  4. Concatenative and Nonconcatenative Plural Formation in L1, L2, and Heritage Speakers of Arabic

    ERIC Educational Resources Information Center

    Albirini, Abdulkafi; Benmamoun, Elabbas

    2014-01-01

    This study compares Arabic L1, L2, and heritage speakers' (HS) knowledge of plural formation, which involves concatenative and nonconcatenative modes of derivation. Ninety participants (divided equally among L1, L2, and heritage speakers) completed two oral tasks: a picture naming task (to measure proficiency) and a plural formation task. The…

  5. Molecular phylogenetic analysis of the Papionina using concatenation and species tree methods.

    PubMed

    Guevara, Elaine E; Steiper, Michael E

    2014-01-01

    The Papionina is a geographically widespread subtribe of African cercopithecid monkeys whose evolutionary history is of particular interest to anthropologists. The phylogenetic relationships among arboreal mangabeys (Lophocebus), baboons (Papio), and geladas (Theropithecus) remain unresolved. Molecular phylogenetic analyses have revealed marked gene tree incongruence for these taxa, and several recent concatenated phylogenetic analyses of multilocus datasets have supported different phylogenetic hypotheses. To address this issue, we investigated the phylogeny of the Lophocebus + Papio + Theropithecus group using concatenation methods, as well as alternative methods that incorporate gene tree heterogeneity to estimate a 'species tree.' Our compiled DNA sequence dataset was ∼56 kilobase pairs long and included 57 independent partitions. All analyses of concatenated alignments strongly supported a Lophocebus + Papio clade and a basal position for Theropithecus. The Bayesian concordance analysis supported the same phylogeny. A coalescent-based Bayesian method resulted in a very poorly resolved species tree. The topological agreement between concatenation and the Bayesian concordance analysis offers considerable support for a Lophocebus + Papio clade as the dominant relationship across the genome. However, the results of the Bayesian concordance analysis indicate that almost half the genome has an alternative history. As such, our results offer a well-supported phylogenetic hypothesis for the Papio/Lophocebus/Theropithecus trichotomy, while at the same time providing evidence for a complex evolutionary history that likely includes hybridization among lineages.

  6. A low-complexity and high performance concatenated coding scheme for high-speed satellite communications

    NASA Technical Reports Server (NTRS)

    Lin, Shu; Rhee, Dojun; Rajpal, Sandeep

    1993-01-01

    This report presents a low-complexity and high-performance concatenated coding scheme for high-speed satellite communications. In this proposed scheme, the NASA Standard Reed-Solomon (RS) code over GF(2^8) is used as the outer code and the second-order Reed-Muller (RM) code of Hamming distance 8 is used as the inner code. The RM inner code has a very simple trellis structure and is decoded with the soft-decision Viterbi decoding algorithm. It is shown that the proposed concatenated coding scheme achieves an error performance which is comparable to that of the NASA TDRS concatenated coding scheme in which the NASA Standard rate-1/2 convolutional code of constraint length 7 and d_free = 10 is used as the inner code. However, the proposed RM inner code has much smaller decoding complexity, less decoding delay, and much higher decoding speed. Consequently, the proposed concatenated coding scheme is suitable for reliable high-speed satellite communications, and it may be considered as an alternate coding scheme for the NASA TDRS system.

  8. Overview of speech technology of the 80's

    SciTech Connect

    Crook, S.B.

    1981-01-01

    The author describes the technology innovations necessary to accommodate the market need which is the driving force toward greater perceived computer intelligence. The author discusses aspects of both speech synthesis and speech recognition.

  9. Speech Development

    MedlinePlus

    ... able to assess your child’s speech production and language development and make appropriate therapy recommendations. It is also ... pathologist should consistently assess your child’s speech and language development, as well as screen for hearing problems (with ...

  10. Speech Problems

    MedlinePlus

    ... and the respiratory system . The ability to understand language and produce speech is coordinated by the brain. So a person with brain damage from an accident, stroke, or birth defect may have speech and language problems. Some people with speech problems, particularly articulation ...

  11. VISIBLE SPEECH.

    ERIC Educational Resources Information Center

    POTTER, RALPH K.; AND OTHERS

    A CORRECTED REPUBLICATION OF THE 1947 EDITION, THE BOOK DESCRIBES A FORM OF VISIBLE SPEECH OBTAINED BY THE RECORDING OF AN ANALYSIS OF SPEECH SOMEWHAT SIMILAR TO THE ANALYSIS PERFORMED BY THE EAR. ORIGINALLY INTENDED TO PRESENT AN EXPERIMENTAL TRAINING PROGRAM IN THE READING OF VISIBLE SPEECH AND EXPANDED TO INCLUDE MATERIAL OF INTEREST TO VARIOUS…

  12. Using Concatenated Quantum Codes for Universal Fault-Tolerant Quantum Gates

    NASA Astrophysics Data System (ADS)

    Jochym-O'Connor, Tomas; Laflamme, Raymond

    2014-01-01

    We propose a method for universal fault-tolerant quantum computation using concatenated quantum error correcting codes. The concatenation scheme exploits the transversal properties of two different codes, combining them to provide a means to protect against low-weight arbitrary errors. We give the required properties of the error correcting codes to ensure universal fault tolerance and discuss a particular example using the 7-qubit Steane and 15-qubit Reed-Muller codes. Namely, other than computational basis state preparation as required by the DiVincenzo criteria, our scheme requires no special ancillary state preparation to achieve universality, as opposed to schemes such as magic state distillation. We believe that optimizing the codes used in such a scheme could provide a useful alternative to state distillation schemes that exhibit high overhead costs.

  13. Concatenation and Species Tree Methods Exhibit Statistically Indistinguishable Accuracy under a Range of Simulated Conditions

    PubMed Central

    Tonini, João; Moore, Andrew; Stern, David; Shcheglovitova, Maryia; Ortí, Guillermo

    2015-01-01

    Phylogeneticists have long understood that several biological processes can cause a gene tree to disagree with its species tree. In recent years, molecular phylogeneticists have increasingly foregone traditional supermatrix approaches in favor of species tree methods that account for one such source of error, incomplete lineage sorting (ILS). While gene tree-species tree discordance no doubt poses a significant challenge to phylogenetic inference with molecular data, researchers have only recently begun to systematically evaluate the relative accuracy of traditional and ILS-sensitive methods. Here, we report on simulations demonstrating that concatenation can perform as well or better than methods that attempt to account for sources of error introduced by ILS. Based on these and similar results from other researchers, we argue that concatenation remains a useful component of the phylogeneticist’s toolbox and highlight that phylogeneticists should continue to make explicit comparisons of results produced by contemporaneous and classical methods. PMID:25901289

  14. Performance analysis of a concatenated erbium-doped fiber amplifier supporting four mode groups

    NASA Astrophysics Data System (ADS)

    Qin, Zujun; Fan, Di; Zhang, Wentao; Xiong, Xianming

    2016-05-01

    An erbium-doped fiber amplifier (EDFA) supporting four mode groups has been theoretically designed by concatenating two sections of erbium-doped fibers (EDFs). Each EDF has a simple erbium doping profile for the purpose of reducing its fabrication complexity. We propose a modified genetic algorithm (GA) to provide detailed investigations of the concatenated amplifier. Both the optimal fiber length and erbium doping radius in each EDF have been found to minimize the gain difference between signal modes. Results show that the parameters of the central-doped EDF have a greater impact on the amplifier performance compared to those of the annular-doped one. We then investigate the influence of small deviations of the erbium fiber length, doping radius and doping concentration of each EDF from their optimal values upon the amplifier performance, and discuss their design tolerances in obtaining desirable amplification characteristics.

  15. Basal ganglia subcircuits distinctively encode the parsing and concatenation of action sequences.

    PubMed

    Jin, Xin; Tecuapetla, Fatuel; Costa, Rui M

    2014-03-01

    Chunking allows the brain to efficiently organize memories and actions. Although basal ganglia circuits have been implicated in action chunking, little is known about how individual elements are concatenated into a behavioral sequence at the neural level. Using a task in which mice learned rapid action sequences, we uncovered neuronal activity encoding entire sequences as single actions in basal ganglia circuits. In addition to neurons with activity related to the start/stop activity signaling sequence parsing, we found neurons displaying inhibited or sustained activity throughout the execution of an entire sequence. This sustained activity covaried with the rate of execution of individual sequence elements, consistent with motor concatenation. Direct and indirect pathways of basal ganglia were concomitantly active during sequence initiation, but behaved differently during sequence performance, revealing a more complex functional organization of these circuits than previously postulated. These results have important implications for understanding the functional organization of basal ganglia during the learning and execution of action sequences.

  16. Concatenated shift registers generating maximally spaced phase shifts of PN-sequences

    NASA Technical Reports Server (NTRS)

    Hurd, W. J.; Welch, L. R.

    1977-01-01

    A large class of linearly concatenated shift registers is shown to generate approximately maximally spaced phase shifts of pn-sequences, for use in pseudorandom number generation. A constructive method is presented for finding members of this class, for almost all degrees for which primitive trinomials exist. The sequences which result are not normally characterized by trinomial recursions, which is desirable since trinomial sequences can have some undesirable randomness properties.
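
    The phase-shift machinery rests on the shift-and-add property of m-sequences: the XOR of a pn-sequence with any cyclic shift of itself is another cyclic shift of the same sequence. The sketch below demonstrates that property for a degree-7 primitive trinomial; it illustrates the underlying idea only, not the concatenated-register construction of the paper:

```python
def pn_sequence(n):
    """m-sequence from the primitive trinomial x^7 + x + 1, using the
    recurrence s[i] = s[i-6] ^ s[i-7]; the period is 2^7 - 1 = 127."""
    s = [1, 0, 0, 0, 0, 0, 0]            # any nonzero initial fill
    while len(s) < n:
        s.append(s[-6] ^ s[-7])
    return s[:n]

seq = pn_sequence(127)                   # one full period
shift = seq[13:] + seq[:13]              # the sequence advanced 13 steps
added = [a ^ b for a, b in zip(seq, shift)]

# Shift-and-add: the XOR above must equal some other phase of seq.
assert added in [seq[k:] + seq[:k] for k in range(127)]
```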

  17. Probability of undetected error after decoding for a concatenated coding scheme

    NASA Technical Reports Server (NTRS)

    Costello, D. J., Jr.; Lin, S.

    1984-01-01

    A concatenated coding scheme for error control in data communications is analyzed. In this scheme, the inner code is used for both error correction and detection; however, the outer code is used only for error detection. A retransmission is requested if the outer code detects the presence of errors after the inner code decoding. The probability of undetected error is derived and bounded. A particular example, proposed for the NASA telecommand system, is analyzed.

  18. Vapor pressure measurements on low-volatility terpenoid compounds by the concatenated gas saturation method.

    PubMed

    Widegren, Jason A; Bruno, Thomas J

    2010-01-01

    The atmospheric oxidation of monoterpenes plays a central role in the formation of secondary organic aerosols (SOAs), which have important effects on the weather and climate. However, models of SOA formation have large uncertainties. One reason for this is that SOA formation depends directly on the vapor pressures of the monoterpene oxidation products, but few vapor pressures have been reported for these compounds. As a result, models of SOA formation have had to rely on estimated values of vapor pressure. To alleviate this problem, we have developed the concatenated gas saturation method, which is a simple, reliable, high-throughput method for measuring the vapor pressures of low-volatility compounds. The concatenated gas saturation method represents a significant advance over traditional gas saturation methods. Instead of a single saturator and trap, the concatenated method uses several pairs of saturators and traps linked in series. Consequently, several measurements of vapor pressure can be made simultaneously, which greatly increases the rate of data collection. It also allows for the simultaneous measurement of a control compound, which is important for ensuring data quality. In this paper we demonstrate the use of the concatenated gas saturation method by determination of the vapor pressures of five monoterpene oxidation products and n-tetradecane (the control compound) over the temperature range 283.15-313.15 K. Over this temperature range, the vapor pressures ranged from about 0.5 Pa to about 70 Pa. The standard molar enthalpies of vaporization or sublimation were determined by use of the Clausius-Clapeyron equation.
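
    For the data-reduction step, vapor pressures measured at several temperatures yield the molar enthalpy via a linear Clausius-Clapeyron fit of ln p against 1/T. A sketch with made-up numbers (not data from the paper) follows:

```python
import numpy as np

# Clausius-Clapeyron: ln p = -dH/(R T) + C, so the slope of ln p vs 1/T
# gives the molar enthalpy of vaporization/sublimation dH.
R = 8.314462618                                   # J/(mol K)
T = np.array([283.15, 293.15, 303.15, 313.15])    # K
p = np.array([0.7, 2.1, 5.8, 14.9])               # Pa (illustrative values)

slope, intercept = np.polyfit(1.0 / T, np.log(p), 1)
dH = -slope * R                                   # J/mol
print(f"dH ~ {dH / 1000:.1f} kJ/mol")
```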

  19. Convergence Analysis of Turbo Decoding of Serially Concatenated Block Codes and Product Codes

    NASA Astrophysics Data System (ADS)

    Krause, Amir; Sella, Assaf; Be'ery, Yair

    2005-12-01

    The geometric interpretation of turbo decoding has founded a framework, and provided tools for the analysis of parallel-concatenated codes decoding. In this paper, we extend this analytical basis for the decoding of serially concatenated codes, and focus on serially concatenated product codes (SCPC) (i.e., product codes with checks on checks). For this case, at least one of the component (i.e., rows/columns) decoders should calculate the extrinsic information not only for the information bits, but also for the check bits. We refer to such a component decoder as a serial decoding module (SDM). We extend the framework accordingly, derive the update equations for a general turbo decoder of SCPC, and the expressions for the main analysis tools: the Jacobian and stability matrices. We explore the stability of the SDM. Specifically, for high SNR, we prove that the maximal eigenvalue of the SDM's stability matrix approaches [InlineEquation not available: see fulltext.], where [InlineEquation not available: see fulltext.] is the minimum Hamming distance of the component code. Hence, for practical codes, the SDM is unstable. Further, we analyze the two turbo decoding schemes, proposed by Benedetto and Pyndiah, by deriving the corresponding update equations and by demonstrating the structure of their stability matrices for the repetition code and an SCPC code with [InlineEquation not available: see fulltext.] information bits. Simulation results for the Hamming [InlineEquation not available: see fulltext.] and Golay [InlineEquation not available: see fulltext.] codes are presented, analyzed, and compared to the theoretical results and to simulations of turbo decoding of parallel concatenation of the same codes.

  20. Coalescence vs. concatenation: Sophisticated analyses vs. first principles applied to rooting the angiosperms.

    PubMed

    Simmons, Mark P; Gatesy, John

    2015-10-01

    It has recently been concluded that phylogenomic data from 310 nuclear genes support the clade of (Amborellales, Nymphaeales) as sister to the remaining angiosperms and that shortcut coalescent phylogenetic methods outperformed concatenation for these data. We falsify both of those conclusions here by demonstrating that discrepant results between the coalescent and concatenation analyses are primarily caused by the coalescent methods applied (MP-EST and STAR) not being robust to the highly divergent and often mis-rooted gene trees that were used. This result reinforces the expectation that low amounts of phylogenetic signal and methodological artifacts in gene-tree reconstruction can be more problematic for shortcut coalescent methods than is the assumption of a single hierarchy for all genes by concatenation methods when these approaches are applied to ancient divergences in empirical studies. We also demonstrate that a third coalescent method, ASTRAL, is more robust to mis-rooted gene trees than MP-EST or STAR, and that both Observed Variability (OV) and Tree Independent Generation of Evolutionary Rates (TIGER), which are two character subsampling procedures, are biased in favor of characters with highly asymmetrical distributions of character states when applied to this dataset. We conclude that enthusiastic application of novel tools is not a substitute for rigorous application of first principles, and that trending methods (e.g., shortcut coalescent methods applied to ancient divergences, tree-independent character subsampling), may be novel sources of previously under-appreciated, systematic errors.

  1. Extension of the double-wave-vector diffusion-weighting experiment to multiple concatenations.

    PubMed

    Finsterbusch, Jürgen

    2009-06-01

    Experiments involving two diffusion-weightings in a single acquisition, so-called double- or two-wave-vector experiments, have recently been applied to measure the microscopic anisotropy in macroscopically isotropic samples or to estimate pore or compartment sizes. This information is derived from the signal modulation observed when varying the wave vectors' orientations. However, the modulation amplitude can be small and, for short mixing times between the two diffusion-weightings, decays with increased gradient pulse lengths, which hampers its detectability on whole-body MR systems. Here, an approach is investigated that involves multiple concatenations of the two diffusion-weightings in a single experiment. The theoretical framework for double-wave-vector experiments of fully restricted diffusion is adapted, and the corresponding tensor approach recently presented for short mixing times is extended and compared to numerical simulations. It is shown that for short mixing times (i) the extended tensor approach describes well the signal behavior observed for multiple concatenations and (ii) the relative amplitude of the signal modulation increases with the number of concatenations. Thus, the presented extension of the double-wave-vector experiment may help to improve the detectability of the signal modulations observed for short mixing times, in particular on whole-body MR systems with their limited gradient amplitudes.

  2. Viterbi decoder node synchronization losses in the Reed-Solomon/Viterbi concatenated channel

    NASA Technical Reports Server (NTRS)

    Deutsch, L. J.; Miller, R. L.

    1982-01-01

    The Viterbi decoders currently used by the Deep Space Network (DSN) employ an algorithm for maintaining node synchronization that significantly degrades at bit signal-to-noise ratios (SNRs) of below 2.0 dB. In a recent report by the authors, it was shown that the telemetry receiving system, which uses a convolutionally encoded downlink, will suffer losses of 0.85 dB and 1.25 dB respectively at Voyager 2 Uranus and Neptune encounters. This report extends the results of that study to a concatenated (255,223) Reed-Solomon/(7, 1/2) convolutionally coded channel, by developing a new radio loss model for the concatenated channel. It is shown here that losses due to improper node synchronization of 0.57 dB at Uranus and 1.0 dB at Neptune can be expected if concatenated coding is used along with an array of one 64-meter and three 34-meter antennas.

  3. Computational neuroanatomy of speech production

    PubMed Central

    Hickok, Gregory

    2017-01-01

    Speech production has been studied predominantly from within two traditions, psycholinguistics and motor control. These traditions have rarely interacted and the resulting chasm between these approaches seems to reflect a level of analysis difference: while motor control is concerned with lower-level articulatory control, psycholinguistics focuses on higher-level linguistic processing. However, closer examination of both approaches reveals a substantial convergence of ideas. The goal of this article is to integrate psycholinguistic and motor control approaches to speech production. The result of this synthesis is a neuroanatomically grounded hierarchical state feedback control model of speech production. PMID:22218206

  5. Trainable Videorealistic Speech Animation

    DTIC Science & Technology

    2006-01-01

    by [Brand 1999] [Masuko et al. 1998] [Brooke and Scott 1994]. Speech animation needs to solve several problems simultaneously: firstly, the animation...described by Brand et al. [1999] [2000] are analogous to (and more sophisticated than) the trajectory synthesis techniques we use (Equations 12 and 17...Meredith and Dynasty Models; Craig Milanesi, Dave Konstine, Jay Benoit from MIT Video Productions; Marypat Fitzgerald and Casey Johnson from CBCL; Volker

  6. Concatenated coding systems employing a unit-memory convolutional code and a byte-oriented decoding algorithm

    NASA Technical Reports Server (NTRS)

    Lee, L. N.

    1976-01-01

    Concatenated coding systems utilizing a convolutional code as the inner code and a Reed-Solomon code as the outer code are considered. In order to obtain very reliable communications over a very noisy channel with relatively small coding complexity, it is proposed to concatenate a byte oriented unit memory convolutional code with an RS outer code whose symbol size is one byte. It is further proposed to utilize a real time minimal byte error probability decoding algorithm, together with feedback from the outer decoder, in the decoder for the inner convolutional code. The performance of the proposed concatenated coding system is studied, and the improvement over conventional concatenated systems due to each additional feature is isolated.

  7. Concatenated coding systems employing a unit-memory convolutional code and a byte-oriented decoding algorithm

    NASA Technical Reports Server (NTRS)

    Lee, L.-N.

    1977-01-01

    Concatenated coding systems utilizing a convolutional code as the inner code and a Reed-Solomon code as the outer code are considered. In order to obtain very reliable communications over a very noisy channel with relatively modest coding complexity, it is proposed to concatenate a byte-oriented unit-memory convolutional code with an RS outer code whose symbol size is one byte. It is further proposed to utilize a real-time minimal-byte-error probability decoding algorithm, together with feedback from the outer decoder, in the decoder for the inner convolutional code. The performance of the proposed concatenated coding system is studied, and the improvement over conventional concatenated systems due to each additional feature is isolated.

  8. Human vocal tract analysis by in vivo 3D MRI during phonation: a complete system for imaging, quantitative modeling, and speech synthesis.

    PubMed

    Wismueller, Axel; Behrends, Johannes; Hoole, Phil; Leinsinger, Gerda L; Reiser, Maximilian F; Westesson, Per-Lennart

    2008-01-01

    We present a complete system for image-based 3D vocal tract analysis ranging from MR image acquisition during phonation, semi-automatic image processing, quantitative modeling including model-based speech synthesis, to quantitative model evaluation by comparison between recorded and synthesized phoneme sounds. For this purpose, six professionally trained speakers, age 22-34y, were examined using a standardized MRI protocol (1.5 T, T1w FLASH, ST 4mm, 23 slices, acq. time 21s). The volunteers performed a prolonged (≥21 s) emission of sounds of the German phonemic inventory. Simultaneous audio tape recording was obtained to control correct utterance. Scans were made in axial, coronal, and sagittal planes each. Computer-aided quantitative 3D evaluation included (i) automated registration of the phoneme-specific data acquired in different slice orientations, (ii) semi-automated segmentation of oropharyngeal structures, (iii) computation of a curvilinear vocal tract midline in 3D by nonlinear PCA, (iv) computation of cross-sectional areas of the vocal tract perpendicular to this midline. For the vowels /a/,/e/,/i/,/o/,/ø/,/u/,/y/, the extracted area functions were used to synthesize phoneme sounds based on an articulatory-acoustic model. For quantitative analysis, recorded and synthesized phonemes were compared, where area functions extracted from 2D midsagittal slices were used as a reference. All vowels could be identified correctly based on the synthesized phoneme sounds. The comparison between synthesized and recorded vowel phonemes revealed that the quality of phoneme sound synthesis was improved for phonemes /a/, /o/, and /y/, if 3D instead of 2D data were used, as measured by the average relative frequency shift between recorded and synthesized vowel formants (p < 0.05, one-sided Wilcoxon rank sum test). In summary, the combination of fast MRI followed by subsequent 3D segmentation and analysis is a novel approach to examine human phonation in vivo. It

  9. A Deep Ensemble Learning Method for Monaural Speech Separation

    PubMed Central

    Zhang, Xiao-Lei; Wang, DeLiang

    2016-01-01

    Monaural speech separation is a fundamental problem in robust speech processing. Recently, deep neural network (DNN)-based speech separation methods, which predict either clean speech or an ideal time-frequency mask, have demonstrated remarkable performance improvement. However, a single DNN with a given window length does not leverage contextual information sufficiently, and the differences between the two optimization objectives are not well understood. In this paper, we propose a deep ensemble method, named multicontext networks, to address monaural speech separation. The first multicontext network averages the outputs of multiple DNNs whose inputs employ different window lengths. The second multicontext network is a stack of multiple DNNs. Each DNN in a module of the stack takes the concatenation of original acoustic features and expansion of the soft output of the lower module as its input, and predicts the ratio mask of the target speaker; the DNNs in the same module employ different contexts. We have conducted extensive experiments with three speech corpora. The results demonstrate the effectiveness of the proposed method. We have also compared the two optimization objectives systematically and found that predicting the ideal time-frequency mask is more efficient in utilizing clean training speech, while predicting clean speech is less sensitive to SNR variations. PMID:27917394
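
    A sketch of one module of the stacked multicontext network is given below. Layer sizes, feature dimensions, and context windows are assumptions for illustration; the paper's exact architecture, features, and training procedure are not reproduced:

```python
import torch
import torch.nn as nn

class StackModule(nn.Module):
    """One module of a stacked multicontext network (a sketch). Several
    MLPs see the same frame with different context window sizes; their
    soft outputs (ratio masks) are averaged as an ensemble."""

    def __init__(self, feat_dim, mask_dim, contexts, prev_dim=0, hidden=256):
        super().__init__()
        self.nets = nn.ModuleList()
        for c in contexts:  # c = number of context frames on each side
            in_dim = feat_dim * (2 * c + 1) + prev_dim
            self.nets.append(nn.Sequential(
                nn.Linear(in_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, mask_dim), nn.Sigmoid()))

    def forward(self, feats_by_context, prev=None):
        outs = []
        for net, x in zip(self.nets, feats_by_context):
            if prev is not None:
                # concatenate original features with the lower module's output
                x = torch.cat([x, prev], dim=-1)
            outs.append(net(x))
        return torch.stack(outs).mean(dim=0)  # ensemble average

# Two-module stack: the second module's inputs append the first's mask.
m1 = StackModule(feat_dim=64, mask_dim=64, contexts=[1, 2, 3])
m2 = StackModule(feat_dim=64, mask_dim=64, contexts=[1, 2, 3], prev_dim=64)
feats = [torch.randn(8, 64 * (2 * c + 1)) for c in (1, 2, 3)]
mask = m2(feats, prev=m1(feats))
```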

  10. Apraxia of Speech

    MedlinePlus

    What is apraxia of speech? Apraxia of speech, also known as verbal apraxia ...

  11. CString Concatenation

    DTIC Science & Technology

    2015-09-01

    code is as follows: plus_equal = my_strings[0]; 00EEE006 push 0 00EEE008 lea ecx,[ebp-128h] 00EEE00E call std::vector<ATL::CStringT<wchar_t...StrTraitMFC_DLL<wchar_t,ATL::ChTraitsCRT<wchar_t> > >,std::allocator<ATL::CStringT<wchar_t,StrTraitMFC_DLL<wchar_t,ATL::ChTraitsCRT<wchar_t> > >,std::vector<ATL::CStringT<wchar_t,StrTraitMFC_DLL<wchar_t,ATL::ChTraitsCRT<wchar_t> > >,std::allocator<ATL::CStringT<wchar_t,StrTraitMFC_DLL<wchar_t

  12. Speech Development

    MedlinePlus

    ... grunt” or “growl” sounds. These sounds represent a behavior that some children learn in an attempt to compensate for velopharyngeal ... It is important that you talk to your child and encourage him or her to practice appropriate speech behaviors. If possible, work closely with your speech-language ...

  13. Speech Communication.

    ERIC Educational Resources Information Center

    Brooks, William D.

    Presented in this book is a view of speech communication which enables an individual to become fully aware of his or her role as both initiator and recipient of messages. Communication is treated broadly with emphasis on the understanding and skills relating to various types of speech communication across the broad spectrum of human communication.…

  14. Speech Aids

    NASA Technical Reports Server (NTRS)

    1987-01-01

    Designed to assist deaf and hearing-impaired persons in achieving better speech, Resnick Worldwide Inc.'s device provides a visual means of cuing the deaf as a speech-improvement measure. This is done by electronically processing the subjects' sounds and comparing them with optimum values which are displayed for comparison.

  15. Symbolic Speech

    ERIC Educational Resources Information Center

    Podgor, Ellen S.

    1976-01-01

    The concept of symbolic speech emanates from the 1967 case of United States v. O'Brien. These discussions of flag desecration, grooming and dress codes, nude entertainment, buttons and badges, and musical expression show that the courts place symbolic speech in different strata from verbal communication. (LBH)

  17. Speech coding

    SciTech Connect

    Ravishankar, C., Hughes Network Systems, Germantown, MD

    1998-05-08

    Speech is the predominant means of communication between human beings, and since the invention of the telephone by Alexander Graham Bell in 1876, speech services have remained the core service in almost all telecommunication systems. Original analog methods of telephony had the disadvantage of the speech signal getting corrupted by noise, cross-talk and distortion. Long-haul transmissions, which use repeaters to compensate for the loss in signal strength on transmission links, also increase the associated noise and distortion. On the other hand, digital transmission is relatively immune to noise, cross-talk and distortion, primarily because of the capability to faithfully regenerate the digital signal at each repeater purely based on a binary decision. Hence, the end-to-end performance of the digital link essentially becomes independent of the length and operating frequency bands of the link, and from a transmission point of view digital transmission has been the preferred approach due to its higher immunity to noise. The need to carry digital speech became extremely important from a service provision point of view as well. Modern requirements have introduced the need for robust, flexible and secure services that can carry a multitude of signal types (such as voice, data and video) without a fundamental change in infrastructure. Such a requirement could not have been easily met without the advent of digital transmission systems, thereby requiring speech to be coded digitally. The term speech coding often refers to techniques that represent or code speech signals either directly as a waveform or as a set of parameters by analyzing the speech signal. In either case, the codes are transmitted to the distant end, where speech is reconstructed or synthesized using the received set of codes. A more generic term that is often used interchangeably with speech coding is voice coding. This term is more generic in the sense that the

  18. A study of acoustic-to-articulatory inversion of speech by analysis-by-synthesis using chain matrices and the Maeda articulatory model

    PubMed Central

    Panchapagesan, Sankaran; Alwan, Abeer

    2011-01-01

    In this paper, a quantitative study of acoustic-to-articulatory inversion for vowel speech sounds by analysis-by-synthesis using the Maeda articulatory model is performed. For chain matrix calculation of vocal tract (VT) acoustics, the chain matrix derivatives with respect to area function are calculated and used in a quasi-Newton method for optimizing articulatory trajectories. The cost function includes a distance measure between natural and synthesized first three formants, and parameter regularization and continuity terms. Calibration of the Maeda model to two speakers, one male and one female, from the University of Wisconsin x-ray microbeam (XRMB) database, using a cost function, is discussed. Model adaptation includes scaling the overall VT and the pharyngeal region and modifying the outer VT outline using measured palate and pharyngeal traces. The inversion optimization is initialized by a fast search of an articulatory codebook, which was pruned using XRMB data to improve inversion results. Good agreement between estimated midsagittal VT outlines and measured XRMB tongue pellet positions was achieved for several vowels and diphthongs for the male speaker, with average pellet-VT outline distances around 0.15 cm, smooth articulatory trajectories, and less than 1% average error in the first three formants. PMID:21476670
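
    The shape of the optimization can be sketched as follows. A toy linear articulatory-to-formant map `A` stands in for the chain-matrix acoustics, and scipy's L-BFGS-B plays the role of the quasi-Newton method; only the structure of the cost function (formant distance plus regularization and continuity terms) mirrors the description above:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 7))           # stand-in articulatory-to-formant map
p_true = rng.normal(size=7)
F_target = A @ p_true                 # "natural" formants to match
p_prev = p_true + 0.3 * rng.normal(size=7)   # previous frame's solution

def cost(p, lam=0.1, mu=0.1):
    formant_err = np.sum((A @ p - F_target) ** 2)   # distance measure
    regularization = lam * np.sum(p ** 2)           # keep parameters small
    continuity = mu * np.sum((p - p_prev) ** 2)     # smooth trajectories
    return formant_err + regularization + continuity

res = minimize(cost, x0=p_prev, method="L-BFGS-B")  # quasi-Newton search
print("residual formant error:", np.sum((A @ res.x - F_target) ** 2))
```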

  20. Use of Computer Speech Technologies To Enhance Learning.

    ERIC Educational Resources Information Center

    Ferrell, Joe

    1999-01-01

    Discusses the design of an innovative learning system that uses new technologies for the man-machine interface, incorporating a combination of Automatic Speech Recognition (ASR) and Text To Speech (TTS) synthesis. Highlights include using speech technologies to mimic the attributes of the ideal tutor and design features. (AEF)

  1. SCGICAR: Spatial concatenation based group ICA with reference for fMRI data analysis.

    PubMed

    Shi, Yuhu; Zeng, Weiming; Wang, Nizhuan

    2017-09-01

    With the rapid development of big data, functional magnetic resonance imaging (fMRI) data analysis of multiple subjects is becoming more and more important. As a kind of blind source separation technique, group independent component analysis (GICA) has been widely applied for multi-subject fMRI data analysis. However, spatially concatenated GICA is rarely used compared with temporally concatenated GICA due to its disadvantages. In this paper, in order to overcome these issues, and considering that the ability of GICA for fMRI data analysis can be improved by adding a priori information, we propose a novel spatial concatenation based GICA with reference (SCGICAR) method that takes advantage of the a priori information extracted from the group subjects; a multi-objective optimization strategy is then used to implement this method. Finally, the post-processing means of principal component analysis and anti-reconstruction are used to obtain the group spatial component and the individual temporal components in the group, respectively. The experimental results show that the proposed SCGICAR method achieves better performance on both single-subject and multi-subject fMRI data analysis compared with classical methods. It not only can detect more accurate spatial and temporal components for each subject of the group, but it can also obtain a better group component in both the temporal and spatial domains. These results demonstrate that the proposed SCGICAR method has its own advantages in comparison with classical methods, and it can better reflect the commonality of subjects in the group.

  2. Motor modules of human locomotion: influence of EMG averaging, concatenation, and number of step cycles

    PubMed Central

    Oliveira, Anderson S.; Gizzi, Leonardo; Farina, Dario; Kersting, Uwe G.

    2014-01-01

    Locomotion can be investigated by factorization of electromyographic (EMG) signals, e.g., with non-negative matrix factorization (NMF). This approach provides a convenient, concise representation of muscle activities as distributed in motor modules, activated in specific gait phases. For applying NMF, the EMG signals are analyzed either as single trials, as averaged EMG, or as concatenated EMG (data structure). The aim of this study was to investigate the influence of the data structure on the extracted motor modules. Twelve healthy men walked at their preferred speed on a treadmill while surface EMG signals were recorded for 60 s from 10 lower limb muscles. Motor modules representing relative weightings of synergistic muscle activations were extracted by NMF from 40 step cycles separately (EMGSNG), from averages of 2, 3, 5, 10, 20, and 40 consecutive cycles (EMGAVR), and from the concatenation of the same sets of consecutive cycles (EMGCNC). Five motor modules were sufficient to reconstruct the original EMG datasets (reconstruction quality >90%), regardless of the type of data structure used. However, EMGCNC was associated with a slightly reduced reconstruction quality with respect to EMGAVR. Most motor modules were similar when extracted from different data structures (similarity >0.85). However, the quality of the reconstructed 40-step EMGCNC datasets when using the muscle weightings from EMGAVR was low (reconstruction quality ~40%). On the other hand, the use of weightings from EMGCNC for reconstructing this long period of locomotion provided higher quality, especially using 20 concatenated steps (reconstruction quality ~80%). Although EMGSNG and EMGAVR showed a higher reconstruction quality for short signal intervals, these data structures did not account for step-to-step variability. The results of this study provide practical guidelines on the methodological aspects of synergistic muscle activation extraction from EMG during locomotion. PMID:24904375
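
    As a rough sketch of the extraction step, module weightings and activation signals can be obtained from a concatenated-EMG matrix with an off-the-shelf NMF, with reconstruction quality scored as variance accounted for. The data below are synthetic placeholders, and the preprocessing used in the study (rectification, filtering, time normalization) is omitted.

```python
import numpy as np
from sklearn.decomposition import NMF

# Hypothetical concatenated-EMG envelope matrix: 10 muscles x
# (40 steps * 100 time-normalized samples). Must be non-negative.
emg_cnc = np.abs(np.random.randn(10, 40 * 100))

nmf = NMF(n_components=5, init='nndsvd', max_iter=1000)
weights = nmf.fit_transform(emg_cnc)   # (10, 5) muscle weightings
activations = nmf.components_          # (5, 4000) activation signals

# Reconstruction quality as (uncentered) variance accounted for.
recon = weights @ activations
vaf = 1 - np.sum((emg_cnc - recon) ** 2) / np.sum(emg_cnc ** 2)
```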

  3. Space communication system for compressed data with a concatenated Reed-Solomon-Viterbi coding channel

    NASA Technical Reports Server (NTRS)

    Rice, R. F.; Hilbert, E. E. (Inventor)

    1976-01-01

    A space communication system incorporating a concatenated Reed-Solomon/Viterbi coding channel is discussed for transmitting compressed and uncompressed data from a spacecraft to a data processing center on Earth. Imaging (and other) data are first compressed into source blocks, which are then coded by a Reed-Solomon coder and interleaver, followed by a convolutional encoder. The received data are first decoded by a Viterbi decoder, followed by a Reed-Solomon decoder and deinterleaver. The output of the latter is then decompressed, based on the compression criteria used in compressing the data in the spacecraft. The decompressed data are processed to reconstruct an approximation of the original data-producing conditions or images.
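
    The interleaver is the glue of such a concatenated channel: it spreads Viterbi-decoder burst errors across many Reed-Solomon codewords so that each codeword sees only a few symbol errors. A minimal block-interleaver sketch, with illustrative parameters:

```python
import numpy as np

def interleave(symbols, depth, width):
    """Block interleaver: write row-wise, read column-wise.

    With `depth` Reed-Solomon codewords of `width` symbols each,
    a burst of errors on the channel is spread across codewords.
    """
    table = np.asarray(symbols).reshape(depth, width)
    return table.T.reshape(-1)

def deinterleave(symbols, depth, width):
    """Inverse operation: write column-wise, read row-wise."""
    table = np.asarray(symbols).reshape(width, depth)
    return table.T.reshape(-1)

data = np.arange(12)
assert np.array_equal(deinterleave(interleave(data, 3, 4), 3, 4), data)
```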

  4. Generation of concatenated Greenberger-Horne-Zeilinger-type entangled coherent state based on linear optics

    NASA Astrophysics Data System (ADS)

    Guo, Rui; Zhou, Lan; Gu, Shi-Pu; Wang, Xing-Fu; Sheng, Yu-Bo

    2017-03-01

    The concatenated Greenberger-Horne-Zeilinger (C-GHZ) state is a new type of multipartite entangled state with potential applications in future quantum information. In this paper, we propose a protocol for approximately constructing an arbitrary C-GHZ entangled state. Different from previous protocols, each logic qubit is encoded in a coherent state. The protocol is based on linear optics and is feasible with current experimental technology. It may be useful in quantum information processing based on the C-GHZ state.

  5. Investigation of the Use of Erasures in a Concatenated Coding Scheme

    NASA Technical Reports Server (NTRS)

    Kwatra, S. C.; Marriott, Philip J.

    1997-01-01

    A new method for declaring erasures in a concatenated coding scheme is investigated. This method is used with the rate-1/2, K = 7 convolutional code and the (255, 223) Reed-Solomon code, with errors-and-erasures Reed-Solomon decoding. The proposed erasure method uses a soft-output Viterbi algorithm together with information provided by decoded Reed-Solomon codewords in a deinterleaving frame. The results show that a gain of 0.3 dB is possible using a minimal number of decoding trials.

  6. Performance of convolution coding concatenated with MFSK modulation in a Gaussian channel

    NASA Technical Reports Server (NTRS)

    Choudhury, A. K.

    1971-01-01

    The improvement in dB due to concatenation over conventional M-ary coding is studied, with the goals of reducing the probability of a bit error and increasing the available bit rate for the same system parameters of error rate, transmitter power, and range. The results of calculations for orthogonal modulation with noncoherent detection and Q-level correlator quantization are presented. The correlator outputs are quantized to one of Q levels, and the receiver output is a vector consisting of a list of the M correlator quantum levels, so the channel has Q^M possible outputs and M possible inputs. Optimum performance is approached by increasingly fine quantization.

  7. Interleaved concatenated codes: New perspectives on approaching the Shannon limit

    PubMed Central

    Viterbi, A. J.; Viterbi, A. M.; Sindhushayana, N. T.

    1997-01-01

    The last few years have witnessed a significant decrease in the gap between the Shannon channel capacity limit and what is practically achievable. Progress has resulted from novel extensions of previously known coding techniques involving interleaved concatenated codes. A considerable body of simulation results is now available, supported by an important but limited theoretical basis. This paper presents a computational technique which further ties simulation results to the known theory and reveals a considerable reduction in the complexity required to approach the Shannon limit. PMID:11038568

  8. Speech processing: An evolving technology

    SciTech Connect

    Crochiere, R.E.; Flanagan, J.L.

    1986-09-01

    As we enter the information age, speech processing is emerging as an important technology for making machines easier and more convenient for humans to use. It is both an old and a new technology - dating back to the invention of the telephone and forward, at least in aspirations, to the capabilities of HAL in 2001. Explosive advances in microelectronics now make it possible to implement economical real-time hardware for sophisticated speech processing - processing that formerly could be demonstrated only in simulations on main-frame computers. As a result, fundamentally new product concepts - as well as new features and functions in existing products - are becoming possible and are being explored in the marketplace. As the introductory piece to this issue, the authors draw a brief perspective on the evolving field of speech processing and assess the technology in the three constituent sectors: speech coding, synthesis, and recognition.

  9. Listen up! Speech is for thinking during infancy.

    PubMed

    Vouloumanos, Athena; Waxman, Sandra R

    2014-12-01

    Infants' exposure to human speech within the first year promotes more than speech processing and language acquisition: new developmental evidence suggests that listening to speech shapes infants' fundamental cognitive and social capacities. Speech streamlines infants' learning, promotes the formation of object categories, signals communicative partners, highlights information in social interactions, and offers insight into the minds of others. These results, which challenge the claim that for infants, speech offers no special cognitive advantages, suggest a new synthesis. Far earlier than researchers had imagined, an intimate and powerful connection between human speech and cognition guides infant development, advancing infants' acquisition of fundamental psychological processes.

  10. Speech Problems

    MedlinePlus

    ... thinking, but it becomes disorganized as they're speaking. So, someone who clutters may speak in bursts ... refuse to wait patiently for them to finish speaking. If you have a speech problem, it's fine ...

  11. High rate concatenated coding systems using bandwidth efficient trellis inner codes

    NASA Technical Reports Server (NTRS)

    Deng, Robert H.; Costello, Daniel J., Jr.

    1989-01-01

    High-rate concatenated coding systems with bandwidth-efficient trellis inner codes and Reed-Solomon (RS) outer codes are investigated for application in high-speed satellite communication systems. Two concatenated coding schemes are proposed. In one the inner code is decoded with soft-decision Viterbi decoding, and the outer RS code performs error-correction-only decoding (decoding without side information). In the other, the inner code is decoded with a modified Viterbi algorithm, which produces reliability information along with the decoded output. In this algorithm, path metrics are used to estimate the entire information sequence, whereas branch metrics are used to provide reliability information on the decoded sequence. This information is used to erase unreliable bits in the decoded output. An errors-and-erasures RS decoder is then used for the outer code. The two schemes have been proposed for high-speed data communication on NASA satellite channels. The rates considered are at least double those used in current NASA systems, and the results indicate that high system reliability can still be achieved.

  12. Typical l1-recovery limit of sparse vectors represented by concatenations of random orthogonal matrices

    NASA Astrophysics Data System (ADS)

    Kabashima, Yoshiyuki; Vehkaperä, Mikko; Chatterjee, Saikat

    2012-12-01

    We consider the problem of recovering an N-dimensional sparse vector x from its linear transformation y = Dx of M (< N) dimensions, where D is constructed by concatenating T = N/M matrices O1, O2, …, OT drawn uniformly according to the Haar measure on the M × M orthogonal matrices. By using the replica method in conjunction with the development of an integral formula to handle the random orthogonal matrices, we show that the concatenated matrices can result in better recovery performance than that predicted by the universality when the density of non-zero signals is not uniform among the T matrix modules. The universal condition is reproduced for the special case of uniform non-zero signal densities. Extensive numerical experiments support the theoretical predictions.
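
    A sensing matrix of the studied form is easy to generate numerically. A small sketch, assuming the standard QR-with-sign-correction recipe for Haar-distributed orthogonal matrices:

```python
import numpy as np

def haar_orthogonal(m, rng):
    """Draw an m x m matrix from the Haar measure on O(m) via QR."""
    q, r = np.linalg.qr(rng.standard_normal((m, m)))
    # Multiply column j by sign(r_jj) so the distribution is uniform.
    return q * np.sign(np.diag(r))

rng = np.random.default_rng(0)
M, T = 64, 4                       # so N = T * M = 256
# Sensing matrix of the studied form: T concatenated orthogonal blocks.
D = np.hstack([haar_orthogonal(M, rng) for _ in range(T)])  # (64, 256)
```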

  13. Concatenation of ‘alert’ and ‘identity’ segments in dingoes’ alarm calls

    PubMed Central

    Déaux, Eloïse C.; Allen, Andrew P.; Clarke, Jennifer A.; Charrier, Isabelle

    2016-01-01

    Multicomponent signals can be formed by the uninterrupted concatenation of multiple call types. One such signal is found in dingoes, Canis familiaris dingo. This stereotyped, multicomponent ‘bark-howl’ vocalisation is formed by the concatenation of a noisy bark segment and a tonal howl segment. Both segments are structurally similar to bark and howl vocalisations produced independently in other contexts (e.g. intra- and inter-pack communication). Bark-howls are mainly uttered in response to human presence and were hypothesized to serve as alarm calls. We investigated the function of bark-howls and the respective roles of the bark and howl segments. We found that dingoes could discriminate between familiar and unfamiliar howl segments, after having only heard familiar howl vocalisations (i.e. different calls). We propose that howl segments could function as ‘identity signals’ and allow receivers to modulate their responses according to the caller’s characteristics. The bark segment increased receivers’ attention levels, providing support for earlier observational claims that barks have an ‘alerting’ function. Lastly, dingoes were more likely to display vigilance behaviours upon hearing bark-howl vocalisations, lending support to the alarm function hypothesis. Canid vocalisations, such as the dingo bark-howl, may provide a model system to investigate the selective pressures shaping complex communication systems. PMID:27460289

  14. The effects of receiver tracking phase error on the performance of the concatenated Reed-Solomon/Viterbi channel coding system

    NASA Technical Reports Server (NTRS)

    Liu, K. Y.

    1981-01-01

    Analytical and experimental results are presented on the effects of receiver tracking phase error, caused by weak signal conditions on the uplink, the downlink, or both, on the performance of the concatenated Reed-Solomon (RS)/Viterbi channel coding system. The test results were obtained under an emulated S-band uplink and X-band downlink, two-way space communication channel in the telecommunication development laboratory of JPL, with data rates ranging from 4 kHz to 20 kHz. It is shown that, with ideal interleaving, the concatenated RS/Viterbi coding system is capable of yielding large coding gains at very low bit error probabilities over the Viterbi-decoded convolutional-only coding system. Results on the effects of receiver tracking phase errors on the performance of the concatenated coding system with antenna array combining are included.

  15. Man-machine interaction in the 21st century--new paradigms through dynamic scene analysis and synthesis (Keynote Speech)

    NASA Astrophysics Data System (ADS)

    Huang, Thomas S.; Orchard, Michael T.

    1992-11-01

    The past twenty years have witnessed a revolution in the use of computers in virtually every facet of society. While this revolution has been largely fueled by dramatic technological advances, the efficient application of this technology has been made possible through advances in the paradigms defining the way users interact with computers. Today's massive computational power would probably have limited sociological impact if users still communicated with computers via the binary machine language codes used in the 1950's. Instead, this primitive paradigm was replaced by keyboards and ASCII character displays in the 1970's, and the 'mouse' and multiple-window bit-mapped displays in the 1980's. As continuing technological advances make even larger computational power available in the future, advanced paradigms for man-machine interaction will be required to allow this power to be used efficiently in a wide range of applications. Looking ahead into the 21st century, we see paradigms supporting radically new ways of interacting with computers. Ideally, we would like these interactions to mimic the ways we interact with objects and people in the physical world, and, to achieve this goal, we believe that it is essential to consider the exchange of video data into and out of the computer. Paradigms based on visual interactions represent a radical departure from existing paradigms, because they allow the computer to actively seek out information from the user via dynamic scene analysis. For example, the computer might enlarge the display when it detects that the user is squinting, or it might reorient a three-dimensional object on the screen in response to detected hand motions. This contrasts with current paradigms in which the computer relies on passive switching devices (keyboard, mouse, buttons, etc.) to receive information. Feedback will be provided to the user via dynamic scene synthesis, employing stereoscopic three-dimensional display systems. To exploit the

  16. Voice synthesis application

    SciTech Connect

    Lightstone, P.C.; Davidson, W.M.

    1982-01-27

    Selection of a speech synthesis system as an augmentation for a perimeter security device is described. Criteria used in the selection of a system are discussed. The final system is a Speech 1000 speech synthesizer board that has a 2000-word speech lexicon, a first-time charge of $75 for a 32K EPROM of custom words, and extra features such as an alternate command to adjust the desired listening level.

  17. Sample-based engine noise synthesis using an enhanced pitch-synchronous overlap-and-add method.

    PubMed

    Jagla, Jan; Maillard, Julien; Martin, Nadine

    2012-11-01

    An algorithm for the real-time synthesis of internal combustion engine noise is presented. Through the analysis of a recorded engine noise signal of continuously varying engine speed, a dataset of sound samples is extracted, allowing the real-time synthesis of the noise induced by arbitrary evolutions of engine speed. The sound samples are extracted from a recording spanning the entire engine speed range. Each sample is delimited so as to contain the sound emitted during one cycle of the engine plus the overlap necessary to ensure smooth transitions during synthesis. The proposed approach, an extension of the PSOLA method introduced for speech processing, takes advantage of the specific periodicity of engine noise signals to locate the extraction instants of the sound samples. During the synthesis stage, the sound samples corresponding to the target engine speed evolution are concatenated with an overlap-and-add algorithm. It is shown that this method produces high-quality audio restitution with a low computational load. It is therefore well suited for real-time applications.
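
    The synthesis-stage concatenation amounts to overlap-and-add with a crossfade between successive engine-cycle samples. A toy sketch follows, using a linear crossfade; the actual method is pitch-synchronous and considerably more refined.

```python
import numpy as np

def concatenate_samples(samples, overlap):
    """Concatenate sound samples with a linear crossfade of `overlap`
    samples, in the spirit of overlap-and-add synthesis."""
    fade_in = np.linspace(0.0, 1.0, overlap)
    fade_out = 1.0 - fade_in
    out = np.array(samples[0], dtype=float)
    for s in samples[1:]:
        s = np.asarray(s, dtype=float)
        head = out[:-overlap]
        # Crossfade the tail of the output with the head of the next sample.
        cross = out[-overlap:] * fade_out + s[:overlap] * fade_in
        out = np.concatenate([head, cross, s[overlap:]])
    return out

# Two short tone bursts joined with a 64-sample crossfade.
x = np.sin(2 * np.pi * 1000 * np.arange(512) / 44100)
y = concatenate_samples([x, x], overlap=64)
```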

  18. Self-Similar Conformations and Dynamics of Non-Concatenated Entangled Ring Polymers

    NASA Astrophysics Data System (ADS)

    Ge, Ting

    A scaling model of self-similar conformations and dynamics of non-concatenated entangled ring polymers is developed. Topological constraints force these ring polymers into compact conformations with fractal dimension D = 3 that we call fractal loopy globules (FLGs). This result is based on the conjecture that the overlap parameter of loops on all length scales is equal to the Kavassalis-Noolandi number, 10-20. The dynamics of entangled rings is self-similar, and proceeds as loops of increasing sizes are rearranged progressively at their respective diffusion times. The topological constraints associated with smaller rearranged loops affect the dynamics of larger loops by increasing the effective friction coefficient, but have no influence on the tubes confining larger loops. Therefore, the tube diameter, defined as the average spacing between relevant topological constraints, increases with time, leading to "tube dilation". Analysis of the primitive paths in molecular dynamics (MD) simulations suggests complete tube dilation, with the tube diameter on the order of the time-dependent characteristic loop size. A characteristic loop at time t is defined as a ring section that has diffused a distance of its size during time t. We derive dynamic scaling exponents in terms of fractal dimensions of an entangled ring and the underlying primitive path and a parameter characterizing the extent of tube dilation. The results reproduce the predictions of different dynamic models of a single non-concatenated entangled ring. We demonstrate that the traditional generalization of single-ring models to multi-ring dynamics is not self-consistent, and develop a FLG model with self-consistent multi-ring dynamics and complete tube dilation. Various dynamic scaling exponents predicted by the self-consistent FLG model are consistent with recent computer simulations and experiments. We also perform MD simulations of nanoparticle (NP) diffusion in melts of non-concatenated entangled ring polymers.

  19. Free Speech Yearbook: 1972.

    ERIC Educational Resources Information Center

    Tedford, Thomas L., Ed.

    This book is a collection of essays on free speech issues and attitudes, compiled by the Commission on Freedom of Speech of the Speech Communication Association. Four articles focus on freedom of speech in classroom situations as follows: a philosophic view of teaching free speech, effects of a course on free speech on student attitudes,…

  20. RESEARCH ON SPEECH COMMUNICATION. AUTOMATIC SPEECH RECOGNITION.

    DTIC Science & Technology

    (SPEECH RECOGNITION, AUTOMATIC), EXPERIMENTAL DATA, THEORY, ENGLISH LANGUAGE, PHONETICS, LINGUISTICS, AIR FORCE RESEARCH, FEASIBILITY STUDIES, ACOUSTICS, VOCABULARY, SPEECH REPRESENTATION, WORD ASSOCIATION

  1. Speech analyzer

    NASA Technical Reports Server (NTRS)

    Lokerson, D. C. (Inventor)

    1977-01-01

    A speech signal is analyzed by applying the signal to formant filters which derive first, second, and third signals respectively representing the frequency of the speech waveform in the first, second, and third formants. A first pulse train having a pulse rate approximately representing the average frequency of the first formant is derived; second and third pulse trains having pulse rates respectively representing zero crossings of the second and third formants are derived. The first formant pulse train is derived by establishing N signal level bands, where N is an integer at least equal to two. Adjacent ones of the signal bands have common boundaries, each of which is a predetermined percentage of the peak level of a complete cycle of the speech waveform.

  2. Speech Research

    NASA Astrophysics Data System (ADS)

    Several articles addressing topics in speech research are presented. The topics include: exploring the functional significance of physiological tremor: a biospectroscopic approach; differences between experienced and inexperienced listeners to deaf speech; a language-oriented view of reading and its disabilities; phonetic factors in letter detection; categorical perception; short-term recall by deaf signers of American Sign Language; a common basis for auditory sensory storage in perception and immediate memory; phonological awareness and verbal short-term memory; initiation versus execution time during manual and oral counting by stutterers; trading relations in the perception of speech by five-year-old children; the role of the strap muscles in pitch lowering; phonetic validation of distinctive features; consonants and syllable boundaries; and vowel information in postvocalic frictions.

  3. BoD services in layer 1 VPN with dynamic virtual concatenation group

    NASA Astrophysics Data System (ADS)

    Du, Shu; Peng, Yunfeng; Long, Keping

    2008-11-01

    Bandwidth-on-Demand (BoD) services are characterized by dynamic bandwidth provisioning based on customers' resource requirements, which will be a must for future networks. BoD services become possible with the development of make-before-break, Virtual Concatenation (VCAT), and the Link Capacity Adjustment Scheme (LCAS). In this paper, we introduce BoD services into L1VPN, so that the resource assigned to a L1VPN can be gracefully adjusted at various bandwidth granularities based on customers' requirements. We propose a dynamic bandwidth adjustment scheme, which is a compromise between make-before-break and VCAT&LCAS, mainly based on the latter. The scheme minimizes the number of distinct paths needed to support a connection between a source-destination pair, and uses make-before-break technology for re-optimization.

  4. A concatenation scheme of LDPC codes and source codes for flash memories

    NASA Astrophysics Data System (ADS)

    Huang, Qin; Pan, Song; Zhang, Mu; Wang, Zulin

    2012-12-01

    Recently, low-density parity-check (LDPC) codes have been applied in flash memories to correct errors. However, as verified in this article, their performance degrades rapidly as the number of stuck cells increases. This paper therefore presents a concatenated reliability scheme of LDPC codes and source codes, which aims to improve the performance of LDPC codes for flash memories with stuck cells. In this scheme, the locations of stuck cells are recorded by source codes in the write process, so that erasures, rather than wrong log-likelihood ratios, are assigned to these cells in the read process. LDPC codes then correct these erasures as well as the soft errors caused by cell-to-cell interference. Analyses of the channel capacity and the compression rates of source codes with side information show that the memory cost of the proposed scheme is moderately low. Simulation results verify that the proposed scheme outperforms the traditional scheme with only LDPC codes.
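
    The key mechanism is replacing confidently wrong channel values at known stuck cells with erasures before LDPC decoding. A minimal sketch of that LLR-masking step, with a hypothetical LLR magnitude and synthetic reads:

```python
import numpy as np

def llrs_with_erasures(hard_reads, stuck_mask, llr_mag=4.0):
    """Channel LLRs for an LDPC decoder over flash reads.

    hard_reads: 0/1 array read from the cells.
    stuck_mask: True where the source-code side information says the
        cell is stuck; those positions get LLR = 0 (an erasure)
        instead of a confidently wrong value.
    llr_mag: hypothetical channel reliability magnitude.
    """
    llr = llr_mag * (1 - 2 * hard_reads.astype(float))  # 0 -> +, 1 -> -
    llr[stuck_mask] = 0.0
    return llr

reads = np.array([0, 1, 1, 0, 1])
stuck = np.array([False, False, True, False, False])
print(llrs_with_erasures(reads, stuck))  # [ 4. -4.  0.  4. -4.]
```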

  5. Electronic Entanglement Concentration for the Concatenated Greenberger-Horne-Zeilinger State

    NASA Astrophysics Data System (ADS)

    Ding, Shang-Ping; Zhou, Lan; Gu, Shi-Pu; Wang, Xing-Fu; Sheng, Yu-Bo

    2017-06-01

    The concatenated Greenberger-Horne-Zeilinger (C-GHZ) state, which encodes many physical qubits in a logic qubit, will have important applications in both quantum communication and computation. In this paper, we describe an entanglement concentration protocol (ECP) for the electronic C-GHZ state, exploiting electronic polarization beam splitters (PBSs) and charge detection. This protocol has several advantages. First, the parties do not need to know the exact coefficients of the initial less-entangled C-GHZ state, which makes the protocol feasible. Second, with the help of charge detection, the distilled maximally entangled C-GHZ state can be retained for future applications. Third, the protocol can be repeated to obtain a higher success probability. We hope that this protocol can be useful in future quantum computation based on electrons.

  6. Structure of Concatenated HAMP Domains Provides a Mechanism for Signal Transduction

    SciTech Connect

    Airola, Michael V.; Watts, Kylie J.; Bilwes, Alexandrine M.; Crane, Brian R.

    2010-08-23

    HAMP domains are widespread prokaryotic signaling modules found as single domains or poly-HAMP chains in both transmembrane and soluble proteins. The crystal structure of a three-unit poly-HAMP chain from the Pseudomonas aeruginosa soluble receptor Aer2 defines a universal parallel four-helix bundle architecture for diverse HAMP domains. Two contiguous domains integrate to form a concatenated di-HAMP structure. The three HAMP domains display two distinct conformations that differ by changes in helical register, crossing angle, and rotation. These conformations are stabilized by different subsets of conserved residues. Known signals delivered to HAMP would be expected to switch the relative stability of the two conformations and the position of a coiled-coil phase stutter at the junction with downstream helices. We propose that the two conformations represent opposing HAMP signaling states and suggest a signaling mechanism whereby HAMP domains interconvert between the two states, which alternate down a poly-HAMP chain.

  7. Multidimensional Trellis Coded Phase Modulation Using a Multilevel Concatenation Approach. Part 1; Code Design

    NASA Technical Reports Server (NTRS)

    Rajpal, Sandeep; Rhee, Do Jun; Lin, Shu

    1997-01-01

    The first part of this paper presents a simple and systematic technique for constructing multidimensional M-ary phase shift keying (MPSK) trellis coded modulation (TCM) codes. The construction is based on a multilevel concatenation approach in which binary convolutional codes with good free branch distances are used as the outer codes and block MPSK modulation codes are used as the inner codes (or the signal spaces). Conditions on the phase invariance of these codes are derived, and a multistage decoding scheme for these codes is proposed. The proposed technique can be used to construct good codes for both the additive white Gaussian noise (AWGN) and fading channels, as is shown in the second part of this paper.

  8. Generation of an arbitrary concatenated Greenberger-Horne-Zeilinger state with single photons

    NASA Astrophysics Data System (ADS)

    Chen, Shan-Shan; Zhou, Lan; Sheng, Yu-Bo

    2017-02-01

    The concatenated Greenberger-Horne-Zeilinger (C-GHZ) state is a new kind of logic-qubit entangled state, which may have extensive applications in future quantum communication. In this letter, we propose a protocol for constructing an arbitrary C-GHZ state with single photons. We exploit the cross-Kerr nonlinearity for this purpose. This protocol has some advantages over previous protocols. First, it only requires two kinds of cross-Kerr nonlinearities to generate single phase shifts  ±θ. Second, it is not necessary to use sophisticated m-photon Toffoli gates. Third, this protocol is deterministic and can be used to generate an arbitrary C-GHZ state. This protocol may be useful in future quantum information processing based on the C-GHZ state.

  9. Miniaturized MMZI concatenated FLM for gain equalization of ASE response of an EDFA

    NASA Astrophysics Data System (ADS)

    Ramachandran, K.; Kumar, Naveen; Kim, Daseuk

    2017-07-01

    A miniaturized micro Mach-Zehnder interferometer (MMZI) has been employed, for the first time, for gain equalization of amplified spontaneous emission spectrum of an erbium doped fiber. An all-fiber MMZI with interference length of ∼1.5 cm, free spectral range of 40 nm, extinction ratio of 8 dB, and the total length of ∼5 cm, was indigenously realized. The interferometer was concatenated inside/outside the fiber loop mirror, with a built-in polarization controller. By appropriately adjusting the retardance and orientation angle of the polarization controller, the amplified spontaneous emission response of an erbium doped fiber has been flattened over the wavelength range of 35 nm with a peak to peak difference of ±0.46 dB.

  10. Type of Speech Material Affects Acceptable Noise Level Test Outcome

    PubMed Central

    Koch, Xaver; Dingemanse, Gertjan; Goedegebure, André; Janse, Esther

    2016-01-01

    The acceptable noise level (ANL) test, in which individuals indicate what level of noise they are willing to put up with while following speech, has been used to guide hearing aid fitting decisions and has been found to relate to prospective hearing aid use. Unlike objective measures of speech perception ability, ANL outcome is not related to individual hearing loss or age, but rather reflects an individual's inherent acceptance of competing noise while listening to speech. As such, the measure may predict aspects of hearing aid success. Crucially, however, recent studies have questioned its repeatability (test-retest reliability). The first question for this study was whether the inconsistent results regarding the repeatability of the ANL test may be due to differences in the types of speech material used in previous studies. Second, it is unclear whether meaningfulness and semantic coherence of the speech modify ANL outcome. To investigate these questions, we compared ANLs obtained with three types of materials: the International Speech Test Signal (ISTS), which is non-meaningful and semantically non-coherent by definition; passages consisting of concatenated meaningful standard audiology sentences; and longer fragments taken from conversational speech. We included conversational speech as this type of speech material is most representative of everyday listening. Additionally, we investigated whether ANL outcomes obtained with these three different speech materials were associated with self-reported limitations due to hearing problems and listening effort in everyday life, as assessed by a questionnaire. ANL data were collected for 57 relatively good-hearing adult participants with an age range representative of hearing aid users. Results showed that meaningfulness, but not semantic coherence, of the speech material affected ANL. Less noise was accepted for the non-meaningful ISTS signal than for the meaningful speech materials. ANL repeatability was comparable

  11. Static and Dynamic Features for Improved HMM based Visual Speech Recognition

    NASA Astrophysics Data System (ADS)

    Rajavel, R.; Sathidevi, P. S.

    Visual speech recognition refers to the identification of utterances through the movements of the lips, tongue, teeth, and other facial muscles of the speaker, without using the acoustic signal. This work shows the relative benefits of both static and dynamic visual speech features for improved visual speech recognition. Two approaches for visual feature extraction have been considered: (1) an image-transform-based static feature approach, in which the Discrete Cosine Transform (DCT) is applied to each video frame and 6×6 triangle region coefficients are considered as features; Principal Component Analysis (PCA) is applied over all 60 features corresponding to the video frame to reduce the redundancy, and the resultant 21 coefficients are taken as the static visual features; (2) a motion-segmentation-based dynamic feature approach, in which the facial movements are segmented from the video file using motion history images (MHI), DCT is applied to the MHI, and triangle region coefficients are taken as the dynamic visual features. Two types of experiments were done, one with concatenated features and another with dimension-reduced features obtained using PCA, to identify the utterances. Left-right continuous HMMs are used as the visual speech classifier to classify nine MPEG-4 standard viseme consonants. The experimental results show that the concatenated as well as the dimension-reduced features improve visual speech recognition, with high accuracies of 92.45% and 92.15%, respectively.
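
    The static-feature pipeline (2-D DCT, low-frequency triangle selection, PCA) can be sketched as follows. Frame sizes and the triangle parameter are illustrative, though a triangle of width 6 does yield the 21 coefficients mentioned; the actual mouth-region cropping and training data are not reproduced here.

```python
import numpy as np
from scipy.fft import dctn
from sklearn.decomposition import PCA

def triangle_dct_features(frame, k=6):
    """Low-frequency 2-D DCT coefficients from the upper-left k x k
    triangle (i + j < k): a static visual-speech feature vector."""
    coeffs = dctn(frame, norm='ortho')
    idx = [(i, j) for i in range(k) for j in range(k) if i + j < k]
    return np.array([coeffs[i, j] for i, j in idx])

# Hypothetical mouth-region frames: 100 frames of 64 x 64 pixels.
frames = np.random.rand(100, 64, 64)
feats = np.stack([triangle_dct_features(f) for f in frames])  # (100, 21)
reduced = PCA(n_components=10).fit_transform(feats)           # (100, 10)
```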

  12. Inter-Calibration and Concatenation of Climate Quality Infrared Cloudy Radiances from Multiple Instruments

    NASA Technical Reports Server (NTRS)

    Behrangi, Ali; Aumann, Hartmut H.

    2013-01-01

    A change in climate is not likely to be captured by any single instrument, since no single instrument can span decades of time. Therefore, to detect signals of global climate change, observations from many instruments on different platforms have to be concatenated. This requires careful and detailed consideration of instrumental differences such as footprint size, diurnal cycle of observations, and relative biases in the spectral brightness temperatures. Furthermore, a common basic assumption is that the data quality is independent of the observed scene and therefore can be determined using clear scene data. However, as will be demonstrated, this is not necessarily a valid assumption, as the globe is mostly cloudy. In this study we highlight challenges in the inter-calibration and concatenation of infrared radiances from multiple instruments by focusing on the analysis of deep convective or anvil clouds. TRMM/VIRS is a potentially useful instrument for correcting observational differences in local time and footprint size, and such corrections could be applied retroactively to vintage instruments such as AIRS, IASI, IRIS, AVHRR, and HIRS. As a first step, in this study we investigate and discuss to what extent AIRS and VIRS agree in capturing deep cloudy radiances at the same local time. The analysis also includes comparisons with one year of observations from CrIS. It was found that the instruments show calibration differences of about 1 K under deep cloudy scenes that can vary as a function of land type and local time of observation. Differences in footprint size, view angle, and spectral band-pass cannot fully explain the observed discrepancies, which can be considered a measure of the magnitude of the issues that will arise in comparing legacy data with current data.

  13. Fast, accurate and easy-to-teach QT interval assessment: The triplicate concatenation method.

    PubMed

    Saqué, Valentin; Vaglio, Martino; Funck-Brentano, Christian; Kilani, Maya; Bourron, Olivier; Hartemann, Agnès; Badilini, Fabio; Salem, Joe-Elie

    The gold standard method for assessing the QTcF (QT corrected for heart rate by Fridericia's cube root formula) interval is the "QTcF semiautomated triplicate averaging method" (TAM), which consists of measuring three QTcF values semiautomatically, one for each 10-second sequence of a triplicate electrocardiogram set, and averaging them to obtain a single global QTcF value. TAM is thus time consuming. We have developed a new method, the "QTcF semiautomated triplicate concatenation method" (TCM), which consists of concatenating the three 10-second sequences of the triplicate electrocardiogram set as if they were a single 30-second electrocardiogram, and measuring QTcF only once per triplicate set. Our aim was to compare the TCM method with the TAM method. Fifty triplicate electrocardiograms were read twice by an expert and a student using both methods (TAM and TCM). We plotted Bland-Altman plots to assess agreement between the two methods, and to compare the student and expert results. The time needed to read a set of 20 consecutive triplicate electrocardiograms was measured. Limits of agreement between TAM and TCM ranged from -8.25 to 6.75 ms with the expert reader. TCM was twice as fast as TAM (17.38 versus 34.28 min for 20 consecutive triplicate electrocardiograms). Bland-Altman plots comparing student and expert results showed limits of agreement ranging from -4.34 to 11.75 ms for TAM, and -1.2 to 8.0 ms for TCM. TAM and TCM show good agreement for QT measurement, and TCM is less time consuming than TAM. After a learning session, an inexperienced reader can measure the QT interval accurately with both methods. Copyright © 2017 Elsevier Masson SAS. All rights reserved.

  15. Phrase-level speech simulation with an airway modulation model of speech production

    PubMed Central

    Story, Brad H.

    2012-01-01

    Artificial talkers and speech synthesis systems have long been used as a means of understanding both speech production and speech perception. The development of an airway modulation model is described that simulates the time-varying changes of the glottis and vocal tract, as well as acoustic wave propagation, during speech production. The result is a type of artificial talker that can be used to study various aspects of how sound is generated by humans and how that sound is perceived by a listener. The primary components of the model are introduced, and simulations of words and phrases are demonstrated. PMID:23503742

  16. Insufficient Chunk Concatenation May Underlie Changes in Sleep-Dependent Consolidation of Motor Sequence Learning in Older Adults

    ERIC Educational Resources Information Center

    Bottary, Ryan; Sonni, Akshata; Wright, David; Spencer, Rebecca M. C.

    2016-01-01

    Sleep enhances motor sequence learning (MSL) in young adults by concatenating subsequences ("chunks") formed during skill acquisition. To examine whether this process is reduced in aging, we assessed performance changes on the MSL task following overnight sleep or daytime wake in healthy young and older adults. Young adult performance…

  18. Stress versus coarticulation: toward an integrated approach to explicit speech segmentation.

    PubMed

    Mattys, Sven L

    2004-04-01

    Although word stress has been hailed as a powerful speech-segmentation cue, the results of 5 cross-modal fragment priming experiments revealed limitations to stress-based segmentation. Specifically, the stress pattern of auditory primes failed to have any effect on the lexical decision latencies to related visual targets. A determining factor was whether the onset of the prime was coarticulated with the preceding speech fragment. Uncoarticulated (i.e., concatenated) primes facilitated priming. Coarticulated ones did not. However, when the primes were presented in a background of noise, the pattern of results reversed, and a strong stress effect emerged: stress-initial primes caused more priming than non-initial-stress primes, regardless of the coarticulatory cues. The results underscore the role of coarticulation in the segmentation of clear speech and that of stress in impoverished listening conditions. More generally, they call for an integrated and signal-contingent approach to speech segmentation.

  19. Keynote Speeches.

    ERIC Educational Resources Information Center

    2000

    This document contains six of the seven keynote speeches from an international conference on vocational education and training (VET) for lifelong learning in the information era. "IVETA (International Vocational Education and Training Association) 2000 Conference 6-9 August 2000" (K.Y. Yeung) discusses the objectives and activities…

  20. Speech Intelligibility

    NASA Astrophysics Data System (ADS)

    Brand, Thomas

    Speech intelligibility (SI) is important for different fields of research, engineering, and diagnostics in order to quantify very different phenomena such as the quality of recordings, communication and playback devices, the reverberation of auditoria, characteristics of hearing impairment, the benefit of using hearing aids, or combinations of these.

  1. Chunk concatenation evolves with practice and sleep-related enhancement consolidation in a complex arm movement sequence

    PubMed Central

    Malangré, Andreas

    2016-01-01

    This paper addresses the notion of chunk concatenation being associated with sleep-related enhancement consolidation of motor sequence memory, thereby essentially contributing to improvements in sequence execution speed. To this end, element movement times of a multi-joint arm movement sequence incorporated in a recent study by Malangré et al. (2014) were reanalyzed. As sequence elements differed with respect to movement distance, element movement times had to be purged of differences solely due to varying trajectory lengths. This was done by dividing each element movement time per subject and trial block by the respective "reference movement time" collected from subjects who had extensively practiced each sequence element in isolation. Any differences in these "relative element movement times" were supposed to reflect element-specific "production costs" imposed solely by the sequence context. Across all subjects, non-idiosyncratic, lasting sequence segmentation was shown, and four possible concatenation points (i.e. transition points between successive chunks) within the original arm movement sequence were identified. Based on theoretical suppositions derived from previous work with the discrete sequence production task and the dual processor model (Abrahamse et al., 2013), significantly larger improvements in transition speed occurring at these four concatenation points, as compared to the five fastest transition positions within the sequence (associated with mere element execution), were assumed to indicate increased chunk concatenation. As a result, chunk concatenation was shown to proceed during acquisition with physical practice, and, most importantly, to progress significantly further during retention following a night of sleep, but not during a waking interval. PMID:28149363
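
    The normalization step can be stated compactly: each element movement time is divided by the reference movement time for the same element practiced in isolation. A sketch with made-up numbers:

```python
import numpy as np

# Hypothetical data: movement times (s) for 10 sequence elements, and
# reference times for the same elements practiced in isolation.
element_mt = np.array([0.41, 0.38, 0.55, 0.47, 0.36,
                       0.52, 0.44, 0.39, 0.58, 0.43])
reference_mt = np.array([0.35, 0.34, 0.40, 0.42, 0.33,
                         0.38, 0.41, 0.36, 0.40, 0.39])

# Relative element movement times: sequence-context "production costs"
# purged of differences due to trajectory length alone.
relative_mt = element_mt / reference_mt
print(np.round(relative_mt, 2))
```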

  2. Speech production knowledge in automatic speech recognition.

    PubMed

    King, Simon; Frankel, Joe; Livescu, Karen; McDermott, Erik; Richmond, Korin; Wester, Mirjam

    2007-02-01

    Although much is known about how speech is produced, and research into speech production has resulted in measured articulatory data, feature systems of different kinds, and numerous models, speech production knowledge is almost totally ignored in current mainstream approaches to automatic speech recognition. Representations of speech production allow simple explanations for many phenomena observed in speech which cannot be easily analyzed from either acoustic signal or phonetic transcription alone. In this article, a survey of a growing body of work in which such representations are used to improve automatic speech recognition is provided.

  3. Speech transformations based on a sinusoidal representation

    NASA Astrophysics Data System (ADS)

    Quatieri, T. E.; McAulay, R. J.

    1986-05-01

    A new speech analysis/synthesis technique is presented which provides the basis for a general class of speech transformations, including time-scale modification, frequency scaling, and pitch modification. These modifications can be performed with a time-varying change, permitting continuous adjustment of a speaker's fundamental frequency and rate of articulation. The method is based on a sinusoidal representation of the speech production mechanism that has been shown to produce synthetic speech that preserves the waveform shape and is essentially perceptually indistinguishable from the original. Although the analysis/synthesis system was originally designed for single-speaker signals, it is equally capable of recovering and modifying nonspeech signals such as music, multiple speakers, marine biological sounds, and speakers in the presence of interference such as noise and musical backgrounds.
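
    A heavily simplified, per-frame sketch of such a sinusoidal analysis/synthesis loop is given below; it omits the peak tracking and phase interpolation across frames that the real system requires, and all parameters are illustrative. Stretching the synthesis time axis illustrates time-scale modification.

```python
import numpy as np

def sine_model_frame(frame, fs, n_peaks=20):
    """Pick the strongest spectral peaks of one windowed frame and
    return (freqs, amps, phases) of a sinusoidal representation."""
    win = np.hanning(len(frame))
    spec = np.fft.rfft(frame * win)
    mags = np.abs(spec)
    peaks = np.argsort(mags)[-n_peaks:]          # strongest bins
    freqs = peaks * fs / len(frame)
    amps = 2 * mags[peaks] / win.sum()           # undo window gain
    return freqs, amps, np.angle(spec[peaks])

def synth_frame(freqs, amps, phases, n, fs, stretch=1.0):
    """Resynthesize one frame; `stretch` > 1 slows articulation."""
    t = np.arange(int(n * stretch)) / fs
    return sum(a * np.cos(2 * np.pi * f * t + p)
               for f, a, p in zip(freqs, amps, phases))

fs, n = 16000, 512
frame = np.sin(2 * np.pi * 440 * np.arange(n) / fs)
f, a, p = sine_model_frame(frame, fs)
y = synth_frame(f, a, p, n, fs, stretch=1.5)     # time-scale modification
```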

  4. The Effect of Three Variables on Synthetic Speech Intelligibility in Noisy Environments

    DTIC Science & Technology

    1990-03-01

    this study: the analog formant frequency synthesis technique. A second definition of "synthetic" speech is related to basic data sampling theory... Analog formant frequency synthesis is a typical synthetic speech methodology, used here as an illustration of the technique. The waveform encoding and reconstruction technique (discussed above) is similar to a "photograph" of speech. Analog formant frequency synthesis is more like an artist’s

  5. Speech communications in noise

    NASA Technical Reports Server (NTRS)

    1984-01-01

    The physical characteristics of speech, the methods of speech masking measurement, and the effects of noise on speech communication are investigated. Topics include the speech signal and intelligibility, the effects of noise on intelligibility, the articulation index, and various devices for evaluating speech systems.

  6. Speech communications in noise

    NASA Astrophysics Data System (ADS)

    1984-07-01

    The physical characteristics of speech, the methods of speech masking measurement, and the effects of noise on speech communication are investigated. Topics include the speech signal and intelligibility, the effects of noise on intelligibility, the articulation index, and various devices for evaluating speech systems.

  8. Segmental concatenation of individual signatures and context cues in banded mongoose (Mungos mungo) close calls

    PubMed Central

    2012-01-01

    Background: All animals are anatomically constrained in the number of discrete call types they can produce. Recent studies suggest that by combining existing calls into meaningful sequences, animals can increase the information content of their vocal repertoire despite these constraints. Additionally, signalers can use vocal signatures or cues correlated to other individual traits or contexts to increase the information encoded in their vocalizations. However, encoding multiple vocal signatures or cues using the same components of vocalizations usually reduces the signals' reliability. Segregation of information could effectively circumvent this trade-off. In this study we investigate how banded mongooses (Mungos mungo) encode multiple vocal signatures or cues in their frequently emitted graded single syllable close calls. Results: The data for this study were collected on a wild, but habituated, population of banded mongooses. Using behavioral observations and acoustical analysis we found that close calls contain two acoustically different segments. The first being stable and individually distinct, and the second being graded and correlating with the current behavior of the individual, whether it is digging, searching or moving. This provides evidence of Marler's hypothesis on temporal segregation of information within a single syllable call type. Additionally, our work represents an example of an identity cue integrated as a discrete segment within a single call that is independent from context. This likely functions to avoid ambiguity between individuals or receivers having to keep track of several context-specific identity cues. Conclusions: Our study provides the first evidence of segmental concatenation of information within a single syllable in non-human vocalizations. By reviewing descriptions of call structures in the literature, we suggest a general application of this mechanism. Our study indicates that temporal segregation and segmental concatenation of

  9. Segmental concatenation of individual signatures and context cues in banded mongoose (Mungos mungo) close calls.

    PubMed

    Jansen, David A W A M; Cant, Michael A; Manser, Marta B

    2012-12-03

    All animals are anatomically constrained in the number of discrete call types they can produce. Recent studies suggest that by combining existing calls into meaningful sequences, animals can increase the information content of their vocal repertoire despite these constraints. Additionally, signalers can use vocal signatures or cues correlated to other individual traits or contexts to increase the information encoded in their vocalizations. However, encoding multiple vocal signatures or cues using the same components of vocalizations usually reduces the signals' reliability. Segregation of information could effectively circumvent this trade-off. In this study we investigate how banded mongooses (Mungos mungo) encode multiple vocal signatures or cues in their frequently emitted graded single syllable close calls. The data for this study were collected on a wild, but habituated, population of banded mongooses. Using behavioral observations and acoustical analysis we found that close calls contain two acoustically different segments. The first being stable and individually distinct, and the second being graded and correlating with the current behavior of the individual, whether it is digging, searching or moving. This provides evidence of Marler's hypothesis on temporal segregation of information within a single syllable call type. Additionally, our work represents an example of an identity cue integrated as a discrete segment within a single call that is independent from context. This likely functions to avoid ambiguity between individuals or receivers having to keep track of several context-specific identity cues. Our study provides the first evidence of segmental concatenation of information within a single syllable in non-human vocalizations. By reviewing descriptions of call structures in the literature, we suggest a general application of this mechanism. Our study indicates that temporal segregation and segmental concatenation of vocal signatures or cues is

  10. Analysis, recognition, and interpretation of speech signals

    NASA Astrophysics Data System (ADS)

    Vintziuk, Taras Klimovich

    The problems of machine analysis, recognition, semantic interpretation, synthesis, and compressed transmission of speech are examined with reference to oral man-machine dialogue in formalized and natural languages, for applications in data collection, processing, and control systems. Methods for the recognition of individual words and continuous speech, signal segmentation and self-segmentation, speech recognition learning, recognition of the voice of a particular operator, recognition of multiple speakers, and selection of signal matching and signal analysis techniques are discussed from a unified standpoint based on the use of dynamic programming.

  11. Speech Research.

    DTIC Science & Technology

    1979-12-31

    Academic Press, 1973. Kimura, D. The neural basis of language qua gesture. In H. Whitaker & H. A. Whitaker (Eds.), Studies in neurolinguistics (Vol. 3...Lubker, J., & Gay, T. Formant frequencies of some fixed- mandible vowels and a model of speech motor programming . Journal of Phonetics, 1979, 7, 147-162...A. Interarticulator programming in stop production. To appear in Journal of Phonetics, in press. Ldfqvist, A., & Yoshioka, H. Laryngeal activity in

  12. Speech enhancement via two-stage dual tree complex wavelet packet transform with a speech presence probability estimator.

    PubMed

    Sun, Pengfei; Qin, Jun

    2017-02-01

    In this paper, a two-stage dual tree complex wavelet packet transform (DTCWPT) based speech enhancement algorithm is proposed, in which a speech presence probability (SPP) estimator and a generalized minimum mean squared error (MMSE) estimator are developed. To overcome the signal distortions caused by the downsampling of the wavelet packet transform (WPT), a two-stage analytic decomposition concatenating an undecimated wavelet packet transform (UWPT) and a decimated WPT is employed. An SPP estimator in the DTCWPT domain is derived based on a generalized Gamma distribution of speech and a Gaussian noise assumption. The validation results show that the proposed algorithm obtains improved perceptual evaluation of speech quality (PESQ) and segmental signal-to-noise ratio (SegSNR) scores in low-SNR nonstationary noise, compared with four other state-of-the-art speech enhancement algorithms, including optimally modified log-spectral amplitude (OM-LSA), soft masking using a posteriori SNR uncertainty (SMPO), a posteriori SPP based MMSE estimation (MMSE-SPP), and adaptive Bayesian wavelet thresholding (BWT).
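
    PyWavelets has no dual-tree complex WPT, but the two-stage idea, an undecimated first stage feeding a decimated packet decomposition, can be approximated with real transforms. A rough sketch under that substitution, with illustrative wavelet choice and depths:

```python
import numpy as np
import pywt

# Hypothetical noisy speech frame (length must suit the SWT level).
x = np.random.randn(1024)

# Stage 1: undecimated (stationary) wavelet transform -- shift
# invariant, avoiding the distortions introduced by downsampling.
(cA, cD), = pywt.swt(x, 'db4', level=1)

# Stage 2: decimated wavelet-packet decomposition of a stage-1 band.
wp = pywt.WaveletPacket(data=cA, wavelet='db4',
                        mode='symmetric', maxlevel=2)
subbands = [node.data for node in wp.get_level(2, order='freq')]
```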

  13. Testing the performance of feedback concatenated decoder with a nonideal receiver

    NASA Technical Reports Server (NTRS)

    Feria, Y.; Dolinar, S.

    1995-01-01

    One of the inherent problems in testing the feedback concatenated decoder (FCD) at our operating symbol signal-to-noise ratio (SSNR) is that the bit-error rate is so low that we cannot measure it directly through simulations in a reasonable time period. This article proposes a test procedure that will give a reasonable estimate of the expected losses even though the number of frames tested is much smaller than needed for a direct measurement. This test procedure provides an organized robust methodology for extrapolating small amounts of test data to give reasonable estimates of FCD loss increments at unmeasurable miniscule error rates. Using this test procedure, we have run some preliminary tests on the FCD to quantify the losses due to the fact that the input signal contains multiplicative non-white non-Gaussian noises resulting from the buffered telemetry demodulator (BTD). Besides the losses in the BTD, we have observed additional loss increments of 0.3 to 0.4 dB at the output of the FCD for several test cases with loop signal-to-noise ratios (SNR's) lower than 20 dB. In contrast, these loss increments were less than 0.1 dB for a test case with the subcarrier loop SNR at about 28 dB. This test procedure can be applied to more extensive test data to determine thresholds on the loop SNRs above which the FCD will not suffer substantial loss increments.

  14. Hyperbranched Hybridization Chain Reaction for Triggered Signal Amplification and Concatenated Logic Circuits.

    PubMed

    Bi, Sai; Chen, Min; Jia, Xiaoqiang; Dong, Ying; Wang, Zonghua

    2015-07-06

    A hyper-branched hybridization chain reaction (HB-HCR) is presented herein, which consists of only six species that can metastably coexist until the introduction of an initiator DNA to trigger a cascade of hybridization events, leading to the self-sustained assembly of hyper-branched and nicked double-stranded DNA structures. The system can readily achieve ultrasensitive detection of target DNA. Moreover, the HB-HCR principle is successfully applied to construct three-input concatenated logic circuits with excellent specificity and extended to design a security-mimicking keypad lock system. Significantly, the HB-HCR-based keypad lock can alarm immediately if the "password" is incorrect. Overall, the proposed HB-HCR with high amplification efficiency is simple, homogeneous, fast, robust, and low-cost, and holds great promise in the development of biosensing, in the programmable assembly of DNA architectures, and in molecular logic operations. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  15. On the undetected error probability of a concatenated coding scheme for error control

    NASA Technical Reports Server (NTRS)

    Deng, H.; Costello, D. J., Jr.

    1984-01-01

    Consider a concatenated coding scheme for error control on a binary symmetric channel, called the inner channel. The bit error rate (BER) of the channel is correspondingly called the inner BER, and is denoted by Epsilon(sub i). Two linear block codes, C(sub f) and C(sub b), are used. The inner code C(sub f), called the frame code, is an (n,k) systematic binary block code with minimum distance d(sub f). The frame code is designed to correct t or fewer errors and simultaneously detect gamma (gamma >= t) or fewer errors, where t + gamma + 1 <= d(sub f). The outer code C(sub b) is either an (n(sub b), k(sub b)) binary block code with n(sub b) = mk, or an (n(sub b), k(sub b)) maximum distance separable (MDS) code with symbols from GF(q), where q = 2(sup b) and the code length n(sub b) satisfies n(sub b) = mk. The integer m is the number of frames. The outer code is designed for error detection only.
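
    The correction/detection trade-off above is the standard bound t + gamma + 1 <= d(sub f). A few lines of Python make the arithmetic concrete (the function is ours, not the paper's):

      def frame_code_modes(d_f):
          """Enumerate (t, gamma) pairs a frame code of minimum distance d_f
          supports: t errors corrected, gamma (>= t) errors detected,
          subject to t + gamma + 1 <= d_f."""
          return [(t, g) for t in range(d_f) for g in range(t, d_f)
                  if t + g + 1 <= d_f]

      # A distance-5 frame code can correct 1 error while detecting up to 3,
      # or correct 2 while detecting 2:
      print(frame_code_modes(5))   # [(0, 0), ..., (1, 3), (2, 2)]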

  16. Period Concatenation Underlies Interactions between Gamma and Beta Rhythms in Neocortex

    PubMed Central

    Roopun, Anita K.; Kramer, Mark A.; Carracedo, Lucy M.; Kaiser, Marcus; Davies, Ceri H.; Traub, Roger D.; Kopell, Nancy J.; Whittington, Miles A.

    2008-01-01

    The neocortex generates rhythmic electrical activity over a frequency range covering many decades. Specific cognitive and motor states are associated with oscillations in discrete frequency bands within this range, but it is not known whether interactions and transitions between distinct frequencies are of functional importance. When coexpressed rhythms have frequencies that differ by a factor of two or more, interactions can be seen in terms of phase synchronization. Larger frequency differences can result in interactions in the form of nesting of faster frequencies within slower ones by a process of amplitude modulation. It is not known how coexpressed rhythms whose frequencies differ by less than a factor of two may interact. Here we show that two frequencies (gamma, ~40 Hz, and beta2, ~25 Hz), coexpressed in superficial and deep cortical laminae with low temporal interaction, can combine to generate a third frequency (beta1, ~15 Hz) showing strong temporal interaction. The process occurs via period concatenation, with the basic rhythm-generating microcircuits underlying the gamma and beta2 rhythms forming the building blocks of the beta1 rhythm by a process of addition. The mean ratio of adjacent frequency components was a constant, approximately the golden mean, which served both to minimize temporal interactions between frequencies and to permit multiple transitions between them. The resulting temporal landscape may provide a framework for multiplexing: parallel information processing on multiple temporal scales. PMID:18946516
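
    The period arithmetic behind the concatenation claim is easy to verify: a gamma period (1/40 s = 25 ms) plus a beta2 period (1/25 s = 40 ms) concatenate to a 65 ms period, i.e., a rhythm near 15 Hz, and adjacent frequency ratios sit near the golden mean. A quick check (frequencies from the abstract, script illustrative):

      gamma_hz, beta2_hz = 40.0, 25.0
      beta1_hz = 1.0 / (1.0 / gamma_hz + 1.0 / beta2_hz)   # concatenated periods
      print(round(beta1_hz, 1))                            # 15.4
      print(gamma_hz / beta2_hz, round(beta2_hz / beta1_hz, 3))
      # 1.6 and 1.625: both close to the golden mean (~1.618)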

  17. Speech Recognition by Computer.

    ERIC Educational Resources Information Center

    Levinson, Stephen E.; Liberman, Mark Y.

    1981-01-01

    Speech recognition by computers is discussed, including methods of recognizing isolated words and procedures for analyzing connected speech. Describes Bell Laboratories' speech recognition system which attempts to combine major elements of human communication into a single operating unit. (DS)

  18. Speech disorders - children

    MedlinePlus

    ... disorders; Disfluency; Communication disorder - speech disorder; Speech disorder - stuttering ... a person repeats a sound, word, or phrase. Stuttering may be the most serious disfluency. Articulation disorders ...

  19. Research on Speech Perception. Progress Report No. 13.

    ERIC Educational Resources Information Center

    Pisoni, David B.; And Others

    Summarizing research activities in 1987, this is the thirteenth annual report of research on speech perception, analysis, synthesis, and recognition conducted in the Speech Research Laboratory of the Department of Psychology at Indiana University. The report includes extended manuscripts, short reports, progress reports, and information on…

  20. Research on Speech Perception. Progress Report No. 15.

    ERIC Educational Resources Information Center

    Pisoni, David B.

    Summarizing research activities in 1989, this is the fifteenth annual report of research on speech perception, analysis, synthesis, and recognition conducted in the Speech Research Laboratory of the Department of Psychology at Indiana University. The report contains the following 21 articles: "Perceptual Learning of Nonnative Speech…

  1. Analysis of a Digital Technique for Frequency Transposition of Speech.

    DTIC Science & Technology

    1985-09-01

    (Fragmentary table of contents: Spectral Analysis; Formant Frequencies; Speech Synthesis.) ...necessary to begin the generation of sound. The vocal cords, tongue, mouth, lips and nasal tract combine their different properties to shape the airflow... A convenient way to portray the frequency content of speech is through the determination of formant frequencies. Formant frequencies are the most prominent

  2. Research on Speech Perception. Progress Report No. 12.

    ERIC Educational Resources Information Center

    Pisoni, David B.; And Others

    Summarizing research activities in 1986, this is the twelfth annual report of research on speech perception, analysis, synthesis, and recognition conducted in the Speech Research Laboratory of the Department of Psychology at Indiana University. The report contains the following 23 articles: "Comprehension of Digitally Encoded Natural Speech…

  3. Research on Speech Perception. Progress Report No. 14.

    ERIC Educational Resources Information Center

    Pisoni, David B.; And Others

    Summarizing research activities in 1988, this is the fourteenth annual report of research on speech perception, analysis, synthesis, and recognition conducted in the Speech Research Laboratory of the Department of Psychology at Indiana University. The report includes extended manuscripts, short reports, and progress reports. The report contains…

  4. Data concatenation, Bayesian concordance and coalescent-based analyses of the species tree for the rapid radiation of Triturus newts.

    PubMed

    Wielstra, Ben; Arntzen, Jan W; van der Gaag, Kristiaan J; Pabijan, Maciej; Babik, Wieslaw

    2014-01-01

    The phylogenetic relationships for rapid species radiations are difficult to disentangle. Here we study one such case, namely the genus Triturus, which is composed of the marbled and crested newts. We analyze data for 38 genetic markers, positioned in 3-prime untranslated regions of protein-coding genes, obtained with 454 sequencing. Our dataset includes twenty Triturus newts and represents all nine species. Bayesian analysis of population structure allocates all individuals to their respective species. The branching patterns obtained by data concatenation, Bayesian concordance analysis and coalescent-based estimations of the species tree differ from one another. The data concatenation based species tree shows high branch support but branching order is considerably affected by allele choice in the case of heterozygotes in the concatenation process. Bayesian concordance analysis expresses the conflict between individual gene trees for part of the Triturus species tree as low concordance factors. The coalescent-based species tree is relatively similar to a previously published species tree based upon morphology and full mtDNA and any conflicting internal branches are not highly supported. Our findings reflect high gene tree discordance due to incomplete lineage sorting (possibly aggravated by hybridization) in combination with low information content of the markers employed (as can be expected for relatively recent species radiations). This case study highlights the complexity of resolving rapid radiations and we acknowledge that to convincingly resolve the Triturus species tree even more genes will have to be consulted.
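
    Data concatenation in this setting means joining each individual's per-marker alignments into one supermatrix before tree inference. A minimal sketch of that bookkeeping (taxon labels and sequences are toy values):

      def concatenate_alignments(gene_alignments):
          """gene_alignments: list of dicts mapping taxon -> aligned sequence.
          Returns one supermatrix; taxa missing a marker get gap padding."""
          taxa = sorted({t for aln in gene_alignments for t in aln})
          parts = {t: [] for t in taxa}
          for aln in gene_alignments:
              length = len(next(iter(aln.values())))
              for t in taxa:
                  parts[t].append(aln.get(t, '-' * length))
          return {t: ''.join(p) for t, p in parts.items()}

      genes = [{'T_cristatus': 'ACGT', 'T_marmoratus': 'ACGA'},
               {'T_cristatus': 'TTAA', 'T_marmoratus': 'TTAC'}]
      print(concatenate_alignments(genes))
      # {'T_cristatus': 'ACGTTTAA', 'T_marmoratus': 'ACGATTAC'}

    Note the study's caveat: for heterozygous individuals, which allele is written into each row of this matrix can change the recovered branching order.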

  5. High-speed concatenation of frequency ramps using sampled grating distributed Bragg reflector laser diode sources for OCT resolution enhancement

    NASA Astrophysics Data System (ADS)

    George, Brandon; Derickson, Dennis

    2010-02-01

    Wavelength tunable sampled grating distributed Bragg reflector (SG-DBR) lasers used for telecommunications applications have previously demonstrated the ability to produce linear frequency ramps covering the entire tuning range of the laser at 100 kHz repetition rates. An individual SG-DBR laser has a typical tuning range of 50 nm. The InGaAs/InP material system often used with SG-DBR lasers allows for design variations that cover the 1250 to 1650 nm wavelength range. This paper addresses the possibility of concatenating the outputs of tunable SG-DBR lasers covering adjacent wavelength ranges for enhancing the resolution of OCT measurements. This laser concatenation method is demonstrated by combining the 1525 nm to 1575 nm wavelength range of a "C-band" SG-DBR laser with the 1570 nm to 1620 nm wavelength coverage of an "L-band" SG-DBR laser. Measurements show that SG-DBR lasers can be concatenated with a transition switching time of less than 50 ns with undesired leakage signals attenuated by 50 dB.
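
    In software terms the concatenation amounts to splicing two linear wavelength ramps whose ranges overlap slightly, trimming the overlap so the composite sweep is monotonic. A NumPy sketch using the ranges quoted above (sample counts arbitrary):

      import numpy as np

      c_band = np.linspace(1525.0, 1575.0, 5000)   # C-band SG-DBR ramp, nm
      l_band = np.linspace(1570.0, 1620.0, 5000)   # L-band SG-DBR ramp, nm

      # Keep only the L-band samples beyond the end of the C-band ramp
      sweep = np.concatenate([c_band, l_band[l_band > c_band[-1]]])
      print(sweep[0], sweep[-1], len(sweep))       # 1525.0 1620.0 ~9500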

  6. Expansion and concatenation of nonmuscle myosin IIA filaments drive cellular contractile system formation during interphase and mitosis

    PubMed Central

    Fenix, Aidan M.; Taneja, Nilay; Buttler, Carmen A.; Lewis, John; Van Engelenburg, Schuyler B.; Ohi, Ryoma; Burnette, Dylan T.

    2016-01-01

    Cell movement and cytokinesis are facilitated by contractile forces generated by the molecular motor, nonmuscle myosin II (NMII). NMII molecules form a filament (NMII-F) through interactions of their C-terminal rod domains, positioning groups of N-terminal motor domains on opposite sides. The NMII motors then bind and pull actin filaments toward the NMII-F, thus driving contraction. Inside of crawling cells, NMIIA-Fs form large macromolecular ensembles (i.e., NMIIA-F stacks), but how this occurs is unknown. Here we show NMIIA-F stacks are formed through two non–mutually exclusive mechanisms: expansion and concatenation. During expansion, NMIIA molecules within the NMIIA-F spread out concurrent with addition of new NMIIA molecules. Concatenation occurs when multiple NMIIA-Fs/NMIIA-F stacks move together and align. We found that NMIIA-F stack formation was regulated by both motor activity and the availability of surrounding actin filaments. Furthermore, our data showed expansion and concatenation also formed the contractile ring in dividing cells. Thus interphase and mitotic cells share similar mechanisms for creating large contractile units, and these are likely to underlie how other myosin II–based contractile systems are assembled. PMID:26960797

  7. Speech research

    NASA Astrophysics Data System (ADS)

    1992-06-01

    Phonology is traditionally seen as the discipline that concerns itself with the building blocks of linguistic messages. It is the study of the structure of sound inventories of languages and of the participation of sounds in rules or processes. Phonetics, in contrast, concerns speech sounds as produced and perceived. Two extreme positions on the relationship between phonological messages and phonetic realizations are represented in the literature. One holds that the primary home for linguistic symbols, including phonological ones, is the human mind, itself housed in the human brain. The second holds that their primary home is the human vocal tract.

  8. Speech recognition and understanding

    SciTech Connect

    Vintsyuk, T.K.

    1983-05-01

    This article discusses the automatic processing of speech signals with the aim of finding a sequence of words (speech recognition) or a concept (speech understanding) being transmitted by the speech signal. The goal of the research is to develop an automatic typewriter that will automatically edit and type text under voice control. A dynamic programming method is proposed in which all possible class signals are stored, after which the presented signal is compared to all the stored signals during the recognition phase. Topics considered include element-by-element recognition of words of speech, learning speech recognition, phoneme-by-phoneme speech recognition, the recognition of connected speech, understanding connected speech, and prospects for designing speech recognition and understanding systems. An application of the composition dynamic programming method for the solution of basic problems in the recognition and understanding of speech is presented.

  9. Processing of speech signals for physical and sensory disabilities.

    PubMed Central

    Levitt, H

    1995-01-01

    Assistive technology involving voice communication is used primarily by people who are deaf, hard of hearing, or who have speech and/or language disabilities. It is also used to a lesser extent by people with visual or motor disabilities. A very wide range of devices has been developed for people with hearing loss. These devices can be categorized not only by the modality of stimulation [i.e., auditory, visual, tactile, or direct electrical stimulation of the auditory nerve (auditory-neural)] but also in terms of the degree of speech processing that is used. At least four such categories can be distinguished: assistive devices (a) that are not designed specifically for speech, (b) that take the average characteristics of speech into account, (c) that process articulatory or phonetic characteristics of speech, and (d) that embody some degree of automatic speech recognition. Assistive devices for people with speech and/or language disabilities typically involve some form of speech synthesis or symbol generation for severe forms of language disability. Speech synthesis is also used in text-to-speech systems for sightless persons. Other applications of assistive technology involving voice communication include voice control of wheelchairs and other devices for people with mobility disabilities. PMID:7479816

  10. Metallobiological necklaces: mass spectrometric and molecular modeling study of metallation in concatenated domains of metallothionein.

    PubMed

    Chan, Jayna; Huang, Zuyun; Watt, Ian; Kille, Peter; Stillman, Martin

    2008-01-01

    The ubiquitous protein metallothionein (MT) has proven to be a major player not only in the homeostasis of Cu(I) and Zn(II), but also binds all the Group 11 and 12 metals. Metallothioneins are characterised by the presence of numerous cys-x-cys and cys-cys motifs in the sequence and are found naturally with either one domain or two, linked, metal-binding domains. The use of chains of these metal-thiolate domains offers the possibility of creating chemically tuneable and, therefore, chemically dependent electrochemical or photochemical surface modifiers, or nanomachinery with nanomechanical properties. In this work, the metal-binding properties of the Cd(4)-containing domain of alpha-rhMT1a assembled into chains of two and three concatenated domains, that is, "necklaces", have been studied by spectrometric techniques, and the interactions within the structures modelled and interpreted by using molecular dynamics. These chains are metallated with 4, 8 or 12 Cd(II) ions to the 11, 22, and 33 cysteinyl sulfur atoms in the alpha-rhMT1a, alphaalpha-rhMT1a, and alphaalphaalpha-rhMT1a proteins, respectively. The effect of pH on the folding of each protein was studied by ESI-MS and optical spectroscopy. MM3/MD simulations were carried out over a period of up to 500 ps by using force-field parameters based on the reported structural data. These calculations provide novel information about the motion of the clustered metallated, partially demetallated, and metal-free peptide chains, with special interest in the region of the metal-binding site. The MD energy/time trajectory conformations show for the first time the flexibility of the metal-sulfur clusters and the bound amino acid chains. We report unexpected and very different sizes for the metallated and demetallated proteins from the combination of experimental data, with molecular dynamics simulations.

  11. NINJA-OPS: Fast Accurate Marker Gene Alignment Using Concatenated Ribosomes

    PubMed Central

    Al-Ghalith, Gabriel A.; Montassier, Emmanuel; Ward, Henry N.; Knights, Dan

    2016-01-01

    The explosion of bioinformatics technologies in the form of next generation sequencing (NGS) has facilitated a massive influx of genomics data in the form of short reads. Short read mapping is therefore a fundamental component of next generation sequencing pipelines which routinely match these short reads against reference genomes for contig assembly. However, such techniques have seldom been applied to microbial marker gene sequencing studies, which have mostly relied on novel heuristic approaches. We propose NINJA Is Not Just Another OTU-Picking Solution (NINJA-OPS, or NINJA for short), a fast and highly accurate novel method enabling reference-based marker gene matching (picking Operational Taxonomic Units, or OTUs). NINJA takes advantage of the Burrows-Wheeler (BW) alignment using an artificial reference chromosome composed of concatenated reference sequences, the “concatesome,” as the BW input. Other features include automatic support for paired-end reads with arbitrary insert sizes. NINJA is also free and open source and implements several pre-filtering methods that elicit substantial speedup when coupled with existing tools. We applied NINJA to several published microbiome studies, obtaining accuracy similar to or better than previous reference-based OTU-picking methods while achieving an order of magnitude or more speedup and using a fraction of the memory footprint. NINJA is a complete pipeline that takes a FASTA-formatted input file and outputs a QIIME-formatted taxonomy-annotated BIOM file for an entire MiSeq run of human gut microbiome 16S genes in under 10 minutes on a dual-core laptop. PMID:26820746

  12. NINJA-OPS: Fast Accurate Marker Gene Alignment Using Concatenated Ribosomes.

    PubMed

    Al-Ghalith, Gabriel A; Montassier, Emmanuel; Ward, Henry N; Knights, Dan

    2016-01-01

    The explosion of bioinformatics technologies in the form of next generation sequencing (NGS) has facilitated a massive influx of genomics data in the form of short reads. Short read mapping is therefore a fundamental component of next generation sequencing pipelines which routinely match these short reads against reference genomes for contig assembly. However, such techniques have seldom been applied to microbial marker gene sequencing studies, which have mostly relied on novel heuristic approaches. We propose NINJA Is Not Just Another OTU-Picking Solution (NINJA-OPS, or NINJA for short), a fast and highly accurate novel method enabling reference-based marker gene matching (picking Operational Taxonomic Units, or OTUs). NINJA takes advantage of the Burrows-Wheeler (BW) alignment using an artificial reference chromosome composed of concatenated reference sequences, the "concatesome," as the BW input. Other features include automatic support for paired-end reads with arbitrary insert sizes. NINJA is also free and open source and implements several pre-filtering methods that elicit substantial speedup when coupled with existing tools. We applied NINJA to several published microbiome studies, obtaining accuracy similar to or better than previous reference-based OTU-picking methods while achieving an order of magnitude or more speedup and using a fraction of the memory footprint. NINJA is a complete pipeline that takes a FASTA-formatted input file and outputs a QIIME-formatted taxonomy-annotated BIOM file for an entire MiSeq run of human gut microbiome 16S genes in under 10 minutes on a dual-core laptop.
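
    The "concatesome" idea is straightforward to emulate: join the reference sequences into one artificial chromosome, record each sequence's start offset, and map any alignment coordinate back to its source reference with a binary search. A minimal sketch (not NINJA-OPS code; names are ours):

      from bisect import bisect_right

      def build_concatesome(refs):
          """refs: dict name -> sequence. Returns the concatenated string
          plus parallel lists of start offsets and names."""
          names, starts, chunks, pos = [], [], [], 0
          for name, seq in refs.items():
              names.append(name); starts.append(pos); chunks.append(seq)
              pos += len(seq)
          return ''.join(chunks), starts, names

      def hit_to_ref(offset, starts, names):
          """Map a concatesome coordinate back to (reference, local offset)."""
          i = bisect_right(starts, offset) - 1
          return names[i], offset - starts[i]

      cat, starts, names = build_concatesome({'otu1': 'ACGTACGT', 'otu2': 'GGGTTT'})
      print(hit_to_ref(10, starts, names))   # ('otu2', 2)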

  13. A commercial large-vocabulary discrete speech recognition system: DragonDictate.

    PubMed

    Mandel, M A

    1992-01-01

    DragonDictate is currently the only commercially available general-purpose, large-vocabulary speech recognition system. It uses discrete speech and is speaker-dependent, adapting to the speaker's voice and language model with every word. Its acoustic adaptability is based on a three-level phonology and a stochastic model of production. The phonological levels are phonemes, augmented triphones (phonemes-in-context or PICs), and steady-state spectral slices that are concatenated to approximate the spectra of these PICs (phonetic elements or PELs) and thus of words. Production is treated as a hidden Markov process, which the recognizer has to identify from its output, the spoken word. Findings of practical value to speech recognition are presented from research on six European languages.
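
    Treating production as a hidden Markov process means the decoder searches for the state sequence that best explains the observed spectra. A toy Viterbi decoder over a handful of states (standing in for DragonDictate's PELs; all numbers illustrative) shows the shape of that search:

      import numpy as np

      def viterbi(log_init, log_trans, log_emit):
          """log_init: (S,), log_trans: (S, S), log_emit: (T, S).
          Returns the most likely hidden state path."""
          T, S = log_emit.shape
          score = log_init + log_emit[0]
          back = np.zeros((T, S), dtype=int)
          for t in range(1, T):
              cand = score[:, None] + log_trans        # (from_state, to_state)
              back[t] = cand.argmax(axis=0)
              score = cand.max(axis=0) + log_emit[t]
          path = [int(score.argmax())]
          for t in range(T - 1, 0, -1):
              path.append(int(back[t][path[-1]]))
          return path[::-1]

      # Two states, three frames: the decoded path follows the emissions.
      print(viterbi(np.log([0.5, 0.5]),
                    np.log([[0.8, 0.2], [0.2, 0.8]]),
                    np.log([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]])))   # [0, 0, 1]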

  14. Opportunities in Speech Pathology.

    ERIC Educational Resources Information Center

    Newman, Parley W.

    The importance of speech is discussed and speech pathology is described. Types of communication disorders considered are articulation disorders, aphasia, facial deformity, hearing loss, stuttering, delayed speech, voice disorders, and cerebral palsy; examples of five disorders are given. Speech pathology is investigated from these aspects: the…

  15. Careers in Speech Communication.

    ERIC Educational Resources Information Center

    Speech Communication Association, New York, NY.

    Brief discussions in this pamphlet suggest educational and career opportunities in the following fields of speech communication: rhetoric, public address, and communication; theatre, drama, and oral interpretation; radio, television, and film; speech pathology and audiology; speech science, phonetics, and linguistics; and speech education.…

  16. Comparison of Functional Connectivity Estimated from Concatenated Task-State Data from Block-Design Paradigm with That of Continuous Task

    PubMed Central

    Zhu, Yang; Cheng, Lin; He, Naying; Yang, Yang; Ling, Huawei; Tong, Shanbao

    2017-01-01

    Functional connectivity (FC) analysis with data collected as continuous tasks and activation analysis using data from block-design paradigms are two main methods to investigate the task-induced brain activation. If the concatenated data of task blocks extracted from the block-design paradigm could provide equivalent FC information to that derived from continuous task data, it would shorten the data collection time and simplify experimental procedures, and the already collected data of block-design paradigms could be reanalyzed from the perspective of FC. Despite being used in many studies, such a hypothesis of equivalence has not yet been tested from multiple perspectives. In this study, we collected fMRI blood-oxygen-level-dependent signals from 24 healthy subjects during a continuous task session as well as in block-design task sessions. We compared concatenated task blocks and continuous task data in terms of region of interest- (ROI-) based FC, seed-based FC, and brain network topology during a short motor task. According to our results, the concatenated data was not significantly different from the continuous data in multiple aspects, indicating the potential of using concatenated data to estimate task-state FC in short motor tasks. However, even under appropriate experimental conditions, the interpretation of FC results based on concatenated data should be cautious and take the influence due to inherent information loss during concatenation into account. PMID:28191030

  17. Comparison of Functional Connectivity Estimated from Concatenated Task-State Data from Block-Design Paradigm with That of Continuous Task.

    PubMed

    Zhu, Yang; Cheng, Lin; He, Naying; Yang, Yang; Ling, Huawei; Ayaz, Hasan; Tong, Shanbao; Sun, Junfeng; Fu, Yi

    2017-01-01

    Functional connectivity (FC) analysis with data collected as continuous tasks and activation analysis using data from block-design paradigms are two main methods to investigate the task-induced brain activation. If the concatenated data of task blocks extracted from the block-design paradigm could provide equivalent FC information to that derived from continuous task data, it would shorten the data collection time and simplify experimental procedures, and the already collected data of block-design paradigms could be reanalyzed from the perspective of FC. Despite being used in many studies, such a hypothesis of equivalence has not yet been tested from multiple perspectives. In this study, we collected fMRI blood-oxygen-level-dependent signals from 24 healthy subjects during a continuous task session as well as in block-design task sessions. We compared concatenated task blocks and continuous task data in terms of region of interest- (ROI-) based FC, seed-based FC, and brain network topology during a short motor task. According to our results, the concatenated data was not significantly different from the continuous data in multiple aspects, indicating the potential of using concatenated data to estimate task-state FC in short motor tasks. However, even under appropriate experimental conditions, the interpretation of FC results based on concatenated data should be cautious and take the influence due to inherent information loss during concatenation into account.
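
    Mechanically, the concatenation under test is simple: take the task-block samples out of each region's BOLD time series, join them, and correlate across regions. A sketch under assumed array shapes (regions x timepoints; all values synthetic):

      import numpy as np

      def block_fc(bold, block_onsets, block_len):
          """bold: (n_regions, n_timepoints). Concatenates the task blocks,
          then returns the region-by-region correlation matrix as the
          functional connectivity estimate."""
          idx = np.concatenate([np.arange(o, o + block_len) for o in block_onsets])
          return np.corrcoef(bold[:, idx])

      bold = np.random.default_rng(0).standard_normal((4, 300))
      print(block_fc(bold, block_onsets=[20, 120, 220], block_len=40).shape)  # (4, 4)

    As the abstract cautions, the hard edges introduced at block boundaries discard transition dynamics that a continuous recording would retain, so such estimates inherit some information loss.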

  18. A review and synthesis of the first 20 years of PET and fMRI studies of heard speech, spoken language and reading

    PubMed Central

    Price, Cathy J.

    2012-01-01

    The anatomy of language has been investigated with PET or fMRI for more than 20 years. Here I attempt to provide an overview of the brain areas associated with heard speech, speech production and reading. The conclusions of many hundreds of studies were considered, grouped according to the type of processing, and reported in the order that they were published. Many findings have been replicated time and time again leading to some consistent and undisputable conclusions. These are summarised in an anatomical model that indicates the location of the language areas and the most consistent functions that have been assigned to them. The implications for cognitive models of language processing are also considered. In particular, a distinction can be made between processes that are localized to specific structures (e.g. sensory and motor processing) and processes where specialisation arises in the distributed pattern of activation over many different areas that each participate in multiple functions. For example, phonological processing of heard speech is supported by the functional integration of auditory processing and articulation; and orthographic processing is supported by the functional integration of visual processing, articulation and semantics. Future studies will undoubtedly be able to improve the spatial precision with which functional regions can be dissociated but the greatest challenge will be to understand how different brain regions interact with one another in their attempts to comprehend and produce language. PMID:22584224

  19. Speech therapy with obturator.

    PubMed

    Shyammohan, A; Sreenivasulu, D

    2010-12-01

    Rehabilitation of speech is as important as closure of the defect in cases of velopharyngeal insufficiency. Often the importance of speech therapy is sidelined during the fabrication of obturators. Usually the speech part is taken up only at a later stage and is relegated entirely to a speech therapist without the active involvement of the prosthodontist. The article suggests a protocol for speech therapy in such cases to be carried out in unison with a prosthodontist.

  20. RibAlign: a software tool and database for eubacterial phylogeny based on concatenated ribosomal protein subunits.

    PubMed

    Teeling, Hanno; Gloeckner, Frank Oliver

    2006-02-13

    Until today, analysis of 16S ribosomal RNA (rRNA) sequences has been the de-facto gold standard for the assessment of phylogenetic relationships among prokaryotes. However, the branching order of the individual phyla is not well-resolved in 16S rRNA-based trees. In search of an improvement, new phylogenetic methods have been developed alongside the growing availability of complete genome sequences. Unfortunately, only a few genes in prokaryotic genomes qualify as universal phylogenetic markers and almost all of them have a lower information content than the 16S rRNA gene. Therefore, emphasis has been placed on methods that are based on multiple genes or even entire genomes. The concatenation of ribosomal protein sequences is one method which has been ascribed an improved resolution. Since there is neither a comprehensive database for ribosomal protein sequences nor a tool that assists in sequence retrieval and generation of respective input files for phylogenetic reconstruction programs, RibAlign has been developed to fill this gap. RibAlign serves two purposes: First, it provides a fast and scalable database that has been specifically adapted to eubacterial ribosomal protein sequences and second, it provides sophisticated import and export capabilities. This includes semi-automatic extraction of ribosomal protein sequences from whole-genome GenBank and FASTA files as well as exporting aligned, concatenated and filtered sequence files that can directly be used in conjunction with the PHYLIP and MrBayes phylogenetic reconstruction programs. Up to now, phylogeny based on concatenated ribosomal protein sequences is hampered by the limited set of sequenced genomes and high computational requirements. However, hundreds of full and draft genome sequencing projects are on the way, and advances in cluster-computing and algorithms make phylogenetic reconstructions feasible even with large alignments of concatenated marker genes. RibAlign is a first step in this direction.

  1. RibAlign: a software tool and database for eubacterial phylogeny based on concatenated ribosomal protein subunits

    PubMed Central

    Teeling, Hanno; Gloeckner, Frank Oliver

    2006-01-01

    Background Until today, analysis of 16S ribosomal RNA (rRNA) sequences has been the de-facto gold standard for the assessment of phylogenetic relationships among prokaryotes. However, the branching order of the individual phyla is not well-resolved in 16S rRNA-based trees. In search of an improvement, new phylogenetic methods have been developed alongside the growing availability of complete genome sequences. Unfortunately, only a few genes in prokaryotic genomes qualify as universal phylogenetic markers and almost all of them have a lower information content than the 16S rRNA gene. Therefore, emphasis has been placed on methods that are based on multiple genes or even entire genomes. The concatenation of ribosomal protein sequences is one method which has been ascribed an improved resolution. Since there is neither a comprehensive database for ribosomal protein sequences nor a tool that assists in sequence retrieval and generation of respective input files for phylogenetic reconstruction programs, RibAlign has been developed to fill this gap. Results RibAlign serves two purposes: First, it provides a fast and scalable database that has been specifically adapted to eubacterial ribosomal protein sequences and second, it provides sophisticated import and export capabilities. This includes semi-automatic extraction of ribosomal protein sequences from whole-genome GenBank and FASTA files as well as exporting aligned, concatenated and filtered sequence files that can directly be used in conjunction with the PHYLIP and MrBayes phylogenetic reconstruction programs. Conclusion Up to now, phylogeny based on concatenated ribosomal protein sequences is hampered by the limited set of sequenced genomes and high computational requirements. However, hundreds of full and draft genome sequencing projects are on the way, and advances in cluster-computing and algorithms make phylogenetic reconstructions feasible even with large alignments of concatenated marker genes. RibAlign is a first step in this direction.

  2. Children's perception of their synthetically corrected speech production.

    PubMed

    Strömbergsson, Sofia; Wengelin, Asa; House, David

    2014-06-01

    We explore children's perception of their own speech - in its online form, in its recorded form, and in synthetically modified forms. Children with phonological disorder (PD) and children with typical speech and language development (TD) performed tasks of evaluating accuracy of the different types of speech stimuli, either immediately after having produced the utterance or after a delay. In addition, they performed a task designed to assess their ability to detect synthetic modification. Both groups showed high performance in tasks involving evaluation of other children's speech, whereas in tasks of evaluating one's own speech, the children with PD were less accurate than their TD peers. The children with PD were less sensitive to misproductions in immediate conjunction with their production of an utterance, and more accurate after a delay. Within-category modification often passed undetected, indicating a satisfactory quality of the generated speech. Potential clinical benefits of using corrective re-synthesis are discussed.

  3. The digits-in-noise test: assessing auditory speech recognition abilities in noise.

    PubMed

    Smits, Cas; Theo Goverts, S; Festen, Joost M

    2013-03-01

    A speech-in-noise test which uses digit triplets in steady-state speech noise was developed. The test measures primarily the auditory, or bottom-up, speech recognition abilities in noise. Digit triplets were formed by concatenating single digits spoken by a male speaker. Level corrections were made to individual digits to create a set of homogeneous digit triplets with steep speech recognition functions. The test measures the speech reception threshold (SRT) in long-term average speech-spectrum noise via a 1-up, 1-down adaptive procedure with a measurement error of 0.7 dB. One training list is needed for naive listeners. No further learning effects were observed in 24 subsequent SRT measurements. The test was validated by comparing results on the test with results on the standard sentences-in-noise test. To avoid the confounding of hearing loss, age, and linguistic skills, these measurements were performed in normal-hearing subjects with simulated hearing loss. The signals were spectrally smeared and/or low-pass filtered at varying cutoff frequencies. After correction for measurement error the correlation coefficient between SRTs measured with both tests equaled 0.96. Finally, the feasibility of the test was approved in a study where reference SRT values were gathered in a representative set of 1386 listeners over 60 years of age.
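
    The 1-up, 1-down adaptive procedure converges on the 50%-correct point of the psychometric function: the SNR drops one step after each correct response and rises one step after each error, and the SRT is taken as the mean presented SNR over the adaptive trials. A skeletal simulation (the response function and step size are placeholders, not the published test parameters):

      def measure_srt(respond, n_triplets=24, start_snr=0.0, step=2.0):
          """respond(snr) -> True if the digit triplet is repeated correctly.
          Returns the speech reception threshold (SRT) in dB SNR."""
          snr, track = start_snr, []
          for _ in range(n_triplets):
              track.append(snr)
              snr += -step if respond(snr) else step   # 1-up, 1-down rule
          return sum(track[4:]) / len(track[4:])       # discard approach trials

      # Simulated listener who is correct whenever SNR exceeds -8 dB:
      print(measure_srt(lambda snr: snr > -8.0))       # -7.0, bracketing -8 dB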

  4. Expanding the occupational health methodology: A concatenated artificial neural network approach to model the burnout process in Chinese nurses.

    PubMed

    Ladstätter, Felix; Garrosa, Eva; Moreno-Jiménez, Bernardo; Ponsoda, Vicente; Reales Aviles, José Manuel; Dai, Junming

    2016-01-01

    Artificial neural networks are sophisticated modelling and prediction tools capable of extracting complex, non-linear relationships between predictor (input) and predicted (output) variables. This study explores this capacity by modelling non-linearities in the hardiness-modulated burnout process with a neural network. Specifically, two multi-layer feed-forward artificial neural networks are concatenated in an attempt to model the composite non-linear burnout process. Sensitivity analysis, a Monte Carlo-based global simulation technique, is then utilised to examine the first-order effects of the predictor variables on the burnout sub-dimensions and consequences. Results show that (1) this concatenated artificial neural network approach is a feasible way to model the burnout process, (2) sensitivity analysis is a fruitful method for studying the relative importance of predictor variables, and (3) the relationships among the variables involved in the development of burnout and its consequences are non-linear to different degrees. Many relationships among variables (e.g., stressors and strains) are not linear, yet researchers use linear methods such as Pearson correlation or linear regression to analyse these relationships. Artificial neural network analysis is an innovative method for analysing non-linear relationships and, in combination with sensitivity analysis, is superior to linear methods.
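
    "Concatenated" here means the outputs of a first feed-forward network (predicting the burnout sub-dimensions) feed a second network (predicting consequences). A bare-bones NumPy forward pass showing that wiring; the weights are random stand-ins, not fitted models:

      import numpy as np

      def mlp_forward(x, w_hidden, w_out):
          """One hidden tanh layer, linear output."""
          return np.tanh(x @ w_hidden) @ w_out

      rng = np.random.default_rng(1)
      predictors = rng.standard_normal((10, 6))       # e.g., stressors, hardiness

      # Net 1: predictors -> three burnout sub-dimensions
      burnout = mlp_forward(predictors,
                            rng.standard_normal((6, 8)), rng.standard_normal((8, 3)))
      # Net 2 (concatenated): burnout sub-dimensions -> two consequences
      consequences = mlp_forward(burnout,
                                 rng.standard_normal((3, 5)), rng.standard_normal((5, 2)))
      print(burnout.shape, consequences.shape)        # (10, 3) (10, 2)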

  5. An experimental study of the concatenated Reed-Solomon/Viterbi channel coding system performance and its impact on space communications

    NASA Technical Reports Server (NTRS)

    Liu, K. Y.; Lee, J. J.

    1981-01-01

    The need for efficient space communication at very low bit error probabilities led to the specification and implementation of a concatenated coding system using an interleaved Reed-Solomon code as the outer code and a Viterbi-decoded convolutional code as the inner code. Experimental results of this channel coding system are presented under an emulated S-band uplink and X-band downlink two-way space communication channel, where both uplink and downlink have strong carrier power. This work was performed under the NASA End-to-End Data Systems program at JPL. Test results verify that at a bit error probability of 10 to the -6 power or less, this concatenated coding system does provide a coding gain of 2.5 dB or more over the Viterbi-decoded convolutional-only coding system. These tests also show that a desirable interleaving depth for the Reed-Solomon outer code is 8 or more. The impact of this "virtually" error-free space communication link on the transmission of images is discussed and examples of simulation results are given.
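
    Interleaving the Reed-Solomon symbols to depth 8, as these tests recommend, spreads a burst of Viterbi-decoder errors across eight separate RS codewords so that each sees only a correctable fraction. A minimal block interleaver (depth and block size are parameters; this is the generic construction, not JPL's implementation):

      def interleave(symbols, depth=8):
          """Write symbols row-wise into `depth` rows, read column-wise,
          so adjacent channel symbols belong to different RS codewords."""
          n = len(symbols) // depth
          rows = [symbols[r * n:(r + 1) * n] for r in range(depth)]
          return [rows[r][c] for c in range(n) for r in range(depth)]

      def deinterleave(symbols, depth=8):
          n = len(symbols) // depth
          cols = [symbols[c * depth:(c + 1) * depth] for c in range(n)]
          return [cols[c][r] for r in range(depth) for c in range(n)]

      data = list(range(16))
      assert deinterleave(interleave(data, depth=4), depth=4) == data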

  6. Status Report on Speech Research: A Report on the Status and Progress of Studies on the Nature of Speech, Instrumentation for Its Investigation, and Practical Applications, April 1-June 30, 1977.

    ERIC Educational Resources Information Center

    Haskins Labs., New Haven, CT.

    This report is one of a regular series on the status and progress of studies on the nature of speech, instrumentation for its investigation, and practical applications. The ten papers treat the following topics: speech synthesis as a tool for the study of speech production; the study of articulatory organization; phonetic perception; cardiac…

  7. Speech imagery recalibrates speech-perception boundaries.

    PubMed

    Scott, Mark

    2016-07-01

    The perceptual boundaries between speech sounds are malleable and can shift after repeated exposure to contextual information. This shift is known as recalibration. To date, the known inducers of recalibration are lexical (including phonotactic) information, lip-read information and reading. The experiments reported here are a proof-of-effect demonstration that speech imagery can also induce recalibration.

  8. Tone recognition in continuous Cantonese speech using supratone models.

    PubMed

    Qian, Yao; Lee, Tan; Soong, Frank K

    2007-05-01

    This paper studies automatic tone recognition in continuous Cantonese speech. Cantonese is a major Chinese dialect that is known for being rich in tones. Tone information serves as a useful knowledge source for automatic speech recognition of Cantonese. Cantonese tone recognition is difficult because the tones have similar shapes of pitch contours. The tones are differentiated mainly by their relative pitch heights. In natural speech, the pitch level of a tone may shift up and down and the F0 ranges of different tones overlap with each other, making them acoustically indistinguishable within the domain of a syllable. Our study shows that the relative pitch heights are largely preserved between neighboring tones. A novel method of supratone modeling is proposed for Cantonese tone recognition. Each supratone model characterizes the F0 contour of two or three tones in succession. The tone sequence of a continuous utterance is formed as an overlapped concatenation of supratone units. The most likely tone sequence is determined under phonological constraints on syllable-tone combinations. The proposed method attains an accuracy of 74.68% in speaker-independent tone recognition experiments. In particular, the confusion among the tones with similar contour shapes is greatly resolved.

  9. Free Speech Yearbook 1978.

    ERIC Educational Resources Information Center

    Phifer, Gregg, Ed.

    The 17 articles in this collection deal with theoretical and practical freedom of speech issues. The topics include: freedom of speech in Marquette Park, Illinois; Nazis in Skokie, Illinois; freedom of expression in the Confederate States of America; Robert M. LaFollette's arguments for free speech and the rights of Congress; the United States…

  10. Talking Speech Input.

    ERIC Educational Resources Information Center

    Berliss-Vincent, Jane; Whitford, Gigi

    2002-01-01

    This article presents both the factors involved in successful speech input use and the potential barriers that may suggest that other access technologies could be more appropriate for a given individual. Speech input options that are available are reviewed and strategies for optimizing use of speech recognition technology are discussed. (Contains…

  11. Speech and respiration.

    PubMed

    Conrad, B; Schönle, P

    1979-04-12

    This investigation deals with the temporal aspects of air volume changes during speech. Speech respiration differs fundamentally from resting respiration. In resting respiration the duration and velocity of inspiration (air flow or lung volume change) are in a range similar to that of expiration. In speech respiration the duration of inspiration decreases and its velocity increases; conversely, the duration of expiration increases and the volume of air flow decreases dramatically. The following questions arise: are these two respiration types different entities, or do they represent the end points of a continuum from resting to speech respiration? How does articulation without the generation of speech sound affect breathing? Does (verbalized?) thinking without articulation or speech modify the breathing pattern? The main test battery included four tasks (spontaneous speech, reading, serial speech, arithmetic) performed under three conditions (speaking aloud, articulating subvocally, quiet performance by trying to exclusively 'think' the tasks). Respiratory movements were measured with a chest pneumograph and evaluated in comparison with a phonogram and the identified spoken text. For quiet performance the resulting respiratory time ratio (relation of duration of inspiration versus expiration) showed a gradual shift in the direction of speech respiration--the least for reading, the most for arithmetic. This change was even more apparent for the subvocal tasks. It is concluded that (a) there is a gradual automatic change from resting to speech respiration and (b) the degree of internal verbalization (activation of motor speech areas) defines the degree of activation of the speech respiratory pattern.

  12. SPEECH HANDICAPPED SCHOOL CHILDREN.

    ERIC Educational Resources Information Center

    JOHNSON, WENDELL; AND OTHERS

    This book is designed primarily for students who are being trained to work with speech handicapped school children, either as speech correctionists or as classroom teachers. The book deals with four major questions: (1) what kinds of speech disorders are found among school children, (2) what are the physical, psychological and social conditions,…

  13. Free Speech Yearbook 1978.

    ERIC Educational Resources Information Center

    Phifer, Gregg, Ed.

    The 17 articles in this collection deal with theoretical and practical freedom of speech issues. The topics include: freedom of speech in Marquette Park, Illinois; Nazis in Skokie, Illinois; freedom of expression in the Confederate States of America; Robert M. LaFollette's arguments for free speech and the rights of Congress; the United States…

  14. Free Speech Yearbook: 1970.

    ERIC Educational Resources Information Center

    Tedford, Thomas L., Ed.

    This book is a collection of syllabi, attitude surveys, and essays relating to free-speech issues, compiled by the Committee on Freedom of Speech of the Speech Communication Association. The collection begins with a rationale for the inclusion of a course on free speech in the college curriculum. Three syllabi with bibliographies present guides for…

  15. Speech Database Development

    DTIC Science & Technology

    1988-11-21

    August 1983. [9] Lennig, M., "Automatic Alignment of Natural Speech with a Corresponding Transcription," Speech Communication, 11th International... the cumulative distribution of the boundary offsets between the automatic alignment... [6] M. Lennig, "Automatic Alignment of Natural Speech with a...

  16. Auditory cortical deactivation during speech production and following speech perception: an EEG investigation of the temporal dynamics of the auditory alpha rhythm

    PubMed Central

    Jenson, David; Harkrider, Ashley W.; Thornton, David; Bowers, Andrew L.; Saltuklaroglu, Tim

    2015-01-01

    Sensorimotor integration (SMI) across the dorsal stream enables online monitoring of speech. Jenson et al. (2014) used independent component analysis (ICA) and event related spectral perturbation (ERSP) analysis of electroencephalography (EEG) data to describe anterior sensorimotor (e.g., premotor cortex, PMC) activity during speech perception and production. The purpose of the current study was to identify and temporally map neural activity from posterior (i.e., auditory) regions of the dorsal stream in the same tasks. Perception tasks required “active” discrimination of syllable pairs (/ba/ and /da/) in quiet and noisy conditions. Production conditions required overt production of syllable pairs and nouns. ICA performed on concatenated raw 68 channel EEG data from all tasks identified bilateral “auditory” alpha (α) components in 15 of 29 participants localized to pSTG (left) and pMTG (right). ERSP analyses were performed to reveal fluctuations in the spectral power of the α rhythm clusters across time. Production conditions were characterized by significant α event related synchronization (ERS; pFDR < 0.05) concurrent with EMG activity from speech production, consistent with speech-induced auditory inhibition. Discrimination conditions were also characterized by α ERS following stimulus offset. Auditory α ERS in all conditions temporally aligned with PMC activity reported in Jenson et al. (2014). These findings are indicative of speech-induced suppression of auditory regions, possibly via efference copy. The presence of the same pattern following stimulus offset in discrimination conditions suggests that sensorimotor contributions following speech perception reflect covert replay, and that covert replay provides one source of the motor activity previously observed in some speech perception tasks. To our knowledge, this is the first time that inhibition of auditory regions by speech has been observed in real-time with the ICA/ERSP technique. PMID

  17. Auditory cortical deactivation during speech production and following speech perception: an EEG investigation of the temporal dynamics of the auditory alpha rhythm.

    PubMed

    Jenson, David; Harkrider, Ashley W; Thornton, David; Bowers, Andrew L; Saltuklaroglu, Tim

    2015-01-01

    Sensorimotor integration (SMI) across the dorsal stream enables online monitoring of speech. Jenson et al. (2014) used independent component analysis (ICA) and event related spectral perturbation (ERSP) analysis of electroencephalography (EEG) data to describe anterior sensorimotor (e.g., premotor cortex, PMC) activity during speech perception and production. The purpose of the current study was to identify and temporally map neural activity from posterior (i.e., auditory) regions of the dorsal stream in the same tasks. Perception tasks required "active" discrimination of syllable pairs (/ba/ and /da/) in quiet and noisy conditions. Production conditions required overt production of syllable pairs and nouns. ICA performed on concatenated raw 68 channel EEG data from all tasks identified bilateral "auditory" alpha (α) components in 15 of 29 participants localized to pSTG (left) and pMTG (right). ERSP analyses were performed to reveal fluctuations in the spectral power of the α rhythm clusters across time. Production conditions were characterized by significant α event related synchronization (ERS; pFDR < 0.05) concurrent with EMG activity from speech production, consistent with speech-induced auditory inhibition. Discrimination conditions were also characterized by α ERS following stimulus offset. Auditory α ERS in all conditions temporally aligned with PMC activity reported in Jenson et al. (2014). These findings are indicative of speech-induced suppression of auditory regions, possibly via efference copy. The presence of the same pattern following stimulus offset in discrimination conditions suggests that sensorimotor contributions following speech perception reflect covert replay, and that covert replay provides one source of the motor activity previously observed in some speech perception tasks. To our knowledge, this is the first time that inhibition of auditory regions by speech has been observed in real-time with the ICA/ERSP technique.

  18. A Statistical Quality Model for Data-Driven Speech Animation.

    PubMed

    Ma, Xiaohan; Deng, Zhigang

    2012-11-01

    In recent years, data-driven speech animation approaches have achieved significant successes in terms of animation quality. However, how to automatically evaluate the realism of novel synthesized speech animations has remained an important yet unsolved research problem. In this paper, we propose a novel statistical model (called SAQP) to automatically predict the quality of on-the-fly synthesized speech animations generated by various data-driven techniques. Its essential idea is to construct a phoneme-based Speech Animation Trajectory Fitting (SATF) metric to describe speech animation synthesis errors and then build a statistical regression model to learn the association between the obtained SATF metric and the objective speech animation synthesis quality. Through carefully designed user studies, we evaluate the effectiveness and robustness of the proposed SAQP model. To the best of our knowledge, this work is the first of its kind: a quantitative quality model for data-driven speech animation. We believe it is an important first step toward removing a critical technical barrier to applying data-driven speech animation techniques to numerous online or interactive talking-avatar applications.
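
    The model's two ingredients, a per-phoneme trajectory-fitting error and a regression from that error to perceived quality, can be sketched in a few lines. The RMS distance and ordinary least squares below are simplifications of the paper's SATF metric and statistical model, and the numbers are invented for illustration:

      import numpy as np

      def trajectory_error(synth, reference):
          """RMS distance between synthesized and reference animation
          trajectories (arrays of frames x animation parameters)."""
          return float(np.sqrt(np.mean((synth - reference) ** 2)))

      # Fit a linear map from fitting error to subjective quality scores
      errors = np.array([[0.10], [0.25], [0.40], [0.60]])     # invented
      scores = np.array([4.6, 3.9, 3.1, 2.2])                 # invented MOS-like
      X = np.hstack([errors, np.ones_like(errors)])
      (slope, intercept), *_ = np.linalg.lstsq(X, scores, rcond=None)
      print(round(slope * 0.30 + intercept, 2))               # predicted quality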

  19. Speech in spinocerebellar ataxia.

    PubMed

    Schalling, Ellika; Hartelius, Lena

    2013-12-01

    Spinocerebellar ataxias (SCAs) are a heterogeneous group of autosomal dominant cerebellar ataxias clinically characterized by progressive ataxia, dysarthria and a range of other concomitant neurological symptoms. Only a few studies include detailed characterization of speech symptoms in SCA. Speech symptoms in SCA resemble ataxic dysarthria, but symptoms related to phonation may be more prominent. One study to date has shown an association between speech and voice symptoms and genotype. Further studies of speech and voice phenotypes are warranted and may aid in clinical diagnosis. In addition, instrumental speech analysis has been demonstrated to be a reliable measure that may be used to monitor disease progression or therapy outcomes in possible future pharmacological treatments. Intervention by speech and language pathologists should go beyond assessment. Clinical guidelines for management of speech, communication and swallowing need to be developed for individuals with progressive cerebellar ataxia. Copyright © 2013 Elsevier Inc. All rights reserved.

  20. Relations Among Central Auditory Abilities, Socio-Economic Factors, Speech Delay, Phonic Abilities and Reading Achievement: A Longitudinal Study.

    ERIC Educational Resources Information Center

    Flowers, Arthur; Crandell, Edwin W.

    Three auditory perceptual processes (resistance to distortion, selective listening in the form of auditory dedifferentiation, and binaural synthesis) were evaluated by five assessment techniques: (1) low pass filtered speech, (2) accelerated speech, (3) competing messages, (4) accelerated plus competing messages, and (5) binaural synthesis.…

  1. Annealed lattice animal model and Flory theory for the melt of non-concatenated rings: towards the physics of crumpling.

    PubMed

    Grosberg, Alexander Y

    2014-01-28

    A Flory theory is constructed for a long polymer ring in a melt of unknotted and non-concatenated rings. The theory assumes that the ring forms an effective annealed branched object and computes its primitive path. It is shown that the primitive path follows self-avoiding statistics and is characterized by the corresponding Flory exponent of a polymer with excluded volume. Based on that, it is shown that rings in the melt are compact objects with overall size proportional to their length raised to the 1/3 power. Furthermore, the contact probability exponent γcontact is estimated, albeit by a poorly controlled approximation, with the result close to 1.1 consistent with both numerical and experimental data.
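
    The compactness claim is the scaling R ~ N^(1/3): the ring's volume grows linearly with its length, unlike a swollen linear chain with R ~ N^0.588. A two-exponent numeric illustration:

      for nu, label in [(1 / 3, 'compact ring in melt'),
                        (0.588, 'swollen linear chain')]:
          growth = (8_000 ** nu) / (1_000 ** nu)   # size ratio for 8x the length
          print(f'{label}: {growth:.2f}x size for 8x length')
      # compact ring: 2.00x; swollen chain: ~3.40x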

  2. Performance of Single and Concatenated Sets of Mitochondrial Genes at Inferring Metazoan Relationships Relative to Full Mitogenome Data

    PubMed Central

    Havird, Justin C.; Santos, Scott R.

    2014-01-01

    Mitochondrial (mt) genes are some of the most popular and widely-utilized genetic loci in phylogenetic studies of metazoan taxa. However, their linked nature has raised questions on whether using the entire mitogenome for phylogenetics is overkill (at best) or pseudoreplication (at worst). Moreover, no studies have addressed the comparative phylogenetic utility of mitochondrial genes across individual lineages within the entire Metazoa. To comment on the phylogenetic utility of individual mt genes as well as concatenated subsets of genes, we analyzed mitogenomic data from 1865 metazoan taxa in 372 separate lineages spanning genera to subphyla. Specifically, phylogenies inferred from these datasets were statistically compared to ones generated from all 13 mt protein-coding (PC) genes (i.e., the “supergene” set) to determine which single genes performed “best” at, and the minimum number of genes required to, recover the “supergene” topology. Surprisingly, the popular marker COX1 performed poorest, while ND5, ND4, and ND2 were most likely to reproduce the “supergene” topology. Averaged across all lineages, the longest ∼2 mt PC genes were sufficient to recreate the “supergene” topology, although this average increased to ∼5 genes for datasets with 40 or more taxa. Furthermore, concatenation of the three “best” performing mt PC genes outperformed that of the three longest mt PC genes (i.e., ND5, COX1, and ND4). Taken together, while not all mt PC genes are equally interchangeable in phylogenetic studies of the metazoans, some subset can serve as a proxy for the 13 mt PC genes. However, the exact number and identity of these genes is specific to the lineage in question and cannot be applied indiscriminately across the Metazoa. PMID:24454717

  3. A procedure for estimating gestural scores from speech acoustics

    PubMed Central

    Nam, Hosung; Mitra, Vikramjit; Tiede, Mark; Hasegawa-Johnson, Mark; Espy-Wilson, Carol; Saltzman, Elliot; Goldstein, Louis

    2012-01-01

    Speech can be represented as a constellation of constricting vocal tract actions called gestures, whose temporal patterning with respect to one another is expressed in a gestural score. Current speech datasets do not come with gestural annotation and no formal gestural annotation procedure exists at present. This paper describes an iterative analysis-by-synthesis landmark-based time-warping architecture to perform gestural annotation of natural speech. For a given utterance, the Haskins Laboratories Task Dynamics and Application (TADA) model is employed to generate a corresponding prototype gestural score. The gestural score is temporally optimized through an iterative timing-warping process such that the acoustic distance between the original and TADA-synthesized speech is minimized. This paper demonstrates that the proposed iterative approach is superior to conventional acoustically-referenced dynamic timing-warping procedures and provides reliable gestural annotation for speech datasets. PMID:23231127
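
The heart of the timing-warping loop is an acoustic alignment between the natural utterance and its synthesized counterpart. As a hedged sketch of just that alignment step, here is a plain dynamic time warping over feature frames; the feature extraction and the TADA synthesizer are assumed to exist elsewhere, and this is not the authors' implementation:

```python
# Plain dynamic time warping between two (frames x dims) feature arrays.
import numpy as np

def dtw(natural, synthesized):
    """Return (alignment cost, warping path) for two feature sequences."""
    n, m = len(natural), len(synthesized)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(natural[i - 1] - synthesized[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],
                                 cost[i, j - 1],
                                 cost[i - 1, j - 1])
    # Trace back; boundary cells stay +inf except (0, 0), so the path
    # cannot fall off the edge of the matrix.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        i, j = min([(i - 1, j - 1), (i - 1, j), (i, j - 1)],
                   key=lambda c: cost[c])
    return cost[n, m], path[::-1]
```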

  4. A procedure for estimating gestural scores from speech acoustics.

    PubMed

    Nam, Hosung; Mitra, Vikramjit; Tiede, Mark; Hasegawa-Johnson, Mark; Espy-Wilson, Carol; Saltzman, Elliot; Goldstein, Louis

    2012-12-01

    Speech can be represented as a constellation of constricting vocal tract actions called gestures, whose temporal patterning with respect to one another is expressed in a gestural score. Current speech datasets do not come with gestural annotation and no formal gestural annotation procedure exists at present. This paper describes an iterative analysis-by-synthesis landmark-based time-warping architecture to perform gestural annotation of natural speech. For a given utterance, the Haskins Laboratories Task Dynamics and Application (TADA) model is employed to generate a corresponding prototype gestural score. The gestural score is temporally optimized through an iterative timing-warping process such that the acoustic distance between the original and TADA-synthesized speech is minimized. This paper demonstrates that the proposed iterative approach is superior to conventional acoustically-referenced dynamic timing-warping procedures and provides reliable gestural annotation for speech datasets.

  5. Military applications of automatic speech recognition and future requirements

    NASA Technical Reports Server (NTRS)

    Beek, Bruno; Cupples, Edward J.

    1977-01-01

    An updated summary of the state-of-the-art of automatic speech recognition and its relevance to military applications is provided. A number of potential systems for military applications are under development. These include: (1) digital narrowband communication systems; (2) automatic speech verification; (3) on-line cartographic processing unit; (4) word recognition for militarized tactical data system; and (5) voice recognition and synthesis for aircraft cockpit.

  6. Neural network based speech synthesizer: A preliminary report

    NASA Technical Reports Server (NTRS)

    Villarreal, James A.; Mcintire, Gary

    1987-01-01

    A neural net based speech synthesis project is discussed. The novelty is that the reproduced speech was extracted from actual voice recordings. In essence, the neural network learns the timing, pitch fluctuations, connectivity between individual sounds, and speaking habits unique to that individual person. The parallel distributed processing network used for this project is the generalized backward propagation network which has been modified to also learn sequences of actions or states given in a particular plan.
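
As a loose illustration of a backpropagation network extended with feedback so it can learn sequences of states, the following toy forward pass adds a context (recurrent) layer; all sizes and data are hypothetical, and training updates are omitted:

```python
# Toy forward pass of a backprop network with a context (feedback) layer.
# Layer sizes and inputs are hypothetical; training is omitted.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 16, 32, 16
W_xh = rng.normal(0, 0.1, (n_in, n_hidden))       # input -> hidden
W_hh = rng.normal(0, 0.1, (n_hidden, n_hidden))   # hidden -> hidden (memory)
W_hy = rng.normal(0, 0.1, (n_hidden, n_out))      # hidden -> output

def step(x, h_prev):
    """One time step: mix the current input with the remembered context."""
    h = np.tanh(x @ W_xh + h_prev @ W_hh)
    return h, h @ W_hy

h = np.zeros(n_hidden)
for x in rng.normal(size=(10, n_in)):             # a toy parameter sequence
    h, y = step(x, h)
```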

  7. Speech-to-Speech Relay Service

    MedlinePlus

    ... to make an STS call. You are then connected to an STS CA who will repeat your spoken words, making the spoken words clear to the other party. Persons with speech disabilities may also receive STS calls. The calling ...

  8. Speech Recognition: A General Overview.

    ERIC Educational Resources Information Center

    de Sopena, Luis

    Speech recognition is one of five main areas in the field of speech processing. Difficulties in speech recognition include variability in sound within and across speakers, in channel, in background noise, and of speech production. Speech recognition can be used in a variety of situations: to perform query operations and phone call transfers; for…

  9. Random Addition Concatenation Analysis: a novel approach to the exploration of phylogenomic signal reveals strong agreement between core and shell genomic partitions in the cyanobacteria.

    PubMed

    Narechania, Apurva; Baker, Richard H; Sit, Ryan; Kolokotronis, Sergios-Orestis; DeSalle, Rob; Planet, Paul J

    2012-01-01

    Recent whole-genome approaches to microbial phylogeny have emphasized partitioning genes into functional classes, often focusing on differences between a stable core of genes and a variable shell. To rigorously address the effects of partitioning and combining genes in genome-level analyses, we developed a novel technique called Random Addition Concatenation Analysis (RADICAL). RADICAL operates by sequentially concatenating randomly chosen gene partitions starting with a single-gene partition and ending with the entire genomic data set. A phylogenetic tree is built for every successive addition, and the entire process is repeated creating multiple random concatenation paths. The result is a library of trees representing a large variety of differently sized random gene partitions. This library can then be mined to identify unique topologies, assess overall agreement, and measure support for different trees. To evaluate RADICAL, we used 682 orthologous genes across 13 cyanobacterial genomes. Despite previous assertions of substantial differences between a core and a shell set of genes for this data set, RADICAL reveals the two partitions contain congruent phylogenetic signal. Substantial disagreement within the data set is limited to a few nodes and genes involved in metabolism, a functional group that is distributed evenly between the core and the shell partitions. We highlight numerous examples where RADICAL reveals aspects of phylogenetic behavior not evident by examining individual gene trees or a "total evidence" tree. Our method also demonstrates that most emergent phylogenetic signal appears early in the concatenation process. The software is freely available at http://desalle.amnh.org.
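
The random-addition bookkeeping RADICAL describes is compact enough to sketch. In this hypothetical Python version, `build_tree` stands in for whatever phylogenetic inference call is used; only the path construction is shown:

```python
# Hypothetical sketch of the RADICAL bookkeeping described above.
# `build_tree` is a stand-in for any phylogenetic inference call.
import random

def radical_paths(partitions, n_paths, build_tree):
    """Return a tree library keyed by (path index, partitions so far)."""
    library = {}
    for p in range(n_paths):
        order = random.sample(partitions, len(partitions))  # one random path
        concatenated = []
        for k, gene in enumerate(order, start=1):
            concatenated.append(gene)                 # grow the concatenation
            library[(p, k)] = build_tree(list(concatenated))
    return library

# The library can then be mined for unique topologies or per-tree support.
```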

  10. Early recognition of speech

    PubMed Central

    Remez, Robert E; Thomas, Emily F

    2013-01-01

    Classic research on the perception of speech sought to identify minimal acoustic correlates of each consonant and vowel. In explaining perception, this view designated momentary components of an acoustic spectrum as cues to the recognition of elementary phonemes. This conceptualization of speech perception is untenable given the findings of phonetic sensitivity to modulation independent of the acoustic and auditory form of the carrier. The empirical key is provided by studies of the perceptual organization of speech, a low-level integrative function that finds and follows the sensory effects of speech amid concurrent events. These projects have shown that the perceptual organization of speech is keyed to modulation; is fast, unlearned, and nonsymbolic; is indifferent to short-term auditory properties; and requires attention. The ineluctably multisensory nature of speech perception also imposes conditions that distinguish language among cognitive systems. WIREs Cogn Sci 2013, 4:213–223. doi: 10.1002/wcs.1213 PMID:23926454

  11. Frequency Domain Speech Coding

    DTIC Science & Technology

    1991-12-01

    perceptible effect on the sound of the reconstructed noiselike speech. It is possible that the frequency bands need not be mel scaled. Equally spaced frequency...levels seemed to affect the quality of the reproduced speech more than did the number of amplitude quantization levels. Informal listening test...the original. Eliminating spectral components has an adverse effect on the quality of reproduced speech. The whole process of selecting frequency and

  12. Speech Understanding Systems

    DTIC Science & Technology

    1975-11-01

    AD-A018 683. Speech Understanding Systems. William A. Woods, et al., Bolt Beranek and Newman, Inc. Prepared for: Office of Naval Research. BBN Report No. 3188; A.I. Report No. 39. Annual Technical Progress Report, 30 October 1974 to 29 October 1975.

  13. Packet Speech Systems Technology

    DTIC Science & Technology

    1982-03-31

    ring the phone or turn on the vocoder. Incoming speech is echoed back to the sender. This allows one site to conduct cross-country tests or demos...the first packet to arrive. Time resolution (histogram cell size) was 22.5 ms, the parcel time in the PCM speech encoder in the PVT. A traffic...additional commands as the need may arise without adversely affecting the foreground functions of sorting, routing, and dispatching the speech traffic. An

  14. Speech Alarms Pilot Study

    NASA Technical Reports Server (NTRS)

    Sandor, Aniko; Moses, Haifa

    2016-01-01

    Speech alarms have been used extensively in aviation and included in International Building Codes (IBC) and National Fire Protection Association's (NFPA) Life Safety Code. However, they have not been implemented on space vehicles. Previous studies conducted at NASA JSC showed that speech alarms lead to faster identification and higher accuracy. This research evaluated updated speech and tone alerts in a laboratory environment and in the Human Exploration Research Analog (HERA) in a realistic setup.

  15. Tracking speech sound acquisition.

    PubMed

    Powell, Thomas W

    2011-11-01

    This article describes a procedure to aid in the clinical appraisal of child speech. The approach, based on the work by Dinnsen, Chin, Elbert, and Powell (1990; Some constraints on functionally disordered phonologies: Phonetic inventories and phonotactics. Journal of Speech and Hearing Research, 33, 28-37), uses a railway idiom to track gains in the complexity of speech sound production. A clinical case study is reviewed to illustrate application of the procedure. The procedure is intended to facilitate application of an evidence-based procedure to the clinical management of developmental speech sound disorders.

  16. Distributed processing for speech understanding

    SciTech Connect

    Bronson, E.C.; Siegel, L.

    1983-01-01

    Continuous speech understanding is a highly complex artificial intelligence task requiring extensive computation. This complexity precludes real-time speech understanding on a conventional serial computer. Distributed processing techniques can be applied to the speech understanding task to improve processing speed. In the paper, the speech understanding task and several speech understanding systems are described. Parallel processing techniques are presented and a distributed processing architecture for speech understanding is outlined. 35 references.

  17. The development of gesture and speech as an integrated system.

    PubMed

    Goldin-Meadow, S

    1998-01-01

    transpose that knowledge to a new level and thus express those pieces of information within a single modality? More work is needed to investigate whether the act of producing gesture-speech mismatches itself facilitates transition. Even if it turns out that the production of gesture-speech mismatches has little role to play in facilitating cognitive change, mismatch remains a reliable marker of the speaker's potential for cognitive growth. As such, an understanding of the relationship between gesture and speech may prove useful in clinical settings. For example, there is some evidence that children with delayed onset of two-word speech fall naturally into two groups: children who eventually achieve two-word speech, albeit later than the norm (that is, late bloomers), and children who continue to have serious difficulties with spoken language and may never be able to combine words into a single string (Feldman, Holland, Kemp, and Janosky, 1992; Thal, Tobias, and Morrison, 1991). Observation of combinations in which gesture and speech convey different information may prove a useful clinical tool for distinguishing, at a relatively young age, children who will be late bloomers from those who will have great difficulty mastering spoken language without intervention (see Stare, 1996, for preliminary evidence that the relationship between gesture and speech in children with unilateral brain damage correlates with early versus late onset of two-word combinations). In sum, for both speakers and listeners, gesture and speech are two aspects of a single process, with each modality contributing its own unique level of representation. Gesture conveys information in the global, imagistic form for which it is well suited, and speech conveys information in the segmented, combinatorial fashion that characterizes linguistic structures. The total representation of any message is therefore a synthesis of the analog gestural mode and the discrete speech mode. (ABSTRACT TRUNCATED)

  18. Cochlear implant speech recognition with speech maskers

    NASA Astrophysics Data System (ADS)

    Stickney, Ginger S.; Zeng, Fan-Gang; Litovsky, Ruth; Assmann, Peter

    2004-08-01

    Speech recognition performance was measured in normal-hearing and cochlear-implant listeners with maskers consisting of either steady-state speech-spectrum-shaped noise or a competing sentence. Target sentences from a male talker were presented in the presence of one of three competing talkers (same male, different male, or female) or speech-spectrum-shaped noise generated from this talker at several target-to-masker ratios. For the normal-hearing listeners, target-masker combinations were processed through a noise-excited vocoder designed to simulate a cochlear implant. With unprocessed stimuli, a normal-hearing control group maintained high levels of intelligibility down to target-to-masker ratios as low as 0 dB and showed a release from masking, producing better performance with single-talker maskers than with steady-state noise. In contrast, no masking release was observed in either implant or normal-hearing subjects listening through an implant simulation. The performance of the simulation and implant groups did not improve when the single-talker masker was a different talker compared to the same talker as the target speech, as was found in the normal-hearing control. These results are interpreted as evidence for a significant role of informational masking and modulation interference in cochlear implant speech recognition with fluctuating maskers. This informational masking may originate from increased target-masker similarity when spectral resolution is reduced.
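
A minimal sketch of the kind of noise-excited vocoder used for such implant simulations: band-pass analysis, envelope extraction, and envelope-modulated noise carriers. The band edges and filter order here are illustrative assumptions, not the study's parameters:

```python
# Illustrative noise-excited channel vocoder (band edges are assumptions).
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(x, fs, edges=(100, 400, 1000, 2500, 6000)):
    """Replace each band's fine structure with envelope-modulated noise."""
    out = np.zeros(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, x)
        envelope = np.abs(hilbert(band))          # slowly varying amplitude
        carrier = sosfiltfilt(sos, np.random.randn(len(x)))
        out += envelope * carrier                 # noise shaped by the band
    return out / np.max(np.abs(out))              # normalize the mixture
```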

  19. Commercial applications of speech interface technology: an industry at the threshold.

    PubMed Central

    Oberteuffer, J A

    1995-01-01

    Speech interface technology, which includes automatic speech recognition, synthetic speech, and natural language processing, is beginning to have a significant impact on business and personal computer use. Today, powerful and inexpensive microprocessors and improved algorithms are driving commercial applications in computer command, consumer, data entry, speech-to-text, telephone, and voice verification. Robust speaker-independent recognition systems for command and navigation in personal computers are now available; telephone-based transaction and database inquiry systems using both speech synthesis and recognition are coming into use. Large-vocabulary speech interface systems for document creation and read-aloud proofing are expanding beyond niche markets. Today's applications represent a small preview of a rich future for speech interface technology that will eventually replace keyboards with microphones and loud-speakers to give easy accessibility to increasingly intelligent machines. PMID:7479717

  20. Commercial applications of speech interface technology: an industry at the threshold.

    PubMed

    Oberteuffer, J A

    1995-10-24

    Speech interface technology, which includes automatic speech recognition, synthetic speech, and natural language processing, is beginning to have a significant impact on business and personal computer use. Today, powerful and inexpensive microprocessors and improved algorithms are driving commercial applications in computer command, consumer, data entry, speech-to-text, telephone, and voice verification. Robust speaker-independent recognition systems for command and navigation in personal computers are now available; telephone-based transaction and database inquiry systems using both speech synthesis and recognition are coming into use. Large-vocabulary speech interface systems for document creation and read-aloud proofing are expanding beyond niche markets. Today's applications represent a small preview of a rich future for speech interface technology that will eventually replace keyboards with microphones and loud-speakers to give easy accessibility to increasingly intelligent machines.

  1. Research on Speech Perception. Progress Report No. 8, January 1982-December 1982.

    ERIC Educational Resources Information Center

    Pisoni, David B.; And Others

    Summarizing research activities from January 1982 to December 1982, this is the eighth annual report of research on speech perception, analysis and synthesis conducted in the Speech Research Laboratory of the Department of Psychology at Indiana University. The report includes extended manuscripts, short reports, progress reports, and information…

  2. Research on Speech Perception. Progress Report No. 9, January 1983-December 1983.

    ERIC Educational Resources Information Center

    Pisoni, David B.; And Others

    Summarizing research activities from January 1983 to December 1983, this is the ninth annual report of research on speech perception, analysis and synthesis conducted in the Speech Research Laboratory of the Department of Psychology at Indiana University. The report includes extended manuscripts, short reports, and progress reports. The report…

  3. Continuous speech segmentation determined by blind source separation

    NASA Astrophysics Data System (ADS)

    Szu, Harold H.; Hsu, Charles C.; Xie, Da-Hong

    1998-03-01

    One of the problems behind the 5 percent error rate encountered in continuous speech recognition is the difficulty of identifying a mixture of up to two phonemes in close concatenation. For instance, one says 'Let's go' instead of 'Let us go'. There are two kinds of speech segmentation: linguistic segmentation and acoustic segmentation. Linguistic segmentation relies on a combination of acoustic, lexical, semantic, and statistical knowledge sources, which have been studied. Daily spoken conversations are usually abbreviated for speakers' convenience. Acoustic segmentation separates mixed sounds such as /ts/ into /t/ and /s/ to find linguistic units automatically. The adaptive wavelet transform (AWT) developed by Szu is a linear superposition of banks of constant-Q zero-mean mother wavelets implemented by an ANN called a 'wavenet'. Each neuron is represented by a daughter wavelet, which can be an affine scale change of an identical or a different mother wavelet for a continuous AWT. AWT was designed for the cocktail party effect and to solve the acoustic segmentation of phonemes using a supervised learning ANN architecture. In this paper, we reviewed AWT from an Independent Component Analysis viewpoint, and then applied blind source separation to the acoustic de-mixing and segmentation.
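
As a rough stand-in for the de-mixing step, viewed through the Independent Component Analysis lens the abstract adopts, this sketch separates a two-channel linear mixture with scikit-learn's FastICA; it is illustrative only and is not the authors' wavenet:

```python
# Illustrative demixing of a two-channel linear mixture with FastICA.
import numpy as np
from sklearn.decomposition import FastICA

def demix(mix1, mix2):
    """Estimate two sources from two simultaneous recordings."""
    X = np.column_stack([mix1, mix2])             # samples x mixtures
    sources = FastICA(n_components=2, random_state=0).fit_transform(X)
    return sources[:, 0], sources[:, 1]
```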

  4. Tracking Speech Sound Acquisition

    ERIC Educational Resources Information Center

    Powell, Thomas W.

    2011-01-01

    This article describes a procedure to aid in the clinical appraisal of child speech. The approach, based on the work by Dinnsen, Chin, Elbert, and Powell (1990; Some constraints on functionally disordered phonologies: Phonetic inventories and phonotactics. "Journal of Speech and Hearing Research", 33, 28-37), uses a railway idiom to track gains in…

  5. Free Speech. No. 38.

    ERIC Educational Resources Information Center

    Kane, Peter E., Ed.

    This issue of "Free Speech" contains the following articles: "Daniel Schoor Relieved of Reporting Duties" by Laurence Stern, "The Sellout at CBS" by Michael Harrington, "Defending Dan Schorr" by Tome Wicker, "Speech to the Washington Press Club, February 25, 1976" by Daniel Schorr, "Funds…

  6. Free Speech Yearbook 1976.

    ERIC Educational Resources Information Center

    Phifer, Gregg, Ed.

    The articles collected in this annual address several aspects of First Amendment Law. The following titles are included: "Freedom of Speech As an Academic Discipline" (Franklyn S. Haiman), "Free Speech and Foreign-Policy Decision Making" (Douglas N. Freeman), "The Supreme Court and the First Amendment: 1975-1976"…

  7. Primary Progressive Speech Abulia.

    PubMed

    Milano, Nicholas J; Heilman, Kenneth M

    2015-01-01

    Primary progressive aphasia (PPA) is a neurodegenerative disorder characterized by progressive language impairment. The three variants of PPA include the nonfluent/agrammatic, semantic, and logopenic types. The goal of this report is to describe two patients with a loss of speech initiation that was associated with bilateral medial frontal atrophy. Two patients with progressive speech deficits were evaluated and their examinations revealed a paucity of spontaneous speech; however their naming, repetition, reading, and writing were all normal. The patients had no evidence of agrammatism or apraxia of speech but did have impaired speech fluency. In addition to impaired production of propositional spontaneous speech, these patients had impaired production of automatic speech (e.g., reciting the Lord's Prayer) and singing. Structural brain imaging revealed bilateral medial frontal atrophy in both patients. These patients' language deficits are consistent with a PPA, but they are in the pattern of a dynamic aphasia. Whereas the signs-symptoms of dynamic aphasia have been previously described, to our knowledge these are the first cases associated with predominantly bilateral medial frontal atrophy that impaired both propositional and automatic speech. Thus, this profile may represent a new variant of PPA.

  8. Preschool Connected Speech Inventory.

    ERIC Educational Resources Information Center

    DiJohnson, Albert; And Others

    This speech inventory developed for a study of aurally handicapped preschool children (see TM 001 129) provides information on intonation patterns in connected speech. The inventory consists of a list of phrases and simple sentences accompanied by pictorial clues. The test is individually administered by a teacher-examiner who presents the spoken…

  9. Illustrated Speech Anatomy.

    ERIC Educational Resources Information Center

    Shearer, William M.

    Written for students in the fields of speech correction and audiology, the text deals with the following: structures involved in respiration; the skeleton and the processes of inhalation and exhalation; phonation and pitch, the larynx, and esophageal speech; muscles involved in articulation; muscles involved in resonance; and the anatomy of the…

  10. Chief Seattle's Speech Revisited

    ERIC Educational Resources Information Center

    Krupat, Arnold

    2011-01-01

    Indian orators have been saying good-bye for more than three hundred years. John Eliot's "Dying Speeches of Several Indians" (1685), as David Murray notes, inaugurates a long textual history in which "Indians... are most useful dying," or, as in a number of speeches, bidding the world farewell as they embrace an undesired but…

  13. Improving Alaryngeal Speech Intelligibility.

    ERIC Educational Resources Information Center

    Christensen, John M.; Dwyer, Patricia E.

    1990-01-01

    Laryngectomized patients using esophageal speech or an electronic artificial larynx have difficulty producing correct voicing contrasts between homorganic consonants. This paper describes a therapy technique that emphasizes "pushing harder" on voiceless consonants to improve alaryngeal speech intelligibility and proposes focusing on the…

  15. Private Speech in Ballet

    ERIC Educational Resources Information Center

    Johnston, Dale

    2006-01-01

    Authoritarian teaching practices in ballet inhibit the use of private speech. This paper highlights the critical importance of private speech in the cognitive development of young ballet students, within what is largely a non-verbal art form. It draws upon research by Russian psychologist Lev Vygotsky and contemporary socioculturalists, to…

  16. Advertising and Free Speech.

    ERIC Educational Resources Information Center

    Hyman, Allen, Ed.; Johnson, M. Bruce, Ed.

    The articles collected in this book originated at a conference at which legal and economic scholars discussed the issue of First Amendment protection for commercial speech. The first article, in arguing for freedom for commercial speech, finds inconsistent and untenable the arguments of those who advocate freedom from regulation for political…

  17. Egocentric Speech Reconsidered.

    ERIC Educational Resources Information Center

    Braunwald, Susan R.

    A range-of-language-use model is proposed as an alternative conceptual framework to a stage model of egocentric speech. The model is intended to clarify the meaning of the term egocentric speech, to examine the validity of stage assumptions, and to explain the existence of contextual variation in the form of children's…

  18. Free Speech Yearbook 1981.

    ERIC Educational Resources Information Center

    Kane, Peter E., Ed.

    1982-01-01

    The nine articles in this collection deal with theoretical and practical freedom of speech issues. Topics discussed include the following: (1) freedom of expression in Thailand and India; (2) metaphors and analogues in several landmark free speech cases; (3) Supreme Court Justice William O. Douglas's views of the First Amendment; (4) the San…

  19. Free Speech Yearbook 1975.

    ERIC Educational Resources Information Center

    Barbour, Alton, Ed.

    This issue of the "Free Speech Yearbook" contains the following: "Between Rhetoric and Disloyalty: Free Speech Standards for the Sunshine Soldier" by Richard A. Parker; "William A. Rehnquist: Ideologist on the Bench" by Peter E. Kane; "The First Amendment's Weakest Link: Government Regulation of Controversial…

  1. Free Speech Yearbook 1977.

    ERIC Educational Resources Information Center

    Phifer, Gregg, Ed.

    The eleven articles in this collection explore various aspects of freedom of speech. Topics include the lack of knowledge on the part of many judges regarding the complex act of communication; the legislatures and free speech in colonial Connecticut and Rhode Island; contributions of sixteenth century Anabaptist heretics to First Amendment…

  3. SPEECH COMMUNICATION RESEARCH.

    DTIC Science & Technology

    studies of the dynamics of speech production through cineradiographic techniques and through acoustic analysis of formant motions in vowels in various...particular, the activity of the vocal cords and the dynamics of tongue motion. Research on speech perception has included experiments on vowel

  5. Automatic speech recognition

    NASA Astrophysics Data System (ADS)

    Espy-Wilson, Carol

    2005-04-01

    Great strides have been made in the development of automatic speech recognition (ASR) technology over the past thirty years. Most of this effort has been centered around the extension and improvement of Hidden Markov Model (HMM) approaches to ASR. Current commercially-available and industry systems based on HMMs can perform well for certain situational tasks that restrict variability such as phone dialing or limited voice commands. However, the holy grail of ASR systems is performance comparable to humans; in other words, the ability to automatically transcribe unrestricted conversational speech spoken by an infinite number of speakers under varying acoustic environments. This goal is far from being reached. Key to the success of ASR is effective modeling of variability in the speech signal. This tutorial will review the basics of ASR and the various ways in which our current knowledge of speech production, speech perception and prosody can be exploited to improve robustness at every level of the system.
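
Because the tutorial's starting point is the HMM, a minimal forward-algorithm sketch shows how such a model scores an observation sequence; the matrices below are toy values, not a trained recognizer:

```python
# Minimal HMM forward algorithm with discrete emissions (toy values).
import numpy as np

def forward(obs, pi, A, B):
    """Probability of an observation sequence given (pi, A, B)."""
    alpha = pi * B[:, obs[0]]            # initialize with the first symbol
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]    # propagate and re-weight
    return alpha.sum()

pi = np.array([0.6, 0.4])                # initial state distribution
A = np.array([[0.7, 0.3], [0.4, 0.6]])   # state transition matrix
B = np.array([[0.5, 0.5], [0.1, 0.9]])   # emission probabilities
print(forward([0, 1, 1], pi, A, B))      # likelihood of symbols 0,1,1
```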

  6. Decomposition in a non-concatenated morphological structure involves more than just the roots: Evidence from fast priming.

    PubMed

    Deutsch, Avital; Velan, Hadas; Michaly, Tamar

    2016-11-11

    Complex words in Hebrew are composed of two non-concatenated morphemes: a consonantal root embedded in a nominal or verbal word-pattern morpho-phonological unit made up of vowels or vowels and consonants. Research on written-word recognition has revealed a robust effect of the roots and the verbal-patterns, but not of the nominal-patterns, on word recognition. These findings suggest that the Hebrew lexicon is organized and accessed via roots. We explored the hypothesis that the absence of a nominal-pattern effect reflects methodological limitations of the experimental paradigms used in previous studies. Specifically, the potential facilitative effect induced by a shared nominal-pattern was counteracted by an interference effect induced by the competition between the roots of two words derived from different roots but with the same nominal-pattern. In the current study, a fast-priming paradigm for sentence reading and a "delayed-letters" procedure were used to isolate the initial effect of nominal-patterns on lexical access. The results, based on eye-fixation latency, demonstrated a facilitatory effect induced by nominal-pattern primes relative to orthographic control primes when presented for 33 or 42 ms. The results are discussed in relation to the role of the word-pattern as an organizing principle of the Hebrew lexicon, together with the roots.

  7. CRE: a cost effective and rapid approach for PCR-mediated concatenation of KRAS and EGFR exons

    PubMed Central

    Ramteke, Manoj P.; Patel, Kuldeep J; Godbole, Mukul; Vyas, Maulik; Karve, Kunal; Choughule, Anuradha; Prabhash, Kumar; Dutt, Amit

    2016-01-01

    Molecular diagnostics has changed the way lung cancer patients are treated worldwide. Of several different testing methods available, PCR followed by directed sequencing and amplification refractory mutation system (ARMS) are the two most commonly used diagnostic methods worldwide to detect mutations at KRAS exon 2 and EGFR kinase domain exons 18-21 in lung cancer. Compared to ARMS, the PCR followed by directed sequencing approach is relatively inexpensive but more cumbersome to perform. Moreover, with a limiting amount of genomic DNA from clinical formalin-fixed, paraffin-embedded (FFPE) specimens or fine biopsies of lung tumors, multiple rounds of PCR and sequencing reactions often become challenging. Here, we report a cost-effective single multiplex-PCR based method, CRE (for Co-amplification of five KRAS and EGFR exons), followed by concatenation of the PCR product as a single linear fragment for direct sequencing. CRE is a robust protocol that can be adapted for routine use in clinical diagnostics with reduced variability, cost and turnaround time, requiring a minimal amount of template DNA extracted from FFPE or fresh frozen tumor samples. As a proof of principle, CRE is able to detect the activating EGFR L858R and T790M mutations in lung cancer cell line and primary tumors. PMID:27127615

  8. Cortical Oscillations in Auditory Perception and Speech: Evidence for Two Temporal Windows in Human Auditory Cortex

    PubMed Central

    Luo, Huan; Poeppel, David

    2012-01-01

    Natural sounds, including vocal communication sounds, contain critical information at multiple time scales. Two essential temporal modulation rates in speech have been argued to be in the low gamma band (∼20–80 ms duration information) and the theta band (∼150–300 ms), corresponding to segmental and diphonic versus syllabic modulation rates, respectively. It has been hypothesized that auditory cortex implements temporal integration using time constants closely related to these values. The neural correlates of a proposed dual temporal window mechanism in human auditory cortex remain poorly understood. We recorded MEG responses from participants listening to non-speech auditory stimuli with different temporal structures, created by concatenating frequency-modulated segments of varied segment durations. We show that such non-speech stimuli with temporal structure matching speech-relevant scales (∼25 and ∼200 ms) elicit reliable phase tracking in the corresponding associated oscillatory frequencies (low gamma and theta bands). In contrast, stimuli with non-matching temporal structure do not. Furthermore, the topography of theta band phase tracking shows rightward lateralization while gamma band phase tracking occurs bilaterally. The results support the hypothesis that there exists multi-time resolution processing in cortex on discontinuous scales and provide evidence for an asymmetric organization of temporal analysis (asymmetrical sampling in time, AST). The data argue for a mesoscopic-level neural mechanism underlying multi-time resolution processing: the sliding and resetting of intrinsic temporal windows on privileged time scales. PMID:22666214
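
Stimuli of the kind described, concatenations of frequency-modulated segments with a fixed segment duration, can be sketched as follows; the carrier and modulation parameter ranges are assumptions, not the study's values:

```python
# Concatenated frequency-modulated segments (parameter ranges assumed).
import numpy as np

def fm_stimulus(n_segments, seg_dur, fs=44100, seed=0):
    rng = np.random.default_rng(seed)
    t = np.arange(int(seg_dur * fs)) / fs
    segments = []
    for _ in range(n_segments):
        f0 = rng.uniform(300, 2000)       # carrier frequency, Hz
        fm = rng.uniform(2, 8)            # modulation rate, Hz
        depth = rng.uniform(10, 100)      # modulation depth, Hz
        # Instantaneous frequency: f0 + depth * sin(2*pi*fm*t)
        phase = 2 * np.pi * f0 * t - (depth / fm) * np.cos(2 * np.pi * fm * t)
        segments.append(np.sin(phase))
    return np.concatenate(segments)

theta_like = fm_stimulus(50, 0.200)       # ~200 ms segments (theta scale)
gamma_like = fm_stimulus(400, 0.025)      # ~25 ms segments (gamma scale)
```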

  9. [Speech audiometry, speech perception and cognitive functions. German version].

    PubMed

    Meister, H

    2017-03-01

    Examination of cognitive functions in the framework of speech perception has recently gained increasing scientific and clinical interest. Especially against the background of age-related hearing impairment and cognitive decline, potential new perspectives might arise in terms of better individualisation of auditory diagnosis and rehabilitation. This review addresses the relationships between speech audiometry, speech perception and cognitive functions. It presents models of speech perception, discusses associations between neuropsychological and audiometric outcomes, and shows recent efforts to take cognitive functions into account in speech audiometry.

  10. Voice and Speech after Laryngectomy

    ERIC Educational Resources Information Center

    Stajner-Katusic, Smiljka; Horga, Damir; Musura, Maja; Globlek, Dubravka

    2006-01-01

    The aim of the investigation is to compare voice and speech quality in alaryngeal patients using esophageal speech (ESOP, eight subjects), electroacoustical speech aid (EACA, six subjects) and tracheoesophageal voice prosthesis (TEVP, three subjects). The subjects reading a short story were recorded in the sound-proof booth and the speech samples…

  11. Automatic Recognition of Deaf Speech.

    ERIC Educational Resources Information Center

    Abdelhamied, Kadry; And Others

    1990-01-01

    This paper describes a speech perception system for automatic recognition of deaf speech. Using a 2-step segmentation approach on 468 utterances by 2 hearing-impaired men and 2 normal-hearing men, recognition rates as high as 93.01 percent for isolated words and 81.81 percent for connected speech were obtained from deaf speech,…

  12. Speech Correction in the Schools.

    ERIC Educational Resources Information Center

    Eisenson, Jon; Ogilvie, Mardel

    An introduction to the problems and therapeutic needs of school age children whose speech requires remedial attention, the text is intended for both the classroom teacher and the speech correctionist. General considerations include classification and incidence of speech defects, speech correction services, the teacher as a speaker, the mechanism…

  13. Environmental Contamination of Normal Speech.

    ERIC Educational Resources Information Center

    Harley, Trevor A.

    1990-01-01

    Environmentally contaminated speech errors (irrelevant words or phrases derived from the speaker's environment and erroneously incorporated into speech) are hypothesized to occur at a high level of speech processing, but with a relatively late insertion point. The data indicate that speech production processes are not independent of other…

  15. Sperry Univac speech communications technology

    NASA Technical Reports Server (NTRS)

    Medress, Mark F.

    1977-01-01

    Technology and systems for effective verbal communication with computers were developed. A continuous speech recognition system for verbal input, a word spotting system to locate key words in conversational speech, prosodic tools to aid speech analysis, and a prerecorded voice response system for speech output are described.

  16. Portable Speech Synthesizer

    NASA Technical Reports Server (NTRS)

    Leibfritz, Gilbert H.; Larson, Howard K.

    1987-01-01

    Compact speech synthesizer useful traveling companion to speech-handicapped. User simply enters statement on board, and synthesizer converts statement into spoken words. Battery-powered and housed in briefcase, easily carried on trips. Unit used for telephone and face-to-face communication. Synthesizer consists of microcomputer with memory-expansion module, speech-synthesizer circuit, batteries, recharger, dc-to-dc converter, and telephone amplifier. Components, commercially available, fit neatly in 17- by 13- by 5-in. briefcase. Weighs about 20 lb (9 kg) and operates and recharges from ac receptacle.

  17. Fluid Dynamics of Human Phonation and Speech

    NASA Astrophysics Data System (ADS)

    Mittal, Rajat; Erath, Byron D.; Plesniak, Michael W.

    2013-01-01

    This article presents a review of the fluid dynamics, flow-structure interactions, and acoustics associated with human phonation and speech. Our voice is produced through the process of phonation in the larynx, and an improved understanding of the underlying physics of this process is essential to advancing the treatment of voice disorders. Insights into the physics of phonation and speech can also contribute to improved vocal training and the development of new speech compression and synthesis schemes. This article introduces the key biomechanical features of the laryngeal physiology, reviews the basic principles of voice production, and summarizes the progress made over the past half-century in understanding the flow physics of phonation and speech. Laryngeal pathologies, which significantly enhance the complexity of phonatory dynamics, are discussed. After a thorough examination of the state of the art in computational modeling and experimental investigations of phonatory biomechanics, we present a synopsis of the pacing issues in this arena and an outlook for research in this fascinating subject.

  18. EMP Interaction: Principles, Techniques and Reference Data (A Compleat concatenation of Technology from the EMP Interaction Notes). EMP Interaction 2-1

    DTIC Science & Technology

    1980-12-01

    This report has been reviewed by the Public Affairs Office and is releasable to the National Technical Information Service (NTIS). At NTIS, it will be available to the general public, including foreign nations. Final report: EMP Interaction: Principles, Techniques and Reference Data (A Compleat Concatenation of Technology from the EMP Interaction Notes).

  19. Concatenation and Concordance in the Reconstruction of Mouse Lemur Phylogeny: An Empirical Demonstration of the Effect of Allele Sampling in Phylogenetics

    PubMed Central

    Weisrock, David W.; Smith, Stacey D.; Chan, Lauren M.; Biebouw, Karla; Kappeler, Peter M.; Yoder, Anne D.

    2012-01-01

    The systematics and speciation literature is rich with discussion relating to the potential for gene tree/species tree discordance. Numerous mechanisms have been proposed to generate discordance, including differential selection, long-branch attraction, gene duplication, genetic introgression, and/or incomplete lineage sorting. For speciose clades in which divergence has occurred recently and rapidly, recovering the true species tree can be particularly problematic due to incomplete lineage sorting. Unfortunately, the availability of multilocus or “phylogenomic” data sets does not simply solve the problem, particularly when the data are analyzed with standard concatenation techniques. In our study, we conduct a phylogenetic study for a nearly complete species sample of the dwarf and mouse lemur clade, Cheirogaleidae. Mouse lemurs (genus, Microcebus) have been intensively studied over the past decade for reasons relating to their high level of cryptic species diversity, and although there has been emerging consensus regarding the evolutionary diversity contained within the genus, there is no agreement as to the inter-specific relationships within the group. We attempt to resolve cheirogaleid phylogeny, focusing especially on the mouse lemurs, by employing a large multilocus data set. We compare the results of Bayesian concordance methods with those of standard gene concatenation, finding that though concatenation yields the strongest results as measured by statistical support, these results are found to be highly misleading. By employing an approach where individual alleles are treated as operational taxonomic units, we show that phylogenetic results are substantially influenced by the selection of alleles in the concatenation process. PMID:22319174

  20. Random Addition Concatenation Analysis: A Novel Approach to the Exploration of Phylogenomic Signal Reveals Strong Agreement between Core and Shell Genomic Partitions in the Cyanobacteria

    PubMed Central

    Narechania, Apurva; Baker, Richard H.; Sit, Ryan; Kolokotronis, Sergios-Orestis; DeSalle, Rob; Planet, Paul J.

    2012-01-01

    Recent whole-genome approaches to microbial phylogeny have emphasized partitioning genes into functional classes, often focusing on differences between a stable core of genes and a variable shell. To rigorously address the effects of partitioning and combining genes in genome-level analyses, we developed a novel technique called Random Addition Concatenation Analysis (RADICAL). RADICAL operates by sequentially concatenating randomly chosen gene partitions starting with a single-gene partition and ending with the entire genomic data set. A phylogenetic tree is built for every successive addition, and the entire process is repeated creating multiple random concatenation paths. The result is a library of trees representing a large variety of differently sized random gene partitions. This library can then be mined to identify unique topologies, assess overall agreement, and measure support for different trees. To evaluate RADICAL, we used 682 orthologous genes across 13 cyanobacterial genomes. Despite previous assertions of substantial differences between a core and a shell set of genes for this data set, RADICAL reveals the two partitions contain congruent phylogenetic signal. Substantial disagreement within the data set is limited to a few nodes and genes involved in metabolism, a functional group that is distributed evenly between the core and the shell partitions. We highlight numerous examples where RADICAL reveals aspects of phylogenetic behavior not evident by examining individual gene trees or a “total evidence” tree. Our method also demonstrates that most emergent phylogenetic signal appears early in the concatenation process. The software is freely available at http://desalle.amnh.org. PMID:22094860

  1. Performance Analysis of a JTIDS/Link-16 Type Waveform using 32-ary Orthogonal Signaling with 32 Chip Baseband Waveforms and a Concatenated Code

    DTIC Science & Technology

    2009-12-01

    Performance Analysis of a JTIDS/Link-16 Type Waveform Using 32-ary Orthogonal Signaling with 32 Chip Baseband Waveforms and a Concatenated Code, by Theodoros… Master's thesis, December 2009. Abstract: The Joint Tactical Information Distribution System (JTIDS) is a hybrid frequency-hopped, direct…

  2. Human speech articulator measurements using low power, 2GHz Homodyne sensors

    SciTech Connect

    Barnes, T; Burnett, G C; Holzrichter, J F

    1999-06-29

    Very low power, short-range microwave "radar-like" sensors can measure the motions and vibrations of internal human speech articulators as speech is produced. In these animate systems (and also in inanimate acoustic systems), microwave sensors can measure vibration information associated with excitation sources and other interfaces. These data, together with the corresponding acoustic data, enable the calculation of system transfer functions. This information appears to be useful for a surprisingly wide range of applications such as speech coding and recognition, speaker or object identification, speech and musical instrument synthesis, noise cancellation, and other applications.
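
The transfer-function calculation mentioned here is conventionally estimated from cross- and auto-spectral densities; a hedged SciPy sketch (signal names and window length are placeholders):

```python
# H1 transfer-function estimate between sensor and acoustic channels.
# Signal names and the window length are placeholders.
import numpy as np
from scipy.signal import csd, welch

def transfer_function(excitation, acoustic, fs, nperseg=1024):
    f, Pxy = csd(excitation, acoustic, fs=fs, nperseg=nperseg)  # cross-spectrum
    _, Pxx = welch(excitation, fs=fs, nperseg=nperseg)          # auto-spectrum
    return f, Pxy / Pxx                                         # H1 estimator
```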

  3. Online Searching Using Speech as a Man/Machine Interface.

    ERIC Educational Resources Information Center

    Peters, B. F.; And Others

    1989-01-01

    Describes the development, implementation, and evaluation of a voice interface for the British Library Blaise Online Information Retrieval System. Results of the evaluation show that the use of currently available speech recognition and synthesis hardware, along with intelligent software, can provide an interface well suited to the needs of online…

  4. Detecting Speech Defects

    ERIC Educational Resources Information Center

    Kryza, Frank T., II

    1976-01-01

    Discusses the importance of early detection of speech defects and briefly describes the activities of the Pre-School Diagnostic Center for Severe Communication Disorders in New Haven, Connecticut. (ED)

  5. Speech impairment (adult)

    MedlinePlus

    MedlinePlus Medical Encyclopedia: medlineplus.gov/ency/article/003204.htm

  6. Anxiety and ritualized speech

    ERIC Educational Resources Information Center

    Lalljee, Mansur; Cook, Mark

    1975-01-01

    The experiment examines the effects of anxiety on the use of a number of words that seem irrelevant to semantic communication. The Units of Ritualized Speech (URSs) considered are: 'I mean', 'in fact', 'really', 'sort of', 'well' and 'you know'. (Editor)

  7. Speech and Communication Disorders

    MedlinePlus

    ... or understand speech. Causes include hearing disorders and deafness, and voice problems such as dysphonia or those caused ... language therapy can help. NIH: National Institute on Deafness and Other Communication Disorders

  8. Speech perception as categorization

    PubMed Central

    Holt, Lori L.; Lotto, Andrew J.

    2010-01-01

    Speech perception (SP) most commonly refers to the perceptual mapping from the highly variable acoustic speech signal to a linguistic representation, whether it be phonemes, diphones, syllables, or words. This is an example of categorization, in that potentially discriminable speech sounds are assigned to functionally equivalent classes. In this tutorial, we present some of the main challenges to our understanding of the categorization of speech sounds and the conceptualization of SP that has resulted from these challenges. We focus here on issues and experiments that define open research questions relevant to phoneme categorization, arguing that SP is best understood as perceptual categorization, a position that places SP in direct contact with research from other areas of perception and cognition. PMID:20601702

  10. Thai Automatic Speech Recognition

    DTIC Science & Technology

    2005-01-01

    This research was performed as part of the DARPA-Babylon program aimed at rapidly developing multilingual speech-to...used in an external DARPA evaluation involving medical scenarios between an American doctor and a naïve monolingual Thai patient. ... To create more general acoustic models we collected read speech data from native speakers based on the concepts of our multilingual data collection.

  11. Auditory speech preprocessors

    SciTech Connect

    Zweig, G.

    1989-01-01

    A nonlinear transmission line model of the cochlea (Zweig 1988) is proposed as the basis for a novel speech preprocessor. Sounds of different intensities, such as voiced and unvoiced speech, are preprocessed in radically different ways. The Q's of the preprocessor's nonlinear filters vary with input amplitude, higher Q's (longer integration times) corresponding to quieter sounds. Like the cochlea, the preprocessor acts as a "subthreshold laser" that traps and amplifies low level signals, thereby aiding in their detection and analysis. 17 refs.

  12. Speech perception and production

    PubMed Central

    Casserly, Elizabeth D.; Pisoni, David B.

    2012-01-01

    Until recently, research in speech perception and speech production has largely focused on the search for psychological and phonetic evidence of discrete, abstract, context-free symbolic units corresponding to phonological segments or phonemes. Despite this common conceptual goal and intimately related objects of study, however, research in these two domains of speech communication has progressed more or less independently for more than 60 years. In this article, we present an overview of the foundational works and current trends in the two fields, specifically discussing the progress made in both lines of inquiry as well as the basic fundamental issues that neither has been able to resolve satisfactorily so far. We then discuss theoretical models and recent experimental evidence that point to the deep, pervasive connections between speech perception and production. We conclude that although research focusing on each domain individually has been vital in increasing our basic understanding of spoken language processing, the human capacity for speech communication is so complex that gaining a full understanding will not be possible until speech perception and production are conceptually reunited in a joint approach to problems shared by both modes. PMID:23946864

  13. Speech perception and production.

    PubMed

    Casserly, Elizabeth D; Pisoni, David B

    2010-09-01

    Until recently, research in speech perception and speech production has largely focused on the search for psychological and phonetic evidence of discrete, abstract, context-free symbolic units corresponding to phonological segments or phonemes. Despite this common conceptual goal and intimately related objects of study, however, research in these two domains of speech communication has progressed more or less independently for more than 60 years. In this article, we present an overview of the foundational works and current trends in the two fields, specifically discussing the progress made in both lines of inquiry as well as the basic fundamental issues that neither has been able to resolve satisfactorily so far. We then discuss theoretical models and recent experimental evidence that point to the deep, pervasive connections between speech perception and production. We conclude that although research focusing on each domain individually has been vital in increasing our basic understanding of spoken language processing, the human capacity for speech communication is so complex that gaining a full understanding will not be possible until speech perception and production are conceptually reunited in a joint approach to problems shared by both modes. Copyright © 2010 John Wiley & Sons, Ltd. For further resources related to this article, please visit the WIREs website.

  14. Musician advantage for speech-on-speech perception.

    PubMed

    Başkent, Deniz; Gaudrain, Etienne

    2016-03-01

    Evidence for transfer of musical training to better perception of speech in noise has been mixed. Unlike speech-in-noise, speech-on-speech perception utilizes many of the skills that musical training improves, such as better pitch perception and stream segregation, as well as use of higher-level auditory cognitive functions, such as attention. Indeed, although a few non-musicians performed as well as musicians, on a group level there was a strong musician benefit for speech perception in a speech masker. This benefit does not seem to result from better voice processing and could instead be related to better stream segregation or enhanced cognitive functions.

  15. Robust Speech Rate Estimation for Spontaneous Speech

    PubMed Central

    Wang, Dagen; Narayanan, Shrikanth S.

    2010-01-01

    In this paper, we propose a direct method for speech rate estimation from acoustic features without requiring any automatic speech transcription. We compare various spectral and temporal signal analysis and smoothing strategies to better characterize the underlying syllable structure to derive speech rate. The proposed algorithm extends the methods of spectral subband correlation by including temporal correlation and the use of prominent spectral subbands for improving the signal correlation essential for syllable detection. Furthermore, to address some of the practical robustness issues in previously proposed methods, we introduce novel components into the algorithm such as the use of pitch confidence for filtering spurious syllable envelope peaks, a magnifying window for tackling neighboring syllable smearing, and relative peak measure thresholds for pseudo-peak rejection. We also describe an automated approach for learning algorithm parameters from data, and find the optimal settings through Monte Carlo simulations and parameter sensitivity analysis. Final experimental evaluations are conducted on a portion of the Switchboard corpus for which manual phonetic segmentation information and published results are available for direct comparison. The results show a correlation coefficient of 0.745 with respect to the ground truth based on manual segmentation. This result is about a 17% improvement over the current best single estimator and an 11% improvement over the multiestimator evaluated on the same Switchboard database. PMID:20428476
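
    The core idea — deriving speech rate by counting prominent peaks in a smoothed energy envelope — can be sketched briefly. This is a minimal illustration, not the authors' algorithm: the filter cutoff, peak thresholds, and minimum peak spacing below are illustrative stand-ins for the paper's subband-correlation machinery and learned parameters.

        # Envelope-based syllable-rate sketch (illustrative parameters).
        import numpy as np
        from scipy.signal import butter, filtfilt, find_peaks

        def speech_rate(x, fs):
            """Estimate syllables per second from a mono speech signal x."""
            env = np.abs(x)                              # rectified envelope
            b, a = butter(2, 10.0 / (fs / 2.0), "low")   # keep < 10 Hz modulations
            env = filtfilt(b, a, env)
            peaks, _ = find_peaks(env, height=0.2 * env.max(),
                                  distance=int(0.1 * fs))  # reject ripples
            return len(peaks) / (len(x) / fs)

        # Toy check: noise amplitude-modulated at 3 "syllables" per second.
        fs = 16000
        t = np.arange(fs) / fs
        x = np.random.randn(fs) * (0.5 + 0.5 * np.cos(2 * np.pi * 3 * t - np.pi))
        print(round(speech_rate(x, fs), 1), "syllables/s")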

  16. Voice synthesis application

    NASA Astrophysics Data System (ADS)

    Lightstone, P. C.; Davidson, W. M.

    1982-04-01

    The military detection assessment laboratory houses an experimental field system which assesses different alarm indicators such as fence disturbance sensors, MILES cables, and microwave Racons. A speech synthesis board was purchased that could be interfaced, by means of a computer, to an alarm logger, making verbal acknowledgement of alarms possible. Different products and types of voice synthesis were analyzed before a linear predictive coding (LPC) device produced by Telesensory Speech Systems of Palo Alto, California, was chosen. This device, called the Speech 1000 Board, has a dedicated 8085 processor. A multiplexer card was designed and the Sp 1000 interfaced through the card into a TMS 990/100M Texas Instruments microcomputer. It was also necessary to design software capable of recognizing and flagging an alarm on any 1 of 32 possible lines. The experimental field system was then packaged with a dc power supply, LED indicators, speakers, and switches, and deployed in the field, where it performed reliably.
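
    The system's logic — watch the alarm lines and speak a canned acknowledgement for any line that trips — is easy to sketch with a modern off-the-shelf synthesizer in place of the Sp 1000 board. The line assignments and messages below are hypothetical, and pyttsx3 is simply one commonly available Python TTS package assumed to be installed.

        # Hypothetical alarm-line announcer; pyttsx3 stands in for the Sp 1000.
        import pyttsx3

        ALARM_LINES = {  # hypothetical mapping of monitored lines to messages
            3: "fence disturbance sensor, sector two",
            7: "microwave racon, north perimeter",
        }

        def announce(active_lines, engine):
            for line in sorted(active_lines):
                msg = ALARM_LINES.get(line, "unassigned line %d" % line)
                engine.say("Alarm on line %d: %s" % (line, msg))
            engine.runAndWait()          # block until speech output finishes

        engine = pyttsx3.init()
        announce({3, 7}, engine)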

  17. Computer-based speech therapy for childhood speech sound disorders.

    PubMed

    Furlong, Lisa; Erickson, Shane; Morris, Meg E

    2017-07-01

    With the current worldwide workforce shortage of Speech-Language Pathologists, new and innovative ways of delivering therapy to children with speech sound disorders are needed. Computer-based speech therapy may be an effective and viable means of addressing service access issues for children with speech sound disorders. To evaluate the efficacy of computer-based speech therapy programs for children with speech sound disorders. Studies reporting the efficacy of computer-based speech therapy programs were identified via a systematic, computerised database search. Key study characteristics, results, main findings and details of computer-based speech therapy programs were extracted. The methodological quality was evaluated using a structured critical appraisal tool. 14 studies were identified and a total of 11 computer-based speech therapy programs were evaluated. The results showed that computer-based speech therapy is associated with positive clinical changes for some children with speech sound disorders. There is a need for collaborative research between computer engineers and clinicians, particularly during the design and development of computer-based speech therapy programs. Evaluation using rigorous experimental designs is required to understand the benefits of computer-based speech therapy. The reader will be able to 1) discuss how computer-based speech therapy has the potential to improve service access for children with speech sound disorders, 2) explain the ways in which computer-based speech therapy programs may enhance traditional tabletop therapy and 3) compare the features of computer-based speech therapy programs designed for different client populations. Copyright © 2017 Elsevier Inc. All rights reserved.

  18. Toward the ultimate synthesis/recognition system.

    PubMed

    Furui, S

    1995-10-24

    This paper predicts speech synthesis, speech recognition, and speaker recognition technology for the year 2001, and it describes the most important research problems to be solved in order to arrive at these ultimate synthesis and recognition systems. The problems for speech synthesis include natural and intelligible voice production, prosody control based on meaning, capability of controlling synthesized voice quality and choosing individual speaking style, multilingual and multidialectal synthesis, choice of application-oriented speaking styles, capability of adding emotion, and synthesis from concepts. The problems for speech recognition include robust recognition against speech variations, adaptation/normalization to variations due to environmental conditions and speakers, automatic knowledge acquisition for acoustic and linguistic modeling, spontaneous speech recognition, naturalness and ease of human-machine interaction, and recognition of emotion. The problems for speaker recognition are similar to those for speech recognition. The research topics related to all these techniques include the use of articulatory and perceptual constraints and evaluation methods for measuring the quality of technology and systems.

  19. Speech Alarms Pilot Study

    NASA Technical Reports Server (NTRS)

    Sandor, A.; Moses, H. R.

    2016-01-01

    Currently on the International Space Station (ISS) and other space vehicles Caution & Warning (C&W) alerts are represented with various auditory tones that correspond to the type of event. This system relies on the crew's ability to remember what each tone represents in a high stress, high workload environment when responding to the alert. Furthermore, crew receive training a year or more in advance of the mission, which makes remembering the semantic meaning of the alerts more difficult. The current system works for missions conducted close to Earth, where ground operators can assist as needed. On long duration missions, however, crews will need to handle off-nominal events autonomously. There is evidence that speech alarms may be easier and faster to recognize, especially during an off-nominal event. The Information Presentation Directed Research Project (FY07-FY09) funded by the Human Research Program included several studies investigating C&W alerts. The studies evaluated tone alerts currently in use with NASA flight deck displays along with candidate speech alerts. A follow-on study used four types of speech alerts to investigate how quickly various types of auditory alerts with and without a speech component - either at the beginning or at the end of the tone - can be identified. Even though crew were familiar with the tone alert from training or direct mission experience, alerts starting with a speech component were identified faster than alerts starting with a tone. The current study replicated the results from the previous study in a more rigorous experimental design to determine if the candidate speech alarms are ready for transition to operations or if more research is needed. Four types of alarms (caution, warning, fire, and depressurization) were presented to participants in both tone and speech formats in laboratory settings and later in the Human Exploration Research Analog (HERA). In the laboratory study, the alerts were presented by software and participants were

  20. Differential Diagnosis of Severe Speech Disorders Using Speech Gestures

    ERIC Educational Resources Information Center

    Bahr, Ruth Huntley

    2005-01-01

    The differentiation of childhood apraxia of speech from severe phonological disorder is a common clinical problem. This article reports on an attempt to describe speech errors in children with childhood apraxia of speech on the basis of gesture use and acoustic analyses of articulatory gestures. The focus was on the movement of articulators and…

  2. Resolving ambiguity of species limits and concatenation in multilocus sequence data for the construction of phylogenetic supermatrices.

    PubMed

    Chesters, Douglas; Vogler, Alfried P

    2013-05-01

    Public DNA databases are becoming too large and too complex for manual methods to generate phylogenetic supermatrices from multiple gene sequences. Delineating the terminals based on taxonomic labels is no longer practical because species identifications are frequently incomplete and gene trees are incongruent with Linnaean binomials, which results in uncertainty about how to combine species units among unlinked loci. We developed a procedure that minimizes the problem of forming multilocus species units in a large phylogenetic data set using algorithms from graph theory. An initial step established sequence clusters for each locus that broadly correspond to the species level. These clusters frequently include sequences labeled with various binomials and specimen identifiers that create multiple alternatives for concatenation. To choose among these possibilities, we minimize taxonomic conflict among the species units globally in the data set using a multipartite heuristic algorithm. The procedure was applied to all available GenBank data for Coleoptera (beetles) including > 10 500 taxon labels and > 23 500 sequences of 4 loci, which were grouped into 11 241 clusters or divergent singletons by the BlastClust software. Within each cluster, unidentified sequences could be assigned to a species name through the association with fully identified sequences, resulting in 510 new identifications (13.9% of total unidentified sequences) of which nearly half were "trans-locus" identifications by clustering of sequences at a secondary locus. The limits of DNA-based clusters were inconsistent with the Linnaean binomials for 1518 clusters (13.5%) that contained more than one binomial or split a single binomial among multiple clusters. By applying a scoring scheme for full and partial name matches in pairs of clusters, a maximum weight set of 7366 global species units was produced. Varying the match weights for partial matches had little effect on the number of units, although if
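
    The selection step — choosing one non-conflicting concatenation per cluster from many scored alternatives — can be illustrated with a toy greedy pass. This stands in for the paper's multipartite heuristic only in spirit; the cluster names and agreement weights below are made up.

        # Greedy stand-in for maximum-weight selection of species units.
        def pick_species_units(candidates):
            """candidates: list of (weight, locus1_cluster, locus2_cluster)."""
            used1, used2, chosen = set(), set(), []
            for w, c1, c2 in sorted(candidates, reverse=True):
                if c1 not in used1 and c2 not in used2:   # no cluster reused
                    chosen.append((c1, c2, w))
                    used1.add(c1)
                    used2.add(c2)
            return chosen

        pairs = [(2.0, "coi_17", "18s_4"),   # clusters share a full binomial
                 (1.5, "coi_17", "18s_9"),   # partial name match only
                 (1.0, "coi_22", "18s_9")]
        print(pick_species_units(pairs))
        # -> [('coi_17', '18s_4', 2.0), ('coi_22', '18s_9', 1.0)]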

  3. Hearing or speech impairment - resources

    MedlinePlus

    Resources - hearing or speech impairment ... The following organizations are good resources for information on hearing impairment or speech impairment: Alexander Graham Bell Association for the Deaf and Hard of Hearing -- www.agbell. ...

  4. Delayed Speech or Language Development

    MedlinePlus

    ... Creating a Reader-Friendly Home Auditory Processing Disorder Stuttering Reading Milestones Speech-Language Therapy Hearing Evaluation in ... 5-Year-Old Going to a Speech Therapist Stuttering Contact Us Print Resources Send to a Friend ...

  5. Development of a speech autocuer

    NASA Technical Reports Server (NTRS)

    Bedles, R. L.; Kizakvich, P. N.; Lawson, D. T.; Mccartney, M. L.

    1980-01-01

    A wearable, visually based prosthesis for the deaf, based upon cued speech, a proven method for removing lipreading ambiguity, was fabricated and tested. Both software and hardware developments are described, including a microcomputer, display, and speech preprocessor.

  6. Evaluating syntactic constraints to speech recognition in a fighter aircraft environment

    NASA Astrophysics Data System (ADS)

    Stockton, D. B.

    1984-12-01

    A flexible software system has been developed to test the effects of adding syntactic knowledge to an isolated-speech, phoneme-based word recognizer. Words from a seventy-word fighter-plane vocabulary, spoken by five pilots at four different levels of background noise, are automatically concatenated into commands randomly chosen from a set of over seven trillion. These commands are then recognized using an existing word recognizer together with grammars of differing specificity. Results are compiled automatically. The system is flexible in that system components such as the command generator, parser, grammar, or word recognizer can be interchanged with very little software modification. Preliminary testing demonstrated that, although the modified word recognizer exhibited very poor performance, the use of more specific grammars enhanced recognition accuracy, sometimes drastically.
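
    The role of the grammar is to prune acoustically plausible but syntactically illegal word hypotheses. A minimal sketch of that filtering step follows; the vocabulary, scores, and successor grammar are hypothetical, and a real system would search over full command parses rather than choosing greedily per word.

        # Grammar-constrained pruning of per-position word hypotheses.
        GRAMMAR = {                      # word -> words allowed to follow it
            "<s>":    {"select", "arm"},
            "select": {"radar", "radio"},
            "arm":    {"missile"},
        }

        def recognize(candidates):
            """candidates: per position, a list of (acoustic_score, word)."""
            prev, out = "<s>", []
            for hyps in candidates:
                legal = [h for h in hyps if h[1] in GRAMMAR.get(prev, set())]
                if not legal:            # grammar rejects every hypothesis
                    return None
                score, word = max(legal)
                out.append(word)
                prev = word
            return out

        print(recognize([[(0.6, "select"), (0.5, "arm")],
                         [(0.7, "radio"), (0.4, "radar")]]))
        # -> ['select', 'radio']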

  7. Speech spectrogram expert

    SciTech Connect

    Johannsen, J.; Macallister, J.; Michalek, T.; Ross, S.

    1983-01-01

    Various authors have pointed out that humans can become quite adept at deriving phonetic transcriptions from speech spectrograms (as good as 90 percent accuracy at the phoneme level). The authors describe an expert system which attempts to simulate this performance. The speech spectrogram expert (SPEX) is actually a society made up of three experts: a 2-dimensional vision expert, an acoustic-phonetic expert, and a phonetics expert. The visual reasoning expert finds important visual features of the spectrogram. The acoustic-phonetic expert reasons about how visual features relate to phonemes, and about how phonemes change visually in different contexts. The phonetics expert reasons about allowable phoneme sequences and transformations, and deduces an English spelling for phoneme strings. The speech spectrogram expert is highly interactive, allowing users to investigate hypotheses and edit rules. 10 references.

  8. Study of acoustic correlates associated with emotional speech

    NASA Astrophysics Data System (ADS)

    Yildirim, Serdar; Lee, Sungbok; Lee, Chul Min; Bulut, Murtaza; Busso, Carlos; Kazemzadeh, Ebrahim; Narayanan, Shrikanth

    2004-10-01

    This study investigates the acoustic characteristics of four different emotions expressed in speech. The aim is to obtain detailed acoustic knowledge on how a speech signal is modulated by changes from neutral to a certain emotional state. Such knowledge is necessary for automatic emotion recognition and classification and for emotional speech synthesis. Speech data obtained from two semi-professional actresses are analyzed and compared. Each subject produces 211 sentences with four different emotions: neutral, sad, angry, and happy. We analyze changes in temporal and acoustic parameters such as magnitude and variability of segmental duration, fundamental frequency, and the first three formant frequencies as a function of emotion. Acoustic differences among the emotions are also explored with mutual information computation, multidimensional scaling, and acoustic likelihood comparison with normal speech. Results indicate that speech associated with anger and happiness is characterized by longer duration, shorter interword silence, and higher pitch and rms energy with wider ranges. Sadness is distinguished from other emotions by lower rms energy and longer interword silence. Interestingly, the difference in formant pattern between [happiness/anger] and [neutral/sadness] is better reflected in back vowels such as /a/ (as in "father") than in front vowels. Detailed results on intra- and interspeaker variability will be reported.
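
    Two of the parameters compared across emotions — rms energy and fundamental frequency — are simple to compute, as the sketch below shows on synthetic signals. A crude autocorrelation peak stands in for the robust pitch tracking an actual study would use.

        # Per-utterance rms energy and a naive autocorrelation F0 estimate.
        import numpy as np

        def rms(x):
            return float(np.sqrt(np.mean(x ** 2)))

        def f0_autocorr(x, fs, fmin=75.0, fmax=400.0):
            """Pick the autocorrelation peak within a plausible pitch range."""
            ac = np.correlate(x, x, mode="full")[len(x) - 1:]
            lo, hi = int(fs / fmax), int(fs / fmin)
            return fs / (lo + int(np.argmax(ac[lo:hi])))

        fs = 16000
        t = np.arange(int(0.5 * fs)) / fs
        neutral = 0.3 * np.sin(2 * np.pi * 120 * t)  # lower pitch, less energy
        angry = 0.9 * np.sin(2 * np.pi * 220 * t)    # higher pitch, more energy
        for name, x in [("neutral", neutral), ("angry", angry)]:
            print(name, "rms=%.2f" % rms(x), "f0=%.0f Hz" % f0_autocorr(x, fs))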

  9. Speech Cues and Sign Stimuli.

    ERIC Educational Resources Information Center

    Mattingly, Ignatius G.

    Parallels between sign stimuli and speech cues suggest some interesting speculations about the origins of language. Speech cues may belong to the class of human sign stimuli which, as in animal behavior, may be the product of an innate releasing mechanism. Prelinguistic speech for man may have functioned as a social-releaser system. Human language…

  10. ADMINISTRATIVE GUIDE IN SPEECH CORRECTION.

    ERIC Educational Resources Information Center

    HEALEY, WILLIAM C.

    WRITTEN PRIMARILY FOR SCHOOL SUPERINTENDENTS, PRINCIPALS, SPEECH CLINICIANS, AND SUPERVISORS, THIS GUIDE OUTLINES THE MECHANICS OF ORGANIZING AND CONDUCTING SPEECH CORRECTION ACTIVITIES IN THE PUBLIC SCHOOLS. IT INCLUDES THE REQUIREMENTS FOR CERTIFICATION OF A SPEECH CLINICIAN IN MISSOURI AND DESCRIBES ESSENTIAL STEPS FOR THE DEVELOPMENT OF A…

  11. "Zero Tolerance" for Free Speech.

    ERIC Educational Resources Information Center

    Hils, Lynda

    2001-01-01

    Argues that school policies of "zero tolerance" of threatening speech may violate a student's First Amendment right to freedom of expression if speech is less than a "true threat." Suggests a two-step analysis to determine if student speech is a "true threat." (PKP)

  12. Signed Soliloquy: Visible Private Speech

    ERIC Educational Resources Information Center

    Zimmermann, Kathrin; Brugger, Peter

    2013-01-01

    Talking to oneself can be silent (inner speech) or vocalized for others to hear (private speech, or soliloquy). We investigated these two types of self-communication in 28 deaf signers and 28 hearing adults. With a questionnaire specifically developed for this study, we established the visible analog of vocalized private speech in deaf signers.…

  13. Abortion and compelled physician speech.

    PubMed

    Orentlicher, David

    2015-01-01

    Informed consent mandates for abortion providers may infringe the First Amendment's freedom of speech. On the other hand, they may reinforce the physician's duty to obtain informed consent. Courts can promote both doctrines by ensuring that compelled physician speech pertains to medical facts about abortion rather than abortion ideology and that compelled speech is truthful and not misleading.

  14. Multilocus phylogeny of the lichen-forming fungal genus Melanohalea (Parmeliaceae, Ascomycota): insights on diversity, distributions, and a comparison of species tree and concatenated topologies.

    PubMed

    Leavitt, Steven D; Esslinger, Theodore L; Spribille, Toby; Divakar, Pradeep K; Thorsten Lumbsch, H

    2013-01-01

    Accurate species circumscriptions are central for many biological disciplines and have critical implications for ecological and conservation studies. An increasing body of evidence suggests that in some cases traditional morphology-based taxonomy has underestimated diversity in lichen-forming fungi. Therefore, genetic data play an increasing role in recognizing distinct lineages of lichenized fungi that would otherwise be improbable to recognize using classical phenotypic characters. Melanohalea (Parmeliaceae, Ascomycota) is one of the most widespread and common lichen-forming genera in the northern Hemisphere. In this study, we assess traditional phenotype-based species boundaries, identify previously unrecognized species-level lineages, and discuss biogeographic patterns in Melanohalea. We sampled 487 individuals worldwide, representing 18 of the 22 described Melanohalea species, and generated DNA sequence data from mitochondrial, nuclear ribosomal, and protein-coding markers. Diversity previously hidden within traditional species was identified using a genealogical concordance approach. We inferred relationships among sampled species-level lineages within Melanohalea using both concatenated phylogenetic methods and a coalescent-based multilocus species tree approach. Although lineages identified from genetic data are largely congruent with traditional taxonomy, we found strong evidence supporting the presence of previously unrecognized species in six of the 18 sampled taxa. Strong nodal support and overall congruence among independent loci suggest long-term reproductive isolation among most species-level lineages. While some Melanohalea taxa are truly widespread, a limited number of clades appear to have much more restricted distributional ranges. In most instances the concatenated gene tree and multilocus species tree approaches provided similar estimates of relationships. However, nodal support was generally higher in the phylogeny estimated from

  15. Corpus of deaf speech for acoustic and speech production research.

    PubMed

    Mendel, Lisa Lucks; Lee, Sungmin; Pousson, Monique; Patro, Chhayakanta; McSorley, Skylar; Banerjee, Bonny; Najnin, Shamima; Kapourchali, Masoumeh Heidari

    2017-07-01

    A corpus of recordings of deaf speech is introduced. Adults who were pre- or post-lingually deafened as well as those with normal hearing read standardized speech passages totaling 11 h of .wav recordings. Preliminary acoustic analyses are included to provide a glimpse of the kinds of analyses that can be conducted with this corpus of recordings. Long term average speech spectra as well as spectral moment analyses provide considerable insight into differences observed in the speech of talkers judged to have low, medium, or high speech intelligibility.
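
    Both analyses named above are standard signal-processing computations. The sketch below derives a long-term average spectrum with Welch's method and the first two spectral moments (centroid and spread); the white-noise input is only a placeholder for a recorded passage.

        # Long-term average spectrum plus first two spectral moments.
        import numpy as np
        from scipy.signal import welch

        def ltas_and_moments(x, fs):
            freqs, psd = welch(x, fs=fs, nperseg=1024)  # long-term avg spectrum
            p = psd / psd.sum()                         # spectrum as weights
            centroid = float(np.sum(freqs * p))         # 1st moment (Hz)
            spread = float(np.sqrt(np.sum((freqs - centroid) ** 2 * p)))
            return freqs, psd, centroid, spread

        fs = 22050
        x = np.random.randn(5 * fs)                     # placeholder "speech"
        _, _, centroid, spread = ltas_and_moments(x, fs)
        print("centroid %.0f Hz, spread %.0f Hz" % (centroid, spread))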

  16. Speech audiometry, speech perception, and cognitive functions: English version.

    PubMed

    Meister, H

    2017-01-01

    Examination of cognitive functions in the framework of speech perception has recently gained increasing scientific and clinical interest. Especially against the background of age-related hearing impairment and cognitive decline, potential new perspectives in terms of a better individualization of auditory diagnosis and rehabilitation might arise. This review addresses the relationships between speech audiometry, speech perception, and cognitive functions. It presents models of speech perception, discusses associations of neuropsychological and audiometric outcomes, and shows examples of recent efforts undertaken in Germany to consider cognitive functions with speech audiometry.

  17. Speech transmission index from running speech: A neural network approach

    NASA Astrophysics Data System (ADS)

    Li, F. F.; Cox, T. J.

    2003-04-01

    Speech transmission index (STI) is an important objective parameter concerning speech intelligibility for sound transmission channels. It is normally measured with specific test signals to ensure high accuracy and good repeatability. Measurement with running speech was previously proposed, but accuracy is compromised and hence applications limited. A new approach that uses artificial neural networks to accurately extract the STI from received running speech is developed in this paper. Neural networks are trained on a large set of transmitted speech examples with prior knowledge of the transmission channels' STIs. The networks perform complicated nonlinear function mappings and spectral feature memorization to enable accurate objective parameter extraction from transmitted speech. Validations via simulations demonstrate the feasibility of this new method on a one-net-one-speech extract basis. In this case, accuracy is comparable with normal measurement methods. This provides an alternative to standard measurement techniques, and it is intended that the neural network method can facilitate occupied room acoustic measurements.
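
    The training setup can be pictured as straightforward supervised regression: features of received speech in, known channel STI out. The sketch below uses a small scikit-learn network on synthetic data; the paper's actual features, network, and training corpus differ.

        # Toy regression from speech features to STI with a small MLP.
        import numpy as np
        from sklearn.neural_network import MLPRegressor

        rng = np.random.default_rng(0)
        X = rng.uniform(size=(500, 8))     # stand-in spectral features
        y = X.mean(axis=1) + 0.05 * rng.normal(size=500)  # stand-in STI

        net = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000,
                           random_state=0).fit(X[:400], y[:400])
        pred = net.predict(X[400:])
        print("held-out RMS error: %.3f" % np.sqrt(np.mean((pred - y[400:]) ** 2)))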

  18. Elicitation of the Acoustic Change Complex to Long-Duration Speech Stimuli in Four-Month-Old Infants

    PubMed Central

    Chen, Ke Heng; Small, Susan A.

    2015-01-01

    The acoustic change complex (ACC) is an auditory-evoked potential elicited to changes within an ongoing stimulus that indicates discrimination at the level of the auditory cortex. Only a few studies to date have attempted to record ACCs in young infants. The purpose of the present study was to investigate the elicitation of ACCs to long-duration speech stimuli in English-learning 4-month-old infants. ACCs were elicited to consonant contrasts made up of two concatenated speech tokens. The stimuli included native dental-dental /dada/ and dental-labial /daba/ contrasts and a nonnative Hindi dental-retroflex /daDa/ contrast. Each consonant-vowel speech token was 410 ms in duration. Slow cortical responses were recorded to the onset of the stimulus and to the acoustic change from /da/ to either /ba/ or /Da/ within the stimulus with significantly prolonged latencies compared with adults. ACCs were reliably elicited for all stimulus conditions with more robust morphology compared with our previous findings using stimuli that were shorter in duration. The P1 amplitudes elicited to the acoustic change in /daba/ and /daDa/ were significantly larger compared to /dada/ supporting that the brain discriminated between the speech tokens. These findings provide further evidence for the use of ACCs as an index of discrimination ability. PMID:26798343
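
    The stimulus construction itself — two 410-ms tokens joined so the acoustic change falls mid-stimulus — is easy to illustrate. In the sketch below, pure tones are hypothetical stand-ins for the recorded /da/ and /ba/ tokens, with short amplitude ramps at the junction to avoid a click.

        # Build a two-token concatenated stimulus (410 ms + 410 ms).
        import numpy as np

        fs, dur = 44100, 0.410
        t = np.arange(int(fs * dur)) / fs
        tok_da = np.sin(2 * np.pi * 440 * t)       # placeholder for /da/
        tok_ba = np.sin(2 * np.pi * 330 * t)       # placeholder for /ba/

        ramp = np.linspace(0.0, 1.0, int(0.005 * fs))  # 5-ms edge ramps
        tok_da[-len(ramp):] *= ramp[::-1]
        tok_ba[:len(ramp)] *= ramp
        stimulus = np.concatenate([tok_da, tok_ba])    # /daba/-like stimulus
        print("total duration: %.0f ms" % (1000 * len(stimulus) / fs))  # ~820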

  20. Perceptual Learning in Speech

    ERIC Educational Resources Information Center

    Norris, Dennis; McQueen, James M.; Cutler, Anne

    2003-01-01

    This study demonstrates that listeners use lexical knowledge in perceptual learning of speech sounds. Dutch listeners first made lexical decisions on Dutch words and nonwords. The final fricative of 20 critical words had been replaced by an ambiguous sound, between [f] and [s]. One group of listeners heard ambiguous [f]-final words (e.g.,…

  1. Speech disfluency in centenarians.

    PubMed

    Searl, Jeffrey P; Gabel, Rodney M; Fulks, J Steven

    2002-01-01

    Apart from a single case presentation of a 105-year-old female, no studies have addressed the speech fluency characteristics of centenarians. The purpose of this study was to provide descriptive information on the fluency characteristics of speakers between the ages of 100-103 years. Conversational speech samples from seven speakers were evaluated for the frequency and types of disfluencies and speech rate. The centenarian speakers had a disfluency rate similar to that reported for 70-, 80-, and early 90-year-olds. The types of disfluencies observed also were similar to those reported for younger elderly speakers (primarily whole word/phrase, or formulative fluency breaks). Finally, the speech rate data for the current group of speakers supports prior literature reports of a slower rate with advancing age, but extends the finding to centenarians. As a result of this activity, participants will be able to: (1) describe the frequency of disfluency breaks and the types of disfluencies exhibited by centenarian speakers, (2) describe the mean and range of speaking rates in centenarians, and (3) compare the present findings for centenarians to the fluency and speaking rate characteristics reported in the literature.

  2. Speech to schoolchildren

    NASA Astrophysics Data System (ADS)

    Angell, C. Austen

    2013-02-01

    Prof. C. A. Angell from Arizona State University read the following short and simple speech, saying the sentences in italics in the best Japanese he could manage (after earnest coaching from a Japanese colleague). The rest was translated on the bus ride and then spoken, as he spoke, by Ms. Yukako Endo, to whom the author is very grateful.

  3. Speech Understanding Systems

    DTIC Science & Technology

    1975-03-01

    insensitive to random occurrences of noise. 3) It is capable of being extended to handle large vocabularies. 4) It permits alternate ... baseforms, phonological rules, and marking of syllable boundaries and stress levels from the Speech Communications Research Laboratory. We also

  4. Recognition of speech spectrograms.

    PubMed

    Greene, B G; Pisoni, D B; Carrell, T D

    1984-07-01

    The performance of eight naive observers in learning to identify speech spectrograms was studied over a 2-month period. Single tokens from a 50-word phonetically balanced (PB) list were recorded by several talkers and displayed on a Spectraphonics Speech Spectrographic Display system. Identification testing occurred immediately after daily training sessions. After approximately 20 h of training, naive subjects correctly identified the 50 PB words from a single talker over 95% of the time. Generalization tests with the same words were then carried out with different tokens from the original talker, new tokens from another male talker, a female talker, and finally, a synthetic talker. The generalization results for these talkers showed recognition performance at 91%, 76%, 76%, and 48%, respectively. Finally, generalization tests with a novel set of PB words produced by the original talker were also carried out to examine in detail the perceptual strategies and visual features that subjects abstracted from the training set. Our results demonstrate that even without formal training in phonetics or acoustics naive observers can learn to identify visual displays of speech at very high levels of accuracy. Analysis of subjects' performance in a verbal protocol task demonstrated that they rely on salient visual correlates of many phonetic features in speech.

  5. Hearing speech in music.

    PubMed

    Ekström, Seth-Reino; Borg, Erik

    2011-01-01

    The masking effect of a piano composition, played at different speeds and in different octaves, on speech-perception thresholds was investigated in 15 normal-hearing and 14 moderately-hearing-impaired subjects. Running speech (just follow conversation, JFC) testing and use of hearing aids increased the everyday validity of the findings. A comparison was made with standard audiometric noises [International Collegium of Rehabilitative Audiology (ICRA) noise and speech spectrum-filtered noise (SPN)]. All masking sounds, music or noise, were presented at the same equivalent sound level (50 dBA). The results showed a significant effect of piano performance speed and octave (P<.01). Low octave and fast tempo had the largest effect; and high octave and slow tempo, the smallest. Music had a lower masking effect than did ICRA noise with two or six speakers at normal vocal effort (P<.01) and SPN (P<.05). Subjects with hearing loss had higher masked thresholds than the normal-hearing subjects (P<.01), but there were smaller differences between masking conditions (P<.01). It is pointed out that music offers an interesting opportunity for studying masking under realistic conditions, where spectral and temporal features can be varied independently. The results have implications for composing music with vocal parts, designing acoustic environments and creating a balance between speech perception and privacy in social settings.

  6. Black History Speech

    ERIC Educational Resources Information Center

    Noldon, Carl

    2007-01-01

    The author argues in this speech that one cannot expect students in the school system to know and understand the genius of Black history if the curriculum is Eurocentric, which is a residue of racism. He states that his comments are designed for the enlightenment of those who suffer from a school system that "hypocritically manipulates Black…

  7. Mandarin Visual Speech Information

    ERIC Educational Resources Information Center

    Chen, Trevor H.

    2010-01-01

    While the auditory-only aspects of Mandarin speech are heavily-researched and well-known in the field, this dissertation addresses its lesser-known aspects: The visual and audio-visual perception of Mandarin segmental information and lexical-tone information. Chapter II of this dissertation focuses on the audiovisual perception of Mandarin…

  8. Speech intelligibility in hospitals.

    PubMed

    Ryherd, Erica E; Moeller, Michael; Hsu, Timothy

    2013-07-01

    Effective communication between staff members is key to patient safety in hospitals. A variety of patient care activities including admittance, evaluation, and treatment rely on oral communication. Surprisingly, published information on speech intelligibility in hospitals is extremely limited. In this study, speech intelligibility measurements and occupant evaluations were conducted in 20 units of five different U.S. hospitals. A variety of unit types and locations were studied. Results show that overall, no unit had "good" intelligibility based on the speech intelligibility index (SII > 0.75), and several locations were found to have "poor" intelligibility (SII < 0.45). Further, occupied spaces were found to have 10%-15% lower SII than unoccupied spaces on average. Additionally, staff perception of communication problems at nurse stations was significantly correlated with SII ratings. In a targeted second phase, a unit treated with sound absorption had higher SII ratings for a larger percentage of time as compared to an identical untreated unit. Taken as a whole, the study provides an extensive baseline evaluation of speech intelligibility across a variety of hospitals and unit types, offers some evidence of the positive impact of absorption on intelligibility, and identifies areas for future research.
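
    The SII categories used above translate directly into a trivial helper, shown here only to make the reported boundaries concrete (values between the two cutoffs are labeled "intermediate" for illustration):

        # Category boundaries as reported: > 0.75 "good", < 0.45 "poor".
        def sii_category(sii):
            if sii > 0.75:
                return "good"
            if sii < 0.45:
                return "poor"
            return "intermediate"

        for sii in (0.80, 0.60, 0.40):
            print(sii, sii_category(sii))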

  9. Free Speech Yearbook 1973.

    ERIC Educational Resources Information Center

    Barbour, Alton, Ed.

    The first article in this collection examines civil disobedience and the protections offered by the First Amendment. The second article discusses a study on antagonistic expressions in a free society. The third essay deals with attitudes toward free speech and treatment of the United States flag. There are two articles on media; the first examines…

  10. Free Speech Yearbook 1979.

    ERIC Educational Resources Information Center

    Kane, Peter E., Ed.

    The seven articles in this collection deal with theoretical and practical freedom of speech issues. Topics covered are: the United States Supreme Court, motion picture censorship, and the color line; judicial decision making; the established scientific community's suppression of the ideas of Immanuel Velikovsky; the problems of avant-garde jazz,…

  11. Speech After Banquet

    NASA Astrophysics Data System (ADS)

    Yang, Chen Ning

    2013-05-01

    I am usually not so short of words, but the previous speeches have rendered me really speechless. I have known and admired the eloquence of Freeman Dyson, but I did not know that there is a hidden eloquence in my colleague George Sterman...

  13. The Commercial Speech Doctrine.

    ERIC Educational Resources Information Center

    Luebke, Barbara F.

    In its 1942 ruling in the "Valentine vs. Christensen" case, the Supreme Court established the doctrine that commercial speech is not protected by the First Amendment. In 1975, in the "Bigelow vs. Virginia" case, the Supreme Court took a decisive step toward abrogating that doctrine, by ruling that advertising is not stripped of…

  14. On Curbing Racial Speech.

    ERIC Educational Resources Information Center

    Gale, Mary Ellen

    1991-01-01

    An alternative interpretation of the First Amendment guarantee of free speech suggests that universities may prohibit and punish direct verbal assaults on specific individuals if the speaker intends to do harm and if a reasonable person would recognize the potential for serious interference with the victim's educational rights. (MSE)

  16. Forensics and Speech Communication

    ERIC Educational Resources Information Center

    McBath, James H.

    1975-01-01

    Focuses on the importance of integrating forensics programs into the speech communication curriculum. Maintains that debating and argumentation skills increase the probability of academic success. Published by the Association for Communication Administration Bulletin, Staff Coordinator, ACA 5205 Leesburg Pike, Falls Church, VA 22041, $25.00 annual…

  17. Media Criticism Group Speech

    ERIC Educational Resources Information Center

    Ramsey, E. Michele

    2004-01-01

    Objective: To integrate speaking practice with rhetorical theory. Type of speech: Persuasive. Point value: 100 points (i.e., 30 points based on peer evaluations, 30 points based on individual performance, 40 points based on the group presentation), which is 25% of course grade. Requirements: (a) References: 7-10; (b) Length: 20-30 minutes; (c)…

  18. Expectations and speech intelligibility.

    PubMed

    Babel, Molly; Russell, Jamie

    2015-05-01

    Socio-indexical cues and paralinguistic information are often beneficial to speech processing as this information assists listeners in parsing the speech stream. Associations that particular populations speak in a certain speech style can, however, make it such that socio-indexical cues have a cost. In this study, native speakers of Canadian English who identify as Chinese Canadian and White Canadian read sentences that were presented to listeners in noise. Half of the sentences were presented with a visual-prime in the form of a photo of the speaker and half were presented in control trials with fixation crosses. Sentences produced by Chinese Canadians showed an intelligibility cost in the face-prime condition, whereas sentences produced by White Canadians did not. In an accentedness rating task, listeners rated White Canadians as less accented in the face-prime trials, but Chinese Canadians showed no such change in perceived accentedness. These results suggest a misalignment between an expected and an observed speech signal for the face-prime trials, which indicates that social information about a speaker can trigger linguistic associations that come with processing benefits and costs.

  19. Free Speech Yearbook, 1974.

    ERIC Educational Resources Information Center

    Barbour, Alton, Ed.

    A collection of essays on free speech and communication is contained in this book. The essays include "From Fairness to Access and Back Again: Some Dimensions of Free Expression in Broadcasting"; "Local Option on the First Amendment?"; "A Look at the Fire Symbol Before and After May 4, 1970"; "Freedom to Teach,…

  20. AFFECTIVE COMMUNICATION IN SPEECH AND RELATED QUANTITATIVE PROBLEMS.

    DTIC Science & Technology

    (*VOICE COMMUNICATIONS, PSYCHOACOUSTICS), (*PSYCHOACOUSTICS, SPEECH), (*SPEECH, INTELLIGIBILITY), (*INTELLIGIBILITY, MEASUREMENT), SPEECH TRANSMISSION, ACOUSTIC PROPERTIES, AUDITORY PERCEPTION, DISTORTION, PSYCHOLOGICAL TESTS, PERCEPTION(PSYCHOLOGY), SPEECH REPRESENTATION, MATHEMATICAL ANALYSIS, TABLES(DATA)

  1. Conversation, speech acts, and memory.

    PubMed

    Holtgraves, Thomas

    2008-03-01

    Speakers frequently have specific intentions that they want others to recognize (Grice, 1957). These specific intentions can be viewed as speech acts (Searle, 1969), and I argue that they play a role in long-term memory for conversation utterances. Five experiments were conducted to examine this idea. Participants in all experiments read scenarios ending with either a target utterance that performed a specific speech act (brag, beg, etc.) or a carefully matched control. Participants were more likely to falsely recall and recognize speech act verbs after having read the speech act version than after having read the control version, and the speech act verbs served as better recall cues for the speech act utterances than for the controls. Experiment 5 documented individual differences in the encoding of speech act verbs. The results suggest that people recognize and retain the actions that people perform with their utterances and that this is one of the organizing principles of conversation memory.

  2. Relationship between speech motor control and speech intelligibility in children with speech sound disorders.

    PubMed

    Namasivayam, Aravind Kumar; Pukonen, Margit; Goshulak, Debra; Yu, Vickie Y; Kadis, Darren S; Kroll, Robert; Pang, Elizabeth W; De Nil, Luc F

    2013-01-01

    The current study was undertaken to investigate the impact of speech motor issues on the speech intelligibility of children with moderate to severe speech sound disorders (SSD) within the context of the PROMPT intervention approach. The word-level Children's Speech Intelligibility Measure (CSIM), the sentence-level Beginner's Intelligibility Test (BIT) and tests of speech motor control and articulation proficiency were administered to 12 children (3:11 to 6:7 years) before and after PROMPT therapy. PROMPT treatment was provided for 45 min twice a week for 8 weeks. Twenty-four naïve adult listeners aged 22-46 years judged the intelligibility of the words and sentences. For the CSIM, each time a recorded word was played, listeners were asked to look at a list of 12 words (multiple-choice format) and circle the word they heard; for the BIT sentences, listeners were asked to write down everything they heard. Words correctly circled (CSIM) or transcribed (BIT) were averaged across three naïve judges to calculate percentage speech intelligibility. Speech intelligibility at both the word and sentence level was significantly correlated with speech motor control, but not articulatory proficiency. Further, the severity of speech motor planning and sequencing issues may potentially be a limiting factor in connected speech intelligibility, which highlights the need to target these issues early and directly in treatment. The reader will be able to: (1) outline the advantages and disadvantages of using word- and sentence-level speech intelligibility tests; (2) describe the impact of speech motor control and articulatory proficiency on speech intelligibility; and (3) describe how speech motor control and speech intelligibility data may provide critical information to aid treatment planning. Copyright © 2013 Elsevier Inc. All rights reserved.
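
    The scoring procedure — per-word correct/incorrect judgments from three naive listeners averaged into a percentage — reduces to a few lines, sketched below with made-up judgment data:

        # Percent intelligibility averaged across judges (toy data).
        def percent_intelligibility(judgments):
            """judgments: one list of 0/1 correct flags per judge."""
            per_judge = [100.0 * sum(j) / len(j) for j in judgments]
            return sum(per_judge) / len(per_judge)

        judges = [[1, 0, 1, 1], [1, 1, 1, 0], [1, 0, 1, 0]]  # 3 judges, 4 words
        print("%.1f%%" % percent_intelligibility(judges))     # -> 66.7%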

  3. Seeking a reading machine for the blind and discovering the speech code.

    PubMed

    Shankweiler, Donald; Fowler, Carol A

    2015-02-01

    A machine that can read printed material to the blind became a priority at the end of World War II with the appointment of a U.S. Government committee to instigate research on sensory aids to improve the lot of blinded veterans. The committee chose Haskins Laboratories to lead a multisite research program. Initially, Haskins researchers overestimated the capacities of users to learn an acoustic code based on the letters of a text, resulting in unsuitable designs. Progress was slow because the researchers clung to a mistaken view that speech is a sound alphabet and because of persisting gaps in man-machine technology. The tortuous route to a practical reading machine transformed the scientific understanding of speech perception and reading at Haskins Labs and elsewhere, leading to novel lines of basic research and new technologies. Research at Haskins Laboratories made valuable contributions in clarifying the physical basis of speech. Researchers recognized that coarticulatory overlap eliminated the possibility of alphabet-like discrete acoustic segments in speech. This work advanced the study of speech perception and contributed to our understanding of the relation of speech perception to production. Basic findings on speech enabled the development of speech synthesis, part science and part technology, essential for development of a reading machine, which has found many applications. Findings on the nature of speech further stimulated a new understanding of word recognition in reading across languages and scripts and contributed to our understanding of reading development and reading disabilities.

  4. Speech outcomes of a prolonged-speech treatment for stuttering.

    PubMed

    Onslow, M; Costa, L; Andrews, C; Harrison, E; Packman, A

    1996-08-01

    It has been shown that people who stutter can speak with greatly reduced stuttering after treatments that use variations of Goldiamond's (1965) prolonged-speech (PS). However, outcome research to date has not taken account of several important issues. In particular, speech outcome measures in that research have been insufficient to show that lasting relief from stuttering has been achieved by clients outside the clinic for meaningful periods. The present study used extensive speech outcome measures across a variety of situations in evaluating the outcome of an intensive PS treatment (Ingham, 1987). The speech of 12 clients in this treatment was assessed on three occasions prior to treatment and frequently (on eight occasions) after discharge from the residential setting. For 7 clients, a further assessment occurred at 3 years posttreatment. Concurrent dependent measures were percent syllables stuttered, syllables per minute, and speech naturalness. The dependent measures were collected in many speaking situations within and beyond the clinic. Dependent measures were based on speech samples of substantive duration, and covert assessments were included in the study. Detailed data were presented for individual subjects. Results showed that 12 subjects who remained with the entire 2-3-year program achieved zero or near-zero stuttering. The majority of subjects did not show a regression trend in %SS or speech naturalness scores during the posttreatment period, either within or beyond the clinic. Some subjects showed higher posttreatment %SS scores during covert assessment than during overt assessment. Results also showed that stuttering was eliminated without using unusually slow and unnatural speech patterns. This treatment program does not specify a target speech rate range, and many clients maintained stutter-free speech using speech rates that were higher than the range typically specified in intensive PS programs. A significant correlation was found between speech
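
    The two speech measures reported throughout — percent syllables stuttered (%SS) and syllables per minute — come directly from counts taken over a timed sample, as the toy sketch below shows (the counts are invented):

        # %SS and syllables-per-minute from clinician counts (toy values).
        def fluency_measures(syllables, stuttered, minutes):
            pss = 100.0 * stuttered / syllables   # percent syllables stuttered
            spm = syllables / minutes             # speech rate
            return pss, spm

        pss, spm = fluency_measures(syllables=600, stuttered=9, minutes=3.0)
        print("%%SS = %.1f, SPM = %.0f" % (pss, spm))  # -> %SS = 1.5, SPM = 200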

  5. Nanoparticle Motion in Entangled Melts of Linear and Nonconcatenated Ring Polymers [Nanoparticle Motion in Entangled Melts of Non-Concatenated Ring Polymers].

    DOE PAGES

    Ge, Ting; Kalathi, Jagannathan T.; Halverson, Jonathan D.; ...

    2017-02-13

    The motion of nanoparticles (NPs) in entangled melts of linear polymers and non-concatenated ring polymers is compared by large-scale molecular dynamics simulations. The comparison provides a paradigm for the effects of polymer architecture on the dynamical coupling between NPs and polymers in nanocomposites. Strongly suppressed motion of NPs with diameter d larger than the entanglement spacing a is observed in a melt of linear polymers before the onset of Fickian NP diffusion. This strong suppression of NP motion occurs progressively as d exceeds a, and is related to the hopping diffusion of NPs in the entanglement network. In contrast to the NP motion in linear polymers, the motion of NPs with d > a in ring polymers is not as strongly suppressed prior to Fickian diffusion. The diffusion coefficient D decreases with increasing d much more slowly in entangled rings than in entangled linear chains. NP motion in entangled non-concatenated ring polymers is understood through a scaling analysis of the coupling between NP motion and the self-similar entangled dynamics of ring polymers.

  6. A Java speech implementation of the Mini Mental Status Exam.

    PubMed Central

    Wang, S. S.; Starren, J.

    1999-01-01

    The Folstein Mini Mental Status Exam (MMSE) is a simple, widely used, verbally administered test to assess cognitive function. The Java Speech Application Programming Interface (JSAPI) is a new, cross-platform interface for both speech recognition and speech synthesis in the Java environment. To evaluate the suitability of the JSAPI for interactive, patient interview applications, a JSAPI implementation of the MMSE was developed. The MMSE contains questions that vary in structure in order to assess different cognitive functions. This question variability provided an excellent test-bed to evaluate the strengths and weaknesses of JSAPI. The application is based on Java platform 2 and a JSAPI interface to the IBM ViaVoice recognition engine. Design and implementation issues are discussed. Preliminary usability studies demonstrate that an automated MMSE may be a useful screening tool for cognitive disorders and changes. PMID:10566396
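
    The interaction pattern — synthesize a question, capture the patient's reply, score it — is independent of Java; the sketch below mimics it in Python, with pyttsx3 (assumed installed) standing in for JSAPI synthesis and keyboard input standing in for the ViaVoice recognizer. The questions and expected answers are abbreviated, hypothetical examples.

        # MMSE-style spoken-question loop; recognition stubbed with input().
        import pyttsx3

        QUESTIONS = [("What year is it?", "2024"),     # hypothetical items
                     ("What season is it?", "winter")]

        engine = pyttsx3.init()
        score = 0
        for prompt, expected in QUESTIONS:
            engine.say(prompt)
            engine.runAndWait()              # speak the question aloud
            reply = input(prompt + " ")      # stand-in for speech recognition
            score += int(expected.lower() in reply.lower())
        print("orientation score: %d / %d" % (score, len(QUESTIONS)))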

  7. The feasibility of miniaturizing the versatile portable speech prosthesis: A market survey of commercial products

    NASA Technical Reports Server (NTRS)

    Walklet, T.

    1981-01-01

    The feasibility of a miniature versatile portable speech prosthesis (VPSP) was analyzed, and information on its potential users and on other similar devices was collected. The VPSP is a device that incorporates speech synthesis technology. The objective is to provide sufficient information to decide whether there is valuable technology to contribute to the miniaturization of the VPSP. The needs of potential users are identified, and the development status of technologies similar or related to those used in the VPSP is evaluated. The VPSP, a computer-based speech synthesis system, fits on a wheelchair. The purpose was to produce a device that provides communication assistance in educational, vocational, and social situations to speech-impaired individuals. It is expected that the VPSP can be a valuable aid for persons who are also motor impaired, which explains the placement of the system on a wheelchair.

  8. Speech Motor Control in Fluent and Dysfluent Speech Production of an Individual with Apraxia of Speech and Broca's Aphasia

    ERIC Educational Resources Information Center

    van Lieshout, Pascal H. H. M.; Bose, Arpita; Square, Paula A.; Steele, Catriona M.

    2007-01-01

    Apraxia of speech (AOS) is typically described as a motor-speech disorder with clinically well-defined symptoms, but without a clear understanding of the underlying problems in motor control. A number of studies have compared the speech of subjects with AOS to the fluent speech of controls, but only a few have included speech movement data and if…

  9. System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech

    DOEpatents

    Burnett, Greg C.; Holzrichter, John F.; Ng, Lawrence C.

    2002-01-01

    Low power EM waves are used to detect motions of vocal tract tissues of the human speech system before, during, and after voiced speech. A voiced excitation function is derived. The excitation function provides speech production information to enhance speech characterization and to enable noise removal from human speech.

  11. Two Methods of Mechanical Noise Reduction of Recorded Speech During Phonation in an MRI device

    NASA Astrophysics Data System (ADS)

    Přibil, J.; Horáček, J.; Horák, P.

    2011-01-01

    The paper presents two methods of noise reduction for speech signals recorded in an MRI device during phonation, for use in human vocal tract modelling. The applied approach to cleaning the noisy speech signal is based on cepstral speech analysis and synthesis, because the noise is mainly produced by the gradient coils, has a mechanical character, and can be processed in the spectral domain. The first noise reduction method uses real-cepstrum limitation, clipping the "peaks" corresponding to the harmonic frequencies of the mechanical noise. The second method is based on subtraction of the short-time spectra of two recorded signals: the first includes speech and noise, and the second consists of noise only. The resulting speech quality was compared by spectrogram and mean periodogram methods.
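
    The second method is essentially magnitude spectral subtraction, which can be sketched compactly: estimate the noise magnitude spectrum from the noise-only recording, subtract it frame by frame from the noisy short-time spectra, and resynthesize with the original phase. The signals below are synthetic stand-ins for the MRI recordings.

        # Magnitude spectral subtraction using a noise-only reference.
        import numpy as np
        from scipy.signal import stft, istft

        def spectral_subtract(noisy, noise, fs):
            f, t, S = stft(noisy, fs=fs, nperseg=512)
            _, _, N = stft(noise, fs=fs, nperseg=512)
            noise_mag = np.abs(N).mean(axis=1, keepdims=True)  # avg noise spectrum
            mag = np.maximum(np.abs(S) - noise_mag, 0.0)       # floor at zero
            _, clean = istft(mag * np.exp(1j * np.angle(S)), fs=fs, nperseg=512)
            return clean

        fs = 16000
        t = np.arange(fs) / fs
        speech = np.sin(2 * np.pi * 200 * t)          # stand-in for speech
        noise = 0.5 * np.sin(2 * np.pi * 1000 * t)    # tonal "coil" noise
        out = spectral_subtract(speech + noise, noise, fs)
        n = min(len(out), len(speech))
        resid = out[:n] - speech[:n]
        print("residual power %.4f vs. noise power %.2f"
              % (np.mean(resid ** 2), np.mean(noise ** 2)))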

  12. Real-Time Control of an Articulatory-Based Speech Synthesizer for Brain Computer Interfaces

    PubMed Central

    Bocquelet, Florent; Hueber, Thomas; Girin, Laurent; Savariaux, Christophe; Yvert, Blaise

    2016-01-01

    Restoring natural speech in paralyzed and aphasic people could be achieved using a Brain-Computer Interface (BCI) controlling a speech synthesizer in real-time. To reach this goal, a prerequisite is to develop a speech synthesizer producing intelligible speech in real-time with a reasonable number of control parameters. We present here an articulatory-based speech synthesizer that can be controlled in real-time for future BCI applications. This synthesizer converts movements of the main speech articulators (tongue, jaw, velum, and lips) into intelligible speech. The articulatory-to-acoustic mapping is performed using a deep neural network (DNN) trained on electromagnetic articulography (EMA) data recorded on a reference speaker synchronously with the produced speech signal. This DNN is then used in both offline and online modes to map the position of sensors glued on different speech articulators into acoustic parameters that are further converted into an audio signal using a vocoder. In offline mode, highly intelligible speech could be obtained as assessed by perceptual evaluation performed by 12 listeners. Then, to anticipate future BCI applications, we further assessed the real-time control of the synthesizer by both the reference speaker and new speakers, in a closed-loop paradigm using EMA data recorded in real time. A short calibration period was used to compensate for differences in sensor positions and articulatory differences between new speakers and the reference speaker. We found that real-time synthesis of vowels and consonants was possible with good intelligibility. In conclusion, these results open to future speech BCI applications using such articulatory-based speech synthesizer. PMID:27880768
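
    The heart of the system — regressing acoustic parameters from articulator positions — can be pictured as a small supervised mapping. The EMA coordinates and acoustic targets below are synthetic; the paper trains a deep network on real recordings and drives a vocoder with the predicted parameters.

        # Toy articulatory-to-acoustic regression (synthetic data).
        import numpy as np
        from sklearn.neural_network import MLPRegressor

        rng = np.random.default_rng(1)
        ema = rng.normal(size=(2000, 12))   # x/y of 6 sensors (tongue, lips, ...)
        acoustic = np.tanh(ema @ rng.normal(size=(12, 4)))  # 4 stand-in params

        dnn = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500,
                           random_state=0).fit(ema[:1600], acoustic[:1600])
        err = np.mean((dnn.predict(ema[1600:]) - acoustic[1600:]) ** 2)
        print("held-out MSE: %.4f" % err)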

  13. Influence of mothers' slower speech on their children's speech rate.

    PubMed

    Guitar, B; Marchinkoski, L

    2001-08-01

    This study investigated the effects on children's speech rate when their mothers talked more slowly. Six mothers and their normally speaking 3-year-olds (3 girls and 3 boys) were studied using single-subject A-B-A-B designs. Conversational speech rates of mothers were reduced by approximately half in the experimental (B) conditions. Five of the six children appeared to reduce their speech rates when their mothers spoke more slowly. This was confirmed by paired t tests (p < .05) that showed significant decreases in the 5 children's speech rate over the two B conditions. These findings suggest that when mothers substantially decrease their speech rates in a controlled situation, their children also decrease their speech rates. Clinical implications are discussed.
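
    The reported analysis is a paired comparison of each child's rate across conditions. A minimal sketch with SciPy, using invented rate values purely for illustration (not data from the study):

        import numpy as np
        from scipy import stats

        # Hypothetical syllables-per-second rates for the six children,
        # measured in baseline (A) and slow-mother (B) conditions.
        rate_A = np.array([3.9, 4.1, 3.6, 4.3, 3.8, 4.0])
        rate_B = np.array([3.4, 3.7, 3.5, 3.8, 3.3, 3.6])

        t, p = stats.ttest_rel(rate_A, rate_B)
        print(f"paired t = {t:.2f}, p = {p:.3f}")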

  14. Hate Speech: Power in the Marketplace.

    ERIC Educational Resources Information Center

    Harrison, Jack B.

    1994-01-01

    A discussion of hate speech and freedom of speech on college campuses examines what differentiates hate speech from normal, objectionable interpersonal comments and looks at Supreme Court decisions on the limits of student free speech. Two cases specifically concerning regulation of hate speech on campus are considered: Chaplinsky v. New…

  15. TEACHER'S GUIDE TO HIGH SCHOOL SPEECH.

    ERIC Educational Resources Information Center

    JENKINSON, EDWARD B., ED.

    THIS GUIDE TO HIGH SCHOOL SPEECH FOCUSES ON SPEECH AS ORAL COMPOSITION, STRESSING THE IMPORTANCE OF CLEAR THINKING AND COMMUNICATION. THE PROPOSED 1-SEMESTER BASIC COURSE IN SPEECH ATTEMPTS TO IMPROVE THE STUDENT'S ABILITY TO COMPOSE AND DELIVER SPEECHES, TO THINK AND LISTEN CRITICALLY, AND TO UNDERSTAND THE SOCIAL FUNCTION OF SPEECH. IN ADDITION…

  17. Clustering of Context Dependent Speech Units for Multilingual Speech Recognition

    DTIC Science & Technology

    2000-08-01

    Bojan Imperl, University of Maribor, Smetanova 17, 2000 Maribor, Slovenia (bojan.imperl@uni-mb.si). A multilingual phonetic inventory, consisting of language-dependent and language-independent speech units, was defined using the data-driven… The paper

  18. Musical intervals in speech

    PubMed Central

    Ross, Deborah; Choi, Jonathan; Purves, Dale

    2007-01-01

    Throughout history and across cultures, humans have created music using pitch intervals that divide octaves into the 12 tones of the chromatic scale. Why these specific intervals in music are preferred, however, is not known. In the present study, we analyzed a database of individually spoken English vowel phones to examine the hypothesis that musical intervals arise from the relationships of the formants in speech spectra that determine the perceptions of distinct vowels. Expressed as ratios, the frequency relationships of the first two formants in vowel phones represent all 12 intervals of the chromatic scale. Were the formants to fall outside the ranges found in the human voice, their relationships would generate either a less complete or a more dilute representation of these specific intervals. These results imply that human preference for the intervals of the chromatic scale arises from experience with the way speech formants modulate laryngeal harmonics to create different phonemes. PMID:17525146
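
    The mapping from a formant-frequency ratio to a chromatic interval can be made concrete in a few lines of Python; the /a/-like formant values below are illustrative, not data from the study:

        import math

        CHROMATIC = ["unison", "minor 2nd", "major 2nd", "minor 3rd", "major 3rd",
                     "perfect 4th", "tritone", "perfect 5th", "minor 6th",
                     "major 6th", "minor 7th", "major 7th"]

        def nearest_interval(f1, f2):
            """Map an F2/F1 formant ratio onto the nearest interval of the
            12-tone equal-tempered chromatic scale (octave-reduced)."""
            semitones = 12.0 * math.log2(f2 / f1)   # ratio expressed in semitones
            return CHROMATIC[round(semitones) % 12], semitones

        # Example: typical /a/-like first two formants (values illustrative)
        print(nearest_interval(730.0, 1090.0))      # close to a perfect fifth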

  19. Musical intervals in speech.

    PubMed

    Ross, Deborah; Choi, Jonathan; Purves, Dale

    2007-06-05

    Throughout history and across cultures, humans have created music using pitch intervals that divide octaves into the 12 tones of the chromatic scale. Why these specific intervals in music are preferred, however, is not known. In the present study, we analyzed a database of individually spoken English vowel phones to examine the hypothesis that musical intervals arise from the relationships of the formants in speech spectra that determine the perceptions of distinct vowels. Expressed as ratios, the frequency relationships of the first two formants in vowel phones represent all 12 intervals of the chromatic scale. Were the formants to fall outside the ranges found in the human voice, their relationships would generate either a less complete or a more dilute representation of these specific intervals. These results imply that human preference for the intervals of the chromatic scale arises from experience with the way speech formants modulate laryngeal harmonics to create different phonemes.

  20. Headphone localization of speech

    NASA Technical Reports Server (NTRS)

    Begault, Durand R.; Wenzel, Elizabeth M.

    1993-01-01

    Three-dimensional acoustic display systems have recently been developed that synthesize virtual sound sources over headphones based on filtering by head-related transfer functions (HRTFs), the direction-dependent spectral changes caused primarily by the pinnae. In this study, 11 inexperienced subjects judged the apparent spatial location of headphone-presented speech stimuli filtered with nonindividualized HRTFs. About half of the subjects 'pulled' their judgments toward either the median or the lateral-vertical planes, and estimates were almost always elevated. Individual differences were pronounced for the distance judgments; 15 to 46 percent of stimuli were heard inside the head, with the shortest estimates near the median plane. The results suggest that most listeners can obtain useful azimuth information from speech stimuli filtered by nonindividualized HRTFs. Measurements of localization error and reversal rates are comparable with a previous study that used broadband noise stimuli.
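
    The synthesis these experiments rely on is simple to sketch: one FIR convolution per ear with the head-related impulse responses (HRIRs, the time-domain HRTFs) measured for the desired direction. The random "HRIRs" below are stand-ins so the sketch runs; real measured responses would be used in practice:

        import numpy as np

        def spatialize(speech, hrir_left, hrir_right):
            """Render a mono signal at a virtual position by convolving it
            with the HRIRs for that direction; one filter per ear yields a
            binaural signal for headphone presentation."""
            left = np.convolve(speech, hrir_left)
            right = np.convolve(speech, hrir_right)
            return np.stack([left, right], axis=1)   # (samples, 2) stereo

        # Illustrative stand-ins: a noise burst and two dummy 128-tap HRIRs.
        rng = np.random.default_rng(1)
        speech = rng.normal(size=8000)
        binaural = spatialize(speech, rng.normal(size=128), rng.normal(size=128))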

  2. Headphone localization of speech.

    PubMed

    Begault, D R; Wenzel, E M

    1993-06-01

    Three-dimensional acoustic display systems have recently been developed that synthesize virtual sound sources over headphones based on filtering by head-related transfer functions (HRTFs), the direction-dependent spectral changes caused primarily by the pinnae. In this study 11 inexperienced subjects judged the apparent spatial location of headphone-presented speech stimuli filtered with non-individualized HRTFs. About half of the subjects "pulled" their judgments toward either the median or the lateral-vertical planes, and estimates were almost always elevated. Individual differences were pronounced for the distance judgments; 15% to 46% of stimuli were heard inside the head, with the shortest estimates near the median plane. The results suggest that most listeners can obtain useful azimuth information from speech stimuli filtered by nonindividualized HRTFs. Measurements of localization error and reversal rates are comparable with a previous study that used broadband noise stimuli.

  3. Neurophysiology of speech differences in childhood apraxia of speech.

    PubMed

    Preston, Jonathan L; Molfese, Peter J; Gumkowski, Nina; Sorcinelli, Andrea; Harwood, Vanessa; Irwin, Julia R; Landi, Nicole

    2014-01-01

    Event-related potentials (ERPs) were recorded during a picture naming task of simple and complex words in children with typical speech and with childhood apraxia of speech (CAS). Results reveal reduced amplitude prior to speaking complex (multisyllabic) words relative to simple (monosyllabic) words for the CAS group over the right hemisphere during a time window thought to reflect phonological encoding of word forms. Group differences were also observed prior to production of spoken tokens regardless of word complexity during a time window just prior to speech onset (thought to reflect motor planning/programming). Results suggest differences in pre-speech neurolinguistic processes.

  4. Neurophysiology of Speech Differences in Childhood Apraxia of Speech

    PubMed Central

    Preston, Jonathan L.; Molfese, Peter J.; Gumkowski, Nina; Sorcinelli, Andrea; Harwood, Vanessa; Irwin, Julia; Landi, Nicole

    2014-01-01

    Event-related potentials (ERPs) were recorded during a picture naming task of simple and complex words in children with typical speech and with childhood apraxia of speech (CAS). Results reveal reduced amplitude prior to speaking complex (multisyllabic) words relative to simple (monosyllabic) words for the CAS group over the right hemisphere during a time window thought to reflect phonological encoding of word forms. Group differences were also observed prior to production of spoken tokens regardless of word complexity during a time window just prior to speech onset (thought to reflect motor planning/programming). Results suggest differences in pre-speech neurolinguistic processes. PMID:25090016

  5. [Improving speech comprehension using a new cochlear implant speech processor].

    PubMed

    Müller-Deile, J; Kortmann, T; Hoppe, U; Hessel, H; Morsnowski, A

    2009-06-01

    The aim of this multicenter clinical field study was to assess the benefits of the new Freedom 24 sound processor for cochlear implant (CI) users implanted with the Nucleus 24 cochlear implant system. The study included 48 postlingually profoundly deaf experienced CI users who demonstrated speech comprehension performance with their current speech processor on the Oldenburg sentence test (OLSA) in quiet conditions of at least 80% correct scores and who were able to perform adaptive speech threshold testing using the OLSA in noisy conditions. Following baseline measures of speech comprehension performance with their current speech processor, subjects were upgraded to the Freedom 24 speech processor. After a take-home trial period of at least 2 weeks, subject performance was evaluated by measuring the speech reception threshold with the Freiburg multisyllabic word test and speech intelligibility with the Freiburg monosyllabic word test at 50 dB and 70 dB in the sound field. The results demonstrated highly significant benefits for speech comprehension with the new speech processor. Significant benefits for speech comprehension were also demonstrated with the new speech processor when tested in competing background noise. In contrast, use of the Abbreviated Profile of Hearing Aid Benefit (APHAB) did not prove to be a suitably sensitive assessment tool for comparative subjective self-assessment of hearing benefits with each processor. Use of the preprocessing algorithm known as adaptive dynamic range optimization (ADRO) in the Freedom 24 led to additional improvements over the standard upgrade map for speech comprehension in quiet and showed equivalent performance in noise. Through use of the preprocessing beam-forming algorithm BEAM, subjects demonstrated a highly significant improved signal-to-noise ratio for speech comprehension thresholds (i.e., signal-to-noise ratio for 50% speech comprehension scores) when tested with an adaptive procedure using the Oldenburg

  6. Hiding Information under Speech

    DTIC Science & Technology

    2005-12-12

    as it arrives in real time, and it disappears as fast as it arrives. Furthermore, our cognitive process for translating audio sounds to the meaning... steganography, whose goal is to make the embedded data completely undetectable. In addition, we must dismiss the idea of hiding data by using any...therefore, an image has more room to hide data; and (2) speech steganography has not led to many money-making commercial businesses. For these two

  7. Speech Quality Measurement

    DTIC Science & Technology

    1977-06-10

    noise test, t=2 for the low-pass filter test, and t=3 for the ADPCM coding test; s is the subject number... a separate speech quality laboratory controlled by the NOVA 830 computer. Each of the stations has a CRT, 15 response buttons, and a "red button

  8. Speech Understanding Research

    DTIC Science & Technology

    1976-10-01

    [Contents fragment: Element Parity; The Environment Tree; The Executive for the Deductive Component; Generating Candidate Bindings for a Selected QVISTA Element; Ramifications of a Proposed Binding; The Binder; Deriving Element-of and Subset Relations.] ...developed to resolve simple anaphoric reference and to correlate information from a primitive world model. Using programs for speech analysis and

  9. Limited connected speech experiment

    NASA Astrophysics Data System (ADS)

    Landell, P. B.

    1983-03-01

    The purpose of this contract was to demonstrate that connected speech recognition (CSR) can be performed in real time on a vocabulary of one hundred words and to test the performance of the CSR system for twenty-five male and twenty-five female speakers. This report describes the contractor's real-time laboratory CSR system, the data base and training software developed in accordance with the contract, and the results of the performance tests.

  10. Speech Quality Measurement

    DTIC Science & Technology

    1978-05-01

    important acoustic correlate of many suprasegmental features, and distortions in the pitch contour are easily perceivable and very detrimental to quality ...not to say, however, that there is not considerable knowledge about the acoustic correlates of the features of speech. It is well established that the... vowels and in the local pitch contour [2.30]. The major acoustic correlates of syntactic structure, intonation, and emphasis are pitch, vowel

  11. Perceptual learning in speech.

    PubMed

    Norris, Dennis; McQueen, James M; Cutler, Anne

    2003-09-01

    This study demonstrates that listeners use lexical knowledge in perceptual learning of speech sounds. Dutch listeners first made lexical decisions on Dutch words and nonwords. The final fricative of 20 critical words had been replaced by an ambiguous sound, between [f] and [s]. One group of listeners heard ambiguous [f]-final words (e.g., [WItlo?], from witlof, chicory) and unambiguous [s]-final words (e.g., naaldbos, pine forest). Another group heard the reverse (e.g., ambiguous [na:ldbo?], unambiguous witlof). Listeners who had heard [?] in [f]-final words were subsequently more likely to categorize ambiguous sounds on an [f]-[s] continuum as [f] than those who heard [?] in [s]-final words. Control conditions ruled out alternative explanations based on selective adaptation and contrast. Lexical information can thus be used to train categorization of speech. This use of lexical information differs from the on-line lexical feedback embodied in interactive models of speech perception. In contrast to on-line feedback, lexical feedback for learning is of benefit to spoken word recognition (e.g., in adapting to a newly encountered dialect).

  12. Speech rhythm: a metaphor?

    PubMed Central

    Nolan, Francis; Jeon, Hae-Sung

    2014-01-01

    Is speech rhythmic? In the absence of evidence for a traditional view that languages strive to coordinate either syllables or stress-feet with regular time intervals, we consider the alternative that languages exhibit contrastive rhythm subsisting merely in the alternation of stronger and weaker elements. This is initially plausible, particularly for languages with a steep ‘prominence gradient’, i.e. a large disparity between stronger and weaker elements; but we point out that alternation is poorly achieved even by a ‘stress-timed’ language such as English, and, historically, languages have conspicuously failed to adopt simple phonological remedies that would ensure alternation. Languages seem more concerned to allow ‘syntagmatic contrast’ between successive units and to use durational effects to support linguistic functions than to facilitate rhythm. Furthermore, some languages (e.g. Tamil, Korean) lack the lexical prominence which would most straightforwardly underpin prominence of alternation. We conclude that speech is not incontestably rhythmic, and may even be antirhythmic. However, its linguistic structure and patterning allow the metaphorical extension of rhythm in varying degrees and in different ways depending on the language, and it is this analogical process which allows speech to be matched to external rhythms. PMID:25385774

  13. [Speech audiometry with logatomes].

    PubMed

    Welge-Lüssen, A; Hauser, R; Erdmann, J; Schwob, C; Probst, R

    1997-02-01

    Logatomes are nonsense syllables used for analyzing the confusion of phonemes by hearing impaired listeners. They can provide a precise differentiation of phonemic confusions which may be useful in the exact adjustment of programmable hearing aids. In this study, two lists of logatomes with 108 three-sound combinations with a structure of consonant-vowel-consonant (c-v-c) and vowel-consonant-vowel (v-c-v) were recorded on a compact disk. Twenty normally hearing adults and 28 patients with a sensorineural hearing loss were tested at a comfortable listening level of about 25 +/- 5 dB above the mean audiometric thresholds at 0.5, 1.0 and 2.0 kHz. An index of reduction of speech perception was calculated. A significant relationship between reduction of logatome perception and pure-tone audiometric thresholds at 1, 2, 3, and 4 kHz was demonstrated. Moreover, it was possible to distinguish between different groups of hearing impairment. The logatome test helps to analyze specific effects that hearing loss can have on the recognition of acoustic speech signals. The logatome test may become a valuable addition to speech audiometric tests with further standardization.

  14. Applications for Subvocal Speech

    NASA Technical Reports Server (NTRS)

    Jorgensen, Charles; Betts, Bradley

    2007-01-01

    A research and development effort now underway is directed toward the use of subvocal speech for communication in settings in which (1) acoustic noise could interfere excessively with ordinary vocal communication and/or (2) acoustic silence or secrecy of communication is required. By "subvocal speech" is meant sub-audible electromyographic (EMG) signals, associated with speech, that are acquired from the surface of the larynx and lingual areas of the throat. Topics addressed in this effort include recognition of the sub-vocal EMG signals that represent specific original words or phrases; transformation (including encoding and/or enciphering) of the signals into forms that are less vulnerable to distortion, degradation, and/or interception; and reconstruction of the original words or phrases at the receiving end of a communication link. Potential applications include ordinary verbal communications among hazardous- material-cleanup workers in protective suits, workers in noisy environments, divers, and firefighters, and secret communications among law-enforcement officers and military personnel in combat and other confrontational situations.

  15. Speech rhythm: a metaphor?

    PubMed

    Nolan, Francis; Jeon, Hae-Sung

    2014-12-19

    Is speech rhythmic? In the absence of evidence for a traditional view that languages strive to coordinate either syllables or stress-feet with regular time intervals, we consider the alternative that languages exhibit contrastive rhythm subsisting merely in the alternation of stronger and weaker elements. This is initially plausible, particularly for languages with a steep 'prominence gradient', i.e. a large disparity between stronger and weaker elements; but we point out that alternation is poorly achieved even by a 'stress-timed' language such as English, and, historically, languages have conspicuously failed to adopt simple phonological remedies that would ensure alternation. Languages seem more concerned to allow 'syntagmatic contrast' between successive units and to use durational effects to support linguistic functions than to facilitate rhythm. Furthermore, some languages (e.g. Tamil, Korean) lack the lexical prominence which would most straightforwardly underpin prominence of alternation. We conclude that speech is not incontestably rhythmic, and may even be antirhythmic. However, its linguistic structure and patterning allow the metaphorical extension of rhythm in varying degrees and in different ways depending on the language, and it is this analogical process which allows speech to be matched to external rhythms.

  16. Feature extraction and models for speech: An overview

    NASA Astrophysics Data System (ADS)

    Schroeder, Manfred

    2002-11-01

    Modeling of speech has a long history, beginning with Count von Kempelen's 1770 mechanical speaking machine. Even then human vowel production was seen as resulting from a source (the vocal cords) driving a physically separate resonator (the vocal tract). Homer Dudley's 1928 frequency-channel vocoder and many of its descendants are based on the same successful source-filter paradigm. For linguistic studies as well as practical applications in speech recognition, compression, and synthesis (see M. R. Schroeder, Computer Speech), the extant models require the (often difficult) extraction of numerous parameters such as the fundamental and formant frequencies and various linguistic distinctive features. Some of these difficulties were obviated by the introduction of linear predictive coding (LPC) in 1967, in which the filter part is an all-pole filter, reflecting the fact that for non-nasalized vowels the vocal tract is well approximated by an all-pole transfer function. In the now ubiquitous code-excited linear prediction (CELP), the source part is replaced by a code book which (together with a perceptual error criterion) permits speech compression to very low bit rates at high speech quality for the Internet and cell phones.
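
    The LPC step mentioned here reduces, in its autocorrelation form, to solving a small Toeplitz system (the Yule-Walker equations). A minimal sketch, with the frame length, window, and model order chosen arbitrarily for illustration:

        import numpy as np
        from scipy.linalg import solve_toeplitz

        def lpc(frame, order=12):
            """Estimate all-pole (LPC) coefficients by the autocorrelation
            method: solve the Toeplitz normal equations R a = r."""
            x = frame * np.hamming(len(frame))
            r = np.correlate(x, x, mode="full")[len(x) - 1:]
            a = solve_toeplitz((r[:order], r[:order]), r[1:order + 1])
            return np.concatenate(([1.0], -a))   # prediction-error filter A(z)

        # Example: a synthetic vowel-like frame (two resonances at 8 kHz rate)
        t = np.arange(400) / 8000.0
        frame = np.sin(2 * np.pi * 700 * t) + 0.5 * np.sin(2 * np.pi * 1100 * t)
        print(lpc(frame, order=8))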

  17. Hemispheric asymmetries in speech perception: sense, nonsense and modulations.

    PubMed

    Rosen, Stuart; Wise, Richard J S; Chadha, Shabneet; Conway, Eleanor-Jayne; Scott, Sophie K

    2011-01-01

    The well-established left hemisphere specialisation for language processing has long been claimed to be based on a low-level auditory specialization for specific acoustic features in speech, particularly regarding 'rapid temporal processing'. A novel analysis/synthesis technique was used to construct a variety of sounds based on simple sentences which could be manipulated in spectro-temporal complexity, and whether they were intelligible or not. All sounds consisted of two noise-excited spectral prominences (based on the lower two formants in the original speech) which could be static or varying in frequency and/or amplitude independently. Dynamically varying both acoustic features based on the same sentence led to intelligible speech but when either or both acoustic features were static, the stimuli were not intelligible. Using the frequency dynamics from one sentence with the amplitude dynamics of another led to unintelligible sounds of comparable spectro-temporal complexity to the intelligible ones. Positron emission tomography (PET) was used to compare which brain regions were active when participants listened to the different sounds. Neural activity to spectral and amplitude modulations sufficient to support speech intelligibility (without actually being intelligible) was seen bilaterally, with a right temporal lobe dominance. A left dominant response was seen only to intelligible sounds. It thus appears that the left hemisphere specialisation for speech is based on the linguistic properties of utterances, not on particular acoustic features.

  18. Speech endpoint detection with non-language speech sounds for generic speech processing applications

    NASA Astrophysics Data System (ADS)

    McClain, Matthew; Romanowski, Brian

    2009-05-01

    Non-language speech sounds (NLSS) are sounds produced by humans that do not carry linguistic information. Examples of these sounds are coughs, clicks, breaths, and filled pauses such as "uh" and "um" in English. NLSS are prominent in conversational speech, but can be a significant source of errors in speech processing applications. Traditionally, these sounds are ignored by speech endpoint detection algorithms, where speech regions are identified in the audio signal prior to processing. The ability to filter NLSS as a pre-processing step can significantly enhance the performance of many speech processing applications, such as speaker identification, language identification, and automatic speech recognition. In order to be used in all such applications, NLSS detection must be performed without the use of language models that provide knowledge of the phonology and lexical structure of speech. This is especially relevant to situations where the languages used in the audio are not known a priori. We present the results of preliminary experiments using data from American and British English speakers, in which segments of audio are classified as language speech sounds (LSS) or NLSS using a set of acoustic features designed for language-agnostic NLSS detection and a hidden-Markov model (HMM) to model speech generation. The results of these experiments indicate that the features and model used are capable of detecting certain types of NLSS, such as breaths and clicks, while detection of other types of NLSS such as filled pauses will require future research.
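
    For contrast with the NLSS-aware approach the paper argues for, a traditional energy-based endpoint detector looks roughly like this (frame sizes and the 10 dB margin are assumptions; note that breaths and clicks can exceed such a threshold, which is exactly the failure mode discussed above):

        import numpy as np

        def endpoints(signal, sr, frame_ms=25, hop_ms=10, margin_db=10.0):
            """Simple energy-based endpoint detection: mark frames whose log
            energy exceeds the noise floor (estimated from the quietest
            frames) by a fixed margin."""
            frame = int(sr * frame_ms / 1e3)
            hop = int(sr * hop_ms / 1e3)
            e = np.array([10 * np.log10(np.sum(signal[i:i + frame] ** 2) + 1e-12)
                          for i in range(0, len(signal) - frame, hop)])
            floor = np.percentile(e, 10)      # crude noise-floor estimate
            return e > floor + margin_db      # boolean speech/non-speech mask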

  19. Speech perception at the interface of neurobiology and linguistics.

    PubMed

    Poeppel, David; Idsardi, William J; van Wassenhove, Virginie

    2008-03-12

    Speech perception consists of a set of computations that take continuously varying acoustic waveforms as input and generate discrete representations that make contact with the lexical representations stored in long-term memory as output. Because the perceptual objects that are recognized by speech perception enter into subsequent linguistic computation, the format that is used for lexical representation and processing fundamentally constrains the speech perceptual processes. Consequently, theories of speech perception must, at some level, be tightly linked to theories of lexical representation. Minimally, speech perception must yield representations that smoothly and rapidly interface with stored lexical items. Adopting the perspective of Marr, we argue and provide neurobiological and psychophysical evidence for the following research programme. First, at the implementational level, speech perception is a multi-time resolution process, with perceptual analyses occurring concurrently on at least two time scales (approx. 20-80 ms, approx. 150-300 ms), commensurate with (sub)segmental and syllabic analyses, respectively. Second, at the algorithmic level, we suggest that perception proceeds on the basis of internal forward models, or uses an 'analysis-by-synthesis' approach. Third, at the computational level (in the sense of Marr), the theory of lexical representation that we adopt is principally informed by phonological research and assumes that words are represented in the mental lexicon in terms of sequences of discrete segments composed of distinctive features. One important goal of the research programme is to develop linking hypotheses between putative neurobiological primitives (e.g. temporal primitives) and those primitives derived from linguistic inquiry, to arrive ultimately at a biologically sensible and theoretically satisfying model of representation and computation in speech.

  20. Speech perception in noise with a harmonic complex excited vocoder.

    PubMed

    Churchill, Tyler H; Kan, Alan; Goupell, Matthew J; Ihlefeld, Antje; Litovsky, Ruth Y

    2014-04-01

    A cochlear implant (CI) presents band-pass-filtered acoustic envelope information by modulating current pulse train levels. Similarly, a vocoder presents envelope information by modulating an acoustic carrier. By studying how normal hearing (NH) listeners are able to understand degraded speech signals with a vocoder, the parameters that best simulate electric hearing and factors that might contribute to the NH-CI performance difference may be better understood. A vocoder with harmonic complex carriers (fundamental frequency, f0 = 100 Hz) was used to study the effect of carrier phase dispersion on speech envelopes and intelligibility. The starting phases of the harmonic components were randomly dispersed to varying degrees prior to carrier filtering and modulation. NH listeners were tested on recognition of a closed set of vocoded words in background noise. Two sets of synthesis filters simulated different amounts of current spread in CIs. Results showed that the speech vocoded with carriers whose starting phases were maximally dispersed was the most intelligible. Superior speech understanding may have been a result of the flattening of the dispersed-phase carrier's intrinsic temporal envelopes produced by the large number of interacting components in the high-frequency channels. Cross-correlogram analyses of auditory nerve model simulations confirmed that randomly dispersing the carrier's component starting phases resulted in better neural envelope representation. However, neural metrics extracted from these analyses were not found to accurately predict speech recognition scores for all vocoded speech conditions. It is possible that central speech understanding mechanisms are insensitive to the envelope-fine structure dichotomy exploited by vocoders.
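
    The carrier construction described here is easy to sketch: equal-amplitude harmonics of f0 = 100 Hz whose starting phases are dispersed to a controllable degree. The code below is a plausible reading of that manipulation, not the authors' implementation; the sampling rate and normalization are choices made here:

        import numpy as np

        def harmonic_carrier(f0=100.0, sr=16000, dur=1.0, dispersion=1.0, seed=0):
            """Sum of equal-amplitude harmonics of f0 below Nyquist, with
            starting phases drawn from [0, dispersion * 2*pi]. dispersion=0
            gives a pulse-like carrier; 1 gives maximal phase dispersion."""
            rng = np.random.default_rng(seed)
            t = np.arange(int(sr * dur)) / sr
            n_harm = int((sr / 2) / f0) - 1          # stay below Nyquist
            phases = rng.uniform(0, dispersion * 2 * np.pi, n_harm)
            carrier = sum(np.sin(2 * np.pi * f0 * (k + 1) * t + phases[k])
                          for k in range(n_harm))
            return carrier / n_harm

        # A vocoder channel would band-pass this carrier and multiply it by
        # the band's speech envelope; maximal dispersion flattens the
        # carrier's own temporal envelope, the property the study found helpful.
        c = harmonic_carrier(dispersion=1.0)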

  1. Method and Apparatus for Segmenting a Speech Waveform.

    DTIC Science & Technology

    1995-11-07

    the computer system 40 of Figure 3; Figures 5(a), 5(b) and 5(c) illustrate utterance rate changes; Figure 6 illustrates pitch waveform... Figure 9 shows the structural elements in the computer... (Inventors: Rang and Fransen; Navy Case No. 77,023.) ...synthesis. The segmentation of a speech waveform into pitch waveforms requires two computational steps: 1) the determination of the pitch period

  2. Evaluation of NASA speech encoder

    NASA Technical Reports Server (NTRS)

    1976-01-01

    Techniques developed by NASA for spaceflight instrumentation were used in the design of a quantizer for speech decoding. Computer simulation of the actions of the quantizer was tested with synthesized and real speech signals. Results were evaluated by a phonetician. Topics discussed include the relationship between the number of quantizer levels and the required sampling rate; reconstruction of signals; digital filtering; and speech recording, sampling, storage, and processing results.

  3. A Technique for Estimating Intensity of Emotional Expressions and Speaking Styles in Speech Based on Multiple-Regression HSMM

    NASA Astrophysics Data System (ADS)

    Nose, Takashi; Kobayashi, Takao

    In this paper, we propose a technique for estimating the degree or intensity of emotional expressions and speaking styles appearing in speech. The key idea is based on a style control technique for speech synthesis using a multiple regression hidden semi-Markov model (MRHSMM), and the proposed technique can be viewed as the inverse of the style control. In the proposed technique, the acoustic features of spectrum, power, fundamental frequency, and duration are simultaneously modeled using the MRHSMM. We derive an algorithm for estimating explanatory variables of the MRHSMM, each of which represents the degree or intensity of emotional expressions and speaking styles appearing in acoustic features of speech, based on a maximum likelihood criterion. We show experimental results to demonstrate the ability of the proposed technique using two types of speech data, simulated emotional speech and spontaneous speech with different speaking styles. It is found that the estimated values correlate with human perception.

  4. Speech systems research at Texas Instruments

    NASA Technical Reports Server (NTRS)

    Doddington, George R.

    1977-01-01

    An assessment of automatic speech processing technology is presented. Fundamental problems in the development and the deployment of automatic speech processing systems are defined and a technology forecast for speech systems is presented.

  5. Speech Recognition: How Do We Teach It?

    ERIC Educational Resources Information Center

    Barksdale, Karl

    2002-01-01

    States that growing use of speech recognition software has made voice writing an essential computer skill. Describes how to present the topic, develop basic speech recognition skills, and teach speech recognition outlining, writing, proofreading, and editing. (Contains 14 references.) (SK)

  6. Speech and Language Problems in Children

    MedlinePlus

    Children vary in their development of speech and language skills. Health care professionals have lists of milestones ... it may be due to a speech or language disorder. Children who have speech disorders may have ...

  7. Enhancing Peer Feedback and Speech Preparation: The Speech Video Activity

    ERIC Educational Resources Information Center

    Opt, Susan

    2012-01-01

    In the typical public speaking course, instructors or assistants videotape or digitally record at least one of the term's speeches in class or lab to offer students additional presentation feedback. Students often watch and self-critique their speeches on their own. Peers often give only written feedback on classroom presentations or completed…

  8. Recognizing articulatory gestures from speech for robust speech recognition.

    PubMed

    Mitra, Vikramjit; Nam, Hosung; Espy-Wilson, Carol; Saltzman, Elliot; Goldstein, Louis

    2012-03-01

    Studies have shown that supplementary articulatory information can help to improve the recognition rate of automatic speech recognition systems. Unfortunately, articulatory information is not directly observable, necessitating its estimation from the speech signal. This study describes a system that recognizes articulatory gestures from speech, and uses the recognized gestures in a speech recognition system. Recognizing gestures for a given utterance involves recovering the set of underlying gestural activations and their associated dynamic parameters. This paper proposes a neural network architecture for recognizing articulatory gestures from speech and presents ways to incorporate articulatory gestures for a digit recognition task. The lack of natural speech database containing gestural information prompted us to use three stages of evaluation. First, the proposed gestural annotation architecture was tested on a synthetic speech dataset, which showed that the use of estimated tract-variable-time-functions improved gesture recognition performance. In the second stage, gesture-recognition models were applied to natural speech waveforms and word recognition experiments revealed that the recognized gestures can improve the noise-robustness of a word recognition system. In the final stage, a gesture-based Dynamic Bayesian Network was trained and the results indicate that incorporating gestural information can improve word recognition performance compared to acoustic-only systems.

  9. IBM MASTOR SYSTEM: Multilingual Automatic Speech-to-speech Translator

    DTIC Science & Technology

    2006-01-01

  10. Concatenated logic circuits based on a three-way DNA junction: a keypad-lock security system with visible readout and an automatic reset function.

    PubMed

    Chen, Junhua; Zhou, Shungui; Wen, Junlin

    2015-01-07

    Concatenated logic circuits operating as a biocomputing keypad-lock security system with an automatic reset function have been successfully constructed on the basis of toehold-mediated strand displacement and three-way-DNA-junction architecture. In comparison with previously reported keypad locks, the distinctive advantage of the proposed security system is that it can be reset and cycled spontaneously a large number of times without an external stimulus, thus making practical applications possible. By the use of a split-G-quadruplex DNAzyme as the signal reporter, the output of the keypad lock can be recognized readily by the naked eye. The "lock" is opened only when the inputs are introduced in an exact order. This requirement provides defense against illegal invasion to protect information at the molecular scale. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
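
    Abstracting away the chemistry, the keypad-lock behaviour is an order-sensitive finite-state machine with automatic reset. A toy Python model of that behaviour (the input names I1-I3 are hypothetical placeholders for the DNA inputs, not names from the paper):

        class KeypadLock:
            """The lock opens only if the inputs arrive in exactly the
            programmed sequence, and it resets after every complete attempt."""
            def __init__(self, code=("I1", "I2", "I3")):
                self.code, self.pos = code, 0

            def press(self, key):
                self.pos = self.pos + 1 if key == self.code[self.pos] else 0
                if self.pos == len(self.code):
                    self.pos = 0          # automatic reset, as in the DNA system
                    return True           # "open" output (the visible signal)
                return False

        lock = KeypadLock()
        print([lock.press(k) for k in ("I1", "I2", "I3")])   # [False, False, True]
        print([lock.press(k) for k in ("I2", "I1", "I3")])   # wrong order: all False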

  11. Statistical assessment of speech system performance

    NASA Technical Reports Server (NTRS)

    Moshier, Stephen L.

    1977-01-01

    Methods for the normalization of performance test results of speech recognition systems are presented. Technological accomplishments in speech recognition systems, as well as planned research activities, are described.

  12. Phylogeny of the cycads based on multiple single-copy nuclear genes: congruence of concatenated parsimony, likelihood and species tree inference methods.

    PubMed

    Salas-Leiva, Dayana E; Meerow, Alan W; Calonje, Michael; Griffith, M Patrick; Francisco-Ortega, Javier; Nakamura, Kyoko; Stevenson, Dennis W; Lewis, Carl E; Namoff, Sandra

    2013-11-01

    Despite a recent new classification, a stable phylogeny for the cycads has been elusive, particularly regarding resolution of Bowenia, Stangeria and Dioon. In this study, five single-copy nuclear genes (SCNGs) are applied to the phylogeny of the order Cycadales. The specific aim is to evaluate several gene tree-species tree reconciliation approaches for developing an accurate phylogeny of the order, to contrast them with concatenated parsimony analysis and to resolve the erstwhile problematic phylogenetic position of these three genera. DNA sequences of five SCNGs were obtained for 20 cycad species representing all ten genera of Cycadales. These were analysed with parsimony, maximum likelihood (ML) and three Bayesian methods of gene tree-species tree reconciliation, using Cycas as the outgroup. A calibrated date estimation was developed with Bayesian methods, and biogeographic analysis was also conducted. Concatenated parsimony, ML and three species tree inference methods resolve exactly the same tree topology with high support at most nodes. Dioon and Bowenia are the first and second branches of Cycadales after Cycas, respectively, followed by an encephalartoid clade (Macrozamia-Lepidozamia-Encephalartos), which is sister to a zamioid clade, of which Ceratozamia is the first branch, and in which Stangeria is sister to Microcycas and Zamia. A single, well-supported phylogenetic hypothesis of the generic relationships of the Cycadales is presented. However, massive extinction events inferred from the fossil record that eliminated broader ancestral distributions within Zamiaceae compromise accurate optimization of ancestral biogeographical areas for that hypothesis. While major lineages of Cycadales are ancient, crown ages of all modern genera are no older than 12 million years, supporting a recent hypothesis of mostly Miocene radiations. This phylogeny can contribute to an accurate infrafamilial classification of Zamiaceae.

  13. Phylogeny of the cycads based on multiple single-copy nuclear genes: congruence of concatenated parsimony, likelihood and species tree inference methods

    PubMed Central

    Salas-Leiva, Dayana E.; Meerow, Alan W.; Calonje, Michael; Griffith, M. Patrick; Francisco-Ortega, Javier; Nakamura, Kyoko; Stevenson, Dennis W.; Lewis, Carl E.; Namoff, Sandra

    2013-01-01

    Background and aims Despite a recent new classification, a stable phylogeny for the cycads has been elusive, particularly regarding resolution of Bowenia, Stangeria and Dioon. In this study, five single-copy nuclear genes (SCNGs) are applied to the phylogeny of the order Cycadales. The specific aim is to evaluate several gene tree–species tree reconciliation approaches for developing an accurate phylogeny of the order, to contrast them with concatenated parsimony analysis and to resolve the erstwhile problematic phylogenetic position of these three genera. Methods DNA sequences of five SCNGs were obtained for 20 cycad species representing all ten genera of Cycadales. These were analysed with parsimony, maximum likelihood (ML) and three Bayesian methods of gene tree–species tree reconciliation, using Cycas as the outgroup. A calibrated date estimation was developed with Bayesian methods, and biogeographic analysis was also conducted. Key Results Concatenated parsimony, ML and three species tree inference methods resolve exactly the same tree topology with high support at most nodes. Dioon and Bowenia are the first and second branches of Cycadales after Cycas, respectively, followed by an encephalartoid clade (Macrozamia–Lepidozamia–Encephalartos), which is sister to a zamioid clade, of which Ceratozamia is the first branch, and in which Stangeria is sister to Microcycas and Zamia. Conclusions A single, well-supported phylogenetic hypothesis of the generic relationships of the Cycadales is presented. However, massive extinction events inferred from the fossil record that eliminated broader ancestral distributions within Zamiaceae compromise accurate optimization of ancestral biogeographical areas for that hypothesis. While major lineages of Cycadales are ancient, crown ages of all modern genera are no older than 12 million years, supporting a recent hypothesis of mostly Miocene radiations. This phylogeny can contribute to an accurate infrafamilial

  14. Multimodal Discrimination of Schizophrenia Using Hybrid Weighted Feature Concatenation of Brain Functional Connectivity and Anatomical Features with an Extreme Learning Machine.

    PubMed

    Qureshi, Muhammad Naveed Iqbal; Oh, Jooyoung; Cho, Dongrae; Jo, Hang Joon; Lee, Boreom

    2017-01-01

    Multimodal features of structural and functional magnetic resonance imaging (MRI) of the human brain can assist in the diagnosis of schizophrenia. We performed a classification study on age-, sex-, and handedness-matched subjects. The dataset we used is publicly available from the Center for Biomedical Research Excellence (COBRE) and it consists of two groups: patients with schizophrenia and healthy controls. We performed an independent component analysis and calculated globally averaged functional connectivity-based features from the resting-state functional MRI data for all the cortical and subcortical anatomical parcellations. Cortical thickness along with standard deviation, surface area, volume, curvature, white matter volume, and intensity measures from the cortical parcellation, as well as volume and intensity from the subcortical parcellation and overall volume of cortex, were extracted as features from the structural MRI data. A novel hybrid weighted feature concatenation method was used to achieve a maximal accuracy of 99.29% (P < 0.0001), which preserves high discriminatory power through the weighting of the individual feature types. The classification was performed by an extreme learning machine, and its efficiency was compared to linear and non-linear (radial basis function) support vector machines, linear discriminant analysis, and random forest bagged tree ensemble algorithms. This article reports the predictive accuracy of both unimodal and multimodal features after 10-by-10-fold nested cross-validation. A permutation test followed the classification experiment to assess the statistical significance of the classification results. It was concluded that, from a clinical perspective, this feature concatenation approach may assist clinicians in schizophrenia diagnosis.
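
    Both ingredients, weighted feature concatenation and an extreme learning machine (ELM), are compact enough to sketch. Everything below (feature dimensions, the 0.7/0.3 weights, the hidden-layer size, the dummy labels) is invented for illustration; the point is that an ELM trains only its output layer, in closed form:

        import numpy as np

        rng = np.random.default_rng(0)

        def elm_train(X, y, hidden=200):
            """Extreme learning machine: fixed random hidden layer, then a
            least-squares (pseudoinverse) readout -- no backpropagation."""
            W = rng.normal(size=(X.shape[1], hidden))
            H = np.tanh(X @ W)
            beta = np.linalg.pinv(H) @ y
            return W, beta

        def elm_predict(X, W, beta):
            return np.tanh(X @ W) @ beta

        # Hypothetical weighted concatenation of two modalities: functional
        # connectivity features and anatomical features, each scaled by a
        # weight chosen (e.g., by cross-validation) before concatenation.
        fc, anat = rng.normal(size=(100, 50)), rng.normal(size=(100, 30))
        w_fc, w_anat = 0.7, 0.3                      # illustrative weights
        X = np.hstack([w_fc * fc, w_anat * anat])
        y = rng.integers(0, 2, 100).astype(float)    # dummy class labels
        W, beta = elm_train(X, y)
        pred = (elm_predict(X, W, beta) > 0.5).astype(int)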

  15. Wideband speech enhancement addition

    NASA Astrophysics Data System (ADS)

    Weiss, M. R.; Aschkenasy, E.

    1981-05-01

    This report describes the completion of the development and construction of a speech enhancement unit (SEU). This is an electronic instrument that automatically detects and attenuates tones, clicks, and wideband random noises that may accompany speech signals that are transmitted, recorded, or reproduced electronically. An earlier version of this device was tested extensively by the Air Force and then was returned to Queens College for modification and completion of the system. During the period covered by this report, a number of major changes were made in the SEU, leading to a device that is simpler to use, more effective, and more broadly useful in its intended area of application. Some of the changes made in the SEU were aimed at reducing the degree of operator intervention required. To this end, the SEU was greatly simplified and made more automatic. The manual Digital Spectrum Shaping (DSS) system and most of the manual controls were removed. A new system was added for adjusting the level of the input signal. It keeps the signal at the level that maximizes the effectiveness of the noise attenuation processes. The INTEL process for attenuating wideband random noise was incorporated into the SEU. To make it possible for the speech enhancement unit to operate in real time with all processes active, the hardware and software of the system were modified extensively. The MAP was upgraded by adding a second arithmetic processor and a high speed memory. The existing programs and algorithms were rewritten to reduce their execution times. These and the INTEL programs were modified to fully exploit the capabilities of the upgraded MAP.

  16. Flat-spectrum speech.

    PubMed

    Schroeder, M R; Strube, H W

    1986-05-01

    Flat-spectrum stimuli, consisting of many equal-amplitude harmonics, produce timbre sensations that can depend strongly on the phase angles of the individual harmonics. For fundamental frequencies in the human pitch range, many realizable timbres have vowel-like perceptual qualities. This observation suggests the possibility of constructing intelligible voiced speech signals that have flat-amplitude spectra. This paper describes a successful experiment of creating several different diphthongs by judicious choice of the phase angles of a flat-spectrum waveform. A possible explanation of the observed vowel timbres lies in the dependence of the short-time amplitude spectra on phase changes.
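
    The stimulus construction is straightforward to reproduce in outline: harmonics of equal amplitude whose phases alone are varied. A small sketch (the harmonic count, f0, and the two phase choices here are arbitrary, not the stimulus parameters of the paper):

        import numpy as np

        def flat_spectrum(phases, f0=100.0, sr=16000, dur=0.5):
            """Waveform with equal-amplitude harmonics of f0; only the phase
            vector differs between stimuli, so the amplitude spectrum stays
            flat while the short-time envelope (and hence timbre) changes."""
            t = np.arange(int(sr * dur)) / sr
            return sum(np.cos(2 * np.pi * f0 * (k + 1) * t + ph)
                       for k, ph in enumerate(phases))

        n = 30                                   # 30 harmonics, up to 3 kHz
        zero_phase = flat_spectrum(np.zeros(n))  # pulse-like, buzzy timbre
        rand_phase = flat_spectrum(np.random.default_rng(2).uniform(0, 2 * np.pi, n))
        # Identical long-term amplitude spectra, clearly different timbres.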

  17. Speech Understanding Systems

    DTIC Science & Technology

    1976-02-01

    kHz that is a fixed number of decibels below the maximum value in the spectrum. A value of zero, however, is not recommended. (c) Speech for the... probability distributions for [t,p,k,d,n] should be evaluated using the observed parameters. But the scores on each of the vowels are all bad, so... plosives [p,t,k] is to examine the burst frequency and the voice-onset-time (VOT) when the plosive is followed by a vowel or semivowel. However, if

  18. Automatic Speech Recognition

    NASA Astrophysics Data System (ADS)

    Potamianos, Gerasimos; Lamel, Lori; Wölfel, Matthias; Huang, Jing; Marcheret, Etienne; Barras, Claude; Zhu, Xuan; McDonough, John; Hernando, Javier; Macho, Dusan; Nadeu, Climent

    Automatic speech recognition (ASR) is a critical component for CHIL services. For example, it provides the input to higher-level technologies, such as summarization and question answering, as discussed in Chapter 8. In the spirit of ubiquitous computing, the goal of ASR in CHIL is to achieve a high performance using far-field sensors (networks of microphone arrays and distributed far-field microphones). However, close-talking microphones are also of interest, as they are used to benchmark ASR system development by providing a best-case acoustic channel scenario to compare against.

  19. Speech Communication: A Radical Doctrine?

    ERIC Educational Resources Information Center

    Haiman, Franklyn S.

    1983-01-01

    Reviews connections between speech communication as a discipline and active commitments to First Amendment principles. Reflects on the influence of Professor James O'Neil, principal founder of the Speech Communication Association and offers his example as a role model. (PD)

  20. Disciplining Students for "Indecent" Speech.

    ERIC Educational Resources Information Center

    Cromartie, Martha

    1987-01-01

    Reviews the implications of the Supreme Court decision in "Bethel School District No. 403 v. Fraser." Schools must still comply with First Amendment and may not restrict student speech without valid reason. The maintenance of school order and the protection of rights of others are valid reasons. Speech is not immunized by the…

  1. Speech Prosody in Cerebellar Ataxia

    ERIC Educational Resources Information Center

    Casper, Maureen A.; Raphael, Lawrence J.; Harris, Katherine S.; Geibel, Jennifer M.

    2007-01-01

    Persons with cerebellar ataxia exhibit changes in physical coordination and speech and voice production. Previously, these alterations of speech and voice production were described primarily via perceptual coordinates. In this study, the spatial-temporal properties of syllable production were examined in 12 speakers, six of whom were healthy…

  2. Audiovisual Speech Recalibration in Children

    ERIC Educational Resources Information Center

    van Linden, Sabine; Vroomen, Jean

    2008-01-01

    In order to examine whether children adjust their phonetic speech categories, children of two age groups, five-year-olds and eight-year-olds, were exposed to a video of a face saying /aba/ or /ada/ accompanied by an auditory ambiguous speech sound halfway between /b/ and /d/. The effect of exposure to these audiovisual stimuli was measured on…

  3. Interpersonal Orientation and Speech Behavior.

    ERIC Educational Resources Information Center

    Street, Richard L., Jr.; Murphy, Thomas L.

    1987-01-01

    Indicates that (1) males with low interpersonal orientation (IO) were least vocally active and expressive and least consistent in their speech performances, and (2) high IO males and low IO females tended to demonstrate greater speech convergence than either low IO males or high IO females. (JD)

  5. SPEECH--MAN'S NATURAL COMMUNICATION.

    ERIC Educational Resources Information Center

    DUDLEY, HOMER; AND OTHERS

    SESSION 63 OF THE 1967 INSTITUTE OF ELECTRICAL AND ELECTRONIC ENGINEERS INTERNATIONAL CONVENTION BROUGHT TOGETHER SEVEN DISTINGUISHED MEN WORKING IN FIELDS RELEVANT TO LANGUAGE. THEIR TOPICS INCLUDED ORIGIN AND EVOLUTION OF SPEECH AND LANGUAGE, LANGUAGE AND CULTURE, MAN'S PHYSIOLOGICAL MECHANISMS FOR SPEECH, LINGUISTICS, AND TECHNOLOGY AND…

  6. Creating speech-synchronized animation.

    PubMed

    King, Scott A; Parent, Richard E

    2005-01-01

    We present a facial model designed primarily to support animated speech. Our facial model takes facial geometry as input and transforms it into a parametric deformable model. The facial model uses a muscle-based parameterization, allowing for easier integration between speech synchrony and facial expressions. Our facial model has a highly deformable lip model that is grafted onto the input facial geometry to provide the necessary geometric complexity needed for creating lip shapes and high-quality renderings. Our facial model also includes a highly deformable tongue model that can represent the shapes the tongue undergoes during speech. We add teeth, gums, and upper palate geometry to complete the inner mouth. To decrease the processing time, we hierarchically deform the facial surface. We also present a method to animate the facial model over time to create animated speech using a model of coarticulation that blends visemes together using dominance functions. We treat visemes as a dynamic shaping of the vocal tract by describing visemes as curves instead of keyframes. We show the utility of the techniques described in this paper by implementing them in a text-to-audiovisual-speech system that creates animation of speech from unrestricted text. The facial and coarticulation models must first be interactively initialized. The system then automatically creates accurate real-time animated speech from the input text. It is capable of cheaply producing tremendous amounts of animated speech with very low resource requirements.
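
    The dominance-function blending described here (in the spirit of the Cohen-Massaro coarticulation model) can be sketched in a few lines; the negative-exponential shape, the time constant, and the two-parameter viseme targets below are assumptions for illustration, not the paper's parameterization:

        import numpy as np

        def dominance(t, center, theta=0.08, peak=1.0):
            """Negative-exponential dominance of one viseme, peaking at its
            center time: nearby visemes influence the mouth shape most."""
            return peak * np.exp(-np.abs(t - center) / theta)

        def blend(t, centers, targets):
            """Weighted average of viseme target parameters, with weights
            given by each viseme's dominance at time t."""
            d = np.array([dominance(t, c) for c in centers])
            return (d[:, None] * targets).sum(0) / d.sum()

        # Illustrative: three visemes, each a 2-parameter target
        # (e.g., jaw opening and lip rounding), centered at 0.1/0.3/0.5 s.
        centers = [0.1, 0.3, 0.5]
        targets = np.array([[0.2, 0.9], [0.8, 0.1], [0.4, 0.5]])
        curve = np.array([blend(ti, centers, targets)
                          for ti in np.linspace(0, 0.6, 7)])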

  8. Speech acoustics: How much science?

    PubMed

    Tiwari, Manjul

    2012-01-01

    Human vocalizations are sounds made exclusively by a human vocal tract. Among other vocalizations, for example, laughs or screams, speech is the most important. Speech is the primary medium of that supremely human symbolic communication system called language. One of the functions of a voice, perhaps the main one, is to realize language, by conveying some of the speaker's thoughts in linguistic form. Speech is language made audible. Moreover, when phoneticians compare and describe voices, they usually do so with respect to linguistic units, especially speech sounds, like vowels or consonants. It is therefore necessary to understand the structure as well as the nature of speech sounds and how they are described. In order to understand and evaluate speech, it is important to have at least a basic understanding of the science of speech acoustics: how the acoustics of speech are produced, how they are described, and how differences, both between speakers and within speakers, arise in the acoustic output. One of the aims of this article is to try to facilitate this understanding.

  9. Speech Analysis Systems: An Evaluation.

    ERIC Educational Resources Information Center

    Read, Charles; And Others

    1992-01-01

    Performance characteristics are reviewed for seven computerized systems marketed for acoustic speech analysis: CSpeech, CSRE, ILS-PC, Kay Elemetrics model 550 Sona-Graph, MacSpeech Lab II, MSL, and Signalyze. Characteristics reviewed include system components, basic capabilities, documentation, user interface, data formats and journaling, and…

  10. Theoretical Aspects of Speech Production.

    ERIC Educational Resources Information Center

    Stevens, Kenneth N.

    1992-01-01

    This paper on speech production in children and youth with hearing impairments summarizes theoretical aspects, including the speech production process, sound sources in the vocal tract, vowel production, and consonant production. Examples of spectra for several classes of vowel and consonant sounds in simple syllables are given. (DB)

  11. Speech outcomes in Cantonese patients after glossectomy.

    PubMed

    Wong, Ripley Kit; Poon, Esther Sok-Man; Woo, Cynthia Yuen-Man; Chan, Sabina Ching-Shun; Wong, Elsa Siu-Ping; Chu, Ada Wai-Sze

    2007-08-01

    We sought to determine the major factors affecting the speech production of Cantonese-speaking glossectomized patients, and analyzed their error patterns. Forty-one Cantonese-speaking subjects who had undergone glossectomy at least 6 months previously were recruited. Speech production evaluation included (1) phonetic error analysis in nonsense syllables; (2) speech intelligibility in sentences, evaluated by naive listeners; and (3) overall speech intelligibility in conversation, evaluated by experienced speech therapists. Patients receiving adjuvant radiotherapy had significantly poorer segmental and connected speech production. Total or subtotal glossectomy also resulted in poor speech outcomes, while patients with free flap reconstruction showed the best outcomes. Patients without lymph node metastasis had significantly better speech scores than patients with metastasis. Initial consonant production had the worst scores, while vowel production was least affected. Speech outcomes of Cantonese-speaking glossectomized patients thus depended on the severity of the disease, and initial consonants had the greatest effect on speech intelligibility.

  12. Optimizing cochlear implant speech performance.

    PubMed

    Skinner, Margaret W

    2003-09-01

    Results of studies performed in our laboratory suggest that cochlear implant recipients understand speech best if the following speech processor parameters are chosen individually for each person: minimum and maximum stimulation levels on each electrode in the speech processor program (MAP), stimulation rate, and speech coding strategy. If these and related parameters are chosen to make soft sounds (from approximately 100 to 6,000 Hz) audible at as close to 20 dB hearing level as possible, and loud sounds not too loud, recipients have the opportunity to hear speech in everyday life situations; this is of key importance to children who are learning language, and to all recipients in terms of ease of communication.

  13. Dichotic speech tests.

    PubMed

    Hällgren, M; Johansson, M; Larsby, B; Arlinger, S

    1998-01-01

    When central auditory dysfunction is present, the ability to understand speech in difficult listening situations can be affected. To study this phenomenon, dichotic speech tests were performed with test material in the Swedish language. Digits, spondees, sentences, and consonant-vowel syllables were used as stimuli, and reporting was free or directed. The test material was recorded on CD. The study includes a normal group of 30 people in three age categories: 11 years, 23-27 years, and 67-70 years. It also includes two groups of subjects with suspected central auditory lesions: 11 children with reading and writing difficulties and 4 adults earlier exposed to organic solvents. The results from the normal group do not show any differences in performance due to age. The children with reading and writing difficulties show a significant deviation on one test with digits and one test with syllables. Three of the four adults exposed to solvents show a significant deviation from the normal group.

  14. Monaural speech segregation

    NASA Astrophysics Data System (ADS)

    Wang, Deliang; Hu, Guoning

    2003-04-01

    Speech segregation from a monaural recording is a primary task of auditory scene analysis, and has proven to be very challenging. We present a multistage model for the task. The model starts with a simulated auditory periphery. A subsequent stage computes midlevel auditory representations, including correlograms and cross-channel correlations. The core of the system performs segmentation and grouping in a two-dimensional time-frequency representation that encodes proximity in frequency and time, periodicity, and amplitude modulation (AM). Motivated by psychoacoustic observations, our system employs different mechanisms for handling resolved and unresolved harmonics. For resolved harmonics, the system generates segments (basic components of an auditory scene) based on temporal continuity and cross-channel correlation, and groups them according to periodicity. For unresolved harmonics, it generates segments based on AM in addition to temporal continuity and groups them according to AM repetition rates, which we derive using sinusoidal modeling and gradient descent. Underlying the segregation process is a pitch contour that is first estimated from speech segregated according to global pitch and then adjusted according to psychoacoustic constraints. The model has been systematically evaluated, and it yields substantially better performance than previous systems.
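
    As a concrete illustration of the midlevel representations above, the sketch below (a plain Butterworth bandpass bank stands in for a gammatone model of the auditory periphery; all parameters are illustrative) computes a per-channel normalized autocorrelation (correlogram) for one frame, plus the correlation between adjacent channels' autocorrelations, which is high when two channels respond to the same resolved harmonic.

      import numpy as np
      from scipy.signal import butter, lfilter

      def filterbank(x, fs, centers, bw=0.3):
          # Simple bandpass bank; a gammatone filterbank would be used in practice.
          chans = []
          for fc in centers:
              lo, hi = fc * (1 - bw), fc * (1 + bw)
              b, a = butter(2, [lo / (fs / 2), hi / (fs / 2)], btype="band")
              chans.append(lfilter(b, a, x))
          return np.array(chans)

      def correlogram(chans, frame, max_lag):
          # Normalized autocorrelation of one frame in every channel.
          acfs = []
          for c in chans:
              seg = c[frame : frame + 2 * max_lag]
              acf = np.correlate(seg, seg, mode="full")[len(seg) - 1 :][:max_lag]
              acfs.append(acf / (acf[0] + 1e-9))
          return np.array(acfs)

      def cross_channel_correlation(acfs):
          # Pearson correlation between autocorrelations of adjacent channels.
          z = (acfs - acfs.mean(1, keepdims=True)) / (acfs.std(1, keepdims=True) + 1e-9)
          return np.array([np.mean(z[i] * z[i + 1]) for i in range(len(z) - 1)])

      fs = 16000
      t = np.arange(fs) / fs
      x = sum(np.sin(2 * np.pi * 120 * k * t) for k in range(1, 13)) / 12  # harmonic source
      chans = filterbank(x, fs, centers=[200, 400, 800, 1600])
      ccorr = cross_channel_correlation(correlogram(chans, frame=2000, max_lag=320))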

  15. Interactions between distal speech rate, linguistic knowledge, and speech environment.

    PubMed

    Morrill, Tuuli; Baese-Berk, Melissa; Heffner, Christopher; Dilley, Laura

    2015-10-01

    During lexical access, listeners use both signal-based and knowledge-based cues, and information from the linguistic context can affect the perception of acoustic speech information. Recent findings suggest that the various cues used in lexical access are implemented with flexibility and may be affected by information from the larger speech context. We conducted 2 experiments to examine effects of a signal-based cue (distal speech rate) and a knowledge-based cue (linguistic structure) on lexical perception. In Experiment 1, we manipulated distal speech rate in utterances where an acoustically ambiguous critical word was either obligatory for the utterance to be syntactically well formed (e.g., Conner knew that bread and butter (are) both in the pantry) or optional (e.g., Don must see the harbor (or) boats). In Experiment 2, we examined identical target utterances as in Experiment 1 but changed the distribution of linguistic structures in the fillers. The results of the 2 experiments demonstrate that speech rate and linguistic knowledge about critical word obligatoriness can both influence speech perception. In addition, it is possible to alter the strength of a signal-based cue by changing information in the speech environment. These results provide support for models of word segmentation that include flexible weighting of signal-based and knowledge-based cues.

  16. Hate Speech or Free Speech: Can Broad Campus Speech Regulations Survive Current Judicial Reasoning?

    ERIC Educational Resources Information Center

    Heiser, Gregory M.; Rossow, Lawrence F.

    1993-01-01

    Federal courts have found speech regulations overbroad in suits against the University of Michigan and the University of Wisconsin System. Attempts to assess the theoretical justification and probable fate of broad speech regulations that have not been explicitly rejected by the courts. Concludes that strong arguments for broader regulation will…

  17. Suppressing aliasing noise in the speech feature domain for automatic speech recognition.

    PubMed

    Deng, Huiqun; O'Shaughnessy, Douglas

    2008-07-01

    This letter points out that, although in the audio signal domain low-pass filtering has been used to prevent aliasing noise from entering the baseband of speech signals, an antialias process in the speech feature domain is still needed to prevent high modulation frequency components from entering the baseband of speech features. The existence of aliasing noise in speech features is revealed via spectral analysis of speech feature streams. A method for suppressing such aliasing noise is proposed. Experiments on large vocabulary speech recognition show that antialias processing of speech features can improve speech recognition, especially for noisy speech.
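
    A minimal sketch of such an antialias step, under the assumption that speech features (e.g., MFCCs) arrive as per-frame trajectories: each coefficient is treated as a time series sampled at the frame rate and low-pass filtered in the modulation-frequency domain. The 25 Hz cutoff is purely illustrative.

      import numpy as np
      from scipy.signal import butter, filtfilt

      def antialias_features(feats, frame_rate_hz, cutoff_hz=25.0):
          # feats: (num_frames, num_coeffs) array of speech features.
          # Low-pass each coefficient trajectory to suppress high
          # modulation-frequency components before further processing.
          b, a = butter(4, cutoff_hz / (frame_rate_hz / 2), btype="low")
          return filtfilt(b, a, feats, axis=0)  # zero-phase: no frame delay

      # 100 frames per second is a common feature rate; random stand-in data.
      smoothed = antialias_features(np.random.randn(300, 13), frame_rate_hz=100.0)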

  18. Speech Anxiety: The Importance of Identification in the Basic Speech Course.

    ERIC Educational Resources Information Center

    Mandeville, Mary Y.

    A study investigated speech anxiety in the basic speech course by means of pre and post essays. Subjects, 73 students in 3 classes in the basic speech course at a southwestern multiuniversity, wrote a two-page essay on their perceptions of their speech anxiety before the first speaking project. Students discussed speech anxiety in class and were…

  19. Hate Speech/Free Speech: Using Feminist Perspectives To Foster On-Campus Dialogue.

    ERIC Educational Resources Information Center

    Cornwell, Nancy; Orbe, Mark P.; Warren, Kiesha

    1999-01-01

    Explores the complex issues inherent in the tension between hate speech and free speech, focusing on the phenomenon of hate speech on college campuses. Describes the challenges to hate speech made by critical race theorists and explains how a feminist critique can reorient the parameters of hate speech. (SLD)

  20. Speech Patterns and Racial Wage Inequality

    ERIC Educational Resources Information Center

    Grogger, Jeffrey

    2011-01-01

    Speech patterns differ substantially between whites and many African Americans. I collect and analyze speech data to understand the role that speech may play in explaining racial wage differences. Among blacks, speech patterns are highly correlated with measures of skill such as schooling and AFQT scores. They are also highly correlated with the…

  1. Freedom of Speech Newsletter, February 1976.

    ERIC Educational Resources Information Center

    Allen, Winfred G., Jr., Ed.

    The "Freedom of Speech Newsletter" is the communication medium, published four times each academic year, of the Freedom of Speech Interest Group, Western Speech Communication Association. Articles included in this issue are "What Is Academic Freedom For?" by Ralph Ross, "A Sociology of Free Speech" by Ray Heidt,…

  2. The "Checkers" Speech and Televised Political Communication.

    ERIC Educational Resources Information Center

    Flaningam, Carl

    Richard Nixon's 1952 "Checkers" speech was an innovative use of television for political communication. Like television news itself, the campaign fund crisis behind the speech can be thought of in the same terms as other television melodrama, with the speech serving as its climactic episode. The speech adapted well to television because…

  3. The "Checkers" Speech and Televised Political Communication.

    ERIC Educational Resources Information Center

    Flaningam, Carl

    Richard Nixon's 1952 "Checkers" speech was an innovative use of television for political communication. Like television news itself, the campaign fund crisis behind the speech can be thought of in the same terms as other television melodrama, with the speech serving as its climactic episode. The speech adapted well to television because…

  3. Preschool Children's Awareness of Private Speech

    ERIC Educational Resources Information Center

    Manfra, Louis; Winsler, Adam

    2006-01-01

    The present study explored: (a) preschool children's awareness of their own talking and private speech (speech directed to the self); (b) differences in age, speech use, language ability, and mentalizing abilities between children with awareness and those without; and (c) children's beliefs and attitudes about private speech. Fifty-one children…

  4. Connected Speech Processes in Australian English.

    ERIC Educational Resources Information Center

    Ingram, J. C. L.

    1989-01-01

    Explores the role of Connected Speech Processes (CSP) in accounting for sociolinguistically significant dimensions of speech variation, and presents initial findings on the distribution of CSPs in the speech of Australian adolescents. The data were gathered as part of a wider survey of speech of Brisbane school children. (Contains 26 references.)…

  5. Instructional Improvement Speech Handbook. Secondary Level.

    ERIC Educational Resources Information Center

    Crapse, Larry

    Recognizing that speech is an important component of the language arts and that the English curriculum is the most natural place for speech skills to be fostered, this handbook examines several methods of developing speech competencies within the secondary school English classroom. The first section, "Looking at Speech," examines the…

  6. Audio-Visual Speech Perception Is Special

    ERIC Educational Resources Information Center

    Tuomainen, J.; Andersen, T.S.; Tiippana, K.; Sams, M.

    2005-01-01

    In face-to-face conversation speech is perceived by ear and eye. We studied the prerequisites of audio-visual speech perception by using perceptually ambiguous sine wave replicas of natural speech as auditory stimuli. When the subjects were not aware that the auditory stimuli were speech, they showed only negligible integration of auditory and…

  7. Analysis of False Starts in Spontaneous Speech.

    ERIC Educational Resources Information Center

    O'Shaughnessy, Douglas

    A primary difference between spontaneous speech and read speech concerns the use of false starts, where a speaker interrupts the flow of speech to restart his or her utterance. A study examined the acoustic aspects of such restarts in a widely used speech database, analyzing approximately 1000 utterances, about 10% of which contained a restart.…

  8. Infant Perception of Atypical Speech Signals

    ERIC Educational Resources Information Center

    Vouloumanos, Athena; Gelfand, Hanna M.

    2013-01-01

    The ability to decode atypical and degraded speech signals as intelligible is a hallmark of speech perception. Human adults can perceive sounds as speech even when they are generated by a variety of nonhuman sources including computers and parrots. We examined how infants perceive the speech-like vocalizations of a parrot. Further, we examined how…

  9. Multifractal nature of unvoiced speech signals

    SciTech Connect

    Adeyemi, O.A.; Hartt, K.; Boudreaux-Bartels, G.F.

    1996-06-01

    A refinement is made in the nonlinear dynamic modeling of speech signals. Previous research successfully characterized speech signals as chaotic. Here, we analyze fricative speech signals using multifractal measures to determine various fractal regimes present in their chaotic attractors. Results support the hypothesis that speech signals have multifractal measures. © 1996 American Institute of Physics.
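
    A bare-bones multifractal check in the spirit of the analysis above (a sketch, not the authors' method): form a measure from the signal's absolute amplitude, compute the partition function Z(q, s) over boxes of size s, and fit scaling exponents tau(q); nonlinearity of tau(q) in q indicates a multifractal rather than monofractal measure.

      import numpy as np

      def scaling_exponents(x, qs, sizes):
          # tau(q) is the slope of log Z(q, s) versus log s, where
          # Z(q, s) = sum over boxes of size s of (box measure)^q.
          mu = np.abs(x) / np.abs(x).sum()
          taus = []
          for q in qs:
              log_z = []
              for s in sizes:
                  boxes = mu[: len(mu) // s * s].reshape(-1, s).sum(axis=1)
                  log_z.append(np.log((boxes[boxes > 0] ** q).sum()))
              taus.append(np.polyfit(np.log(sizes), log_z, 1)[0])
          return np.array(taus)  # nonlinearity in q suggests multifractality

      x = np.random.randn(2 ** 14)  # stand-in for a fricative speech segment
      tau = scaling_exponents(x, qs=np.arange(-2, 5), sizes=[16, 32, 64, 128, 256])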

  10. Automated Speech Rate Measurement in Dysarthria

    ERIC Educational Resources Information Center

    Martens, Heidi; Dekens, Tomas; Van Nuffelen, Gwen; Latacz, Lukas; Verhelst, Werner; De Bodt, Marc

    2015-01-01

    Purpose: In this study, a new algorithm for automated determination of speech rate (SR) in dysarthric speech is evaluated. We investigated how reliably the algorithm calculates the SR of dysarthric speech samples when compared with calculation performed by speech-language pathologists. Method: The new algorithm was trained and tested using Dutch…

  11. Emerging Technologies Speech Tools and Technologies

    ERIC Educational Resources Information Center

    Godwin-Jones, Robert

    2009-01-01

    Using computers to recognize and analyze human speech goes back at least to the 1970's. Developed initially to help the hearing or speech impaired, speech recognition was also used early on experimentally in language learning. Since the 1990's, advances in the scientific understanding of speech as well as significant enhancements in software and…

  12. Infant-Directed Speech Facilitates Word Segmentation

    ERIC Educational Resources Information Center

    Thiessen, Erik D.; Hill, Emily A.; Saffran, Jenny R.

    2005-01-01

    There are reasons to believe that infant-directed (ID) speech may make language acquisition easier for infants. However, the effects of ID speech on infants' learning remain poorly understood. The experiments reported here assess whether ID speech facilitates word segmentation from fluent speech. One group of infants heard a set of nonsense…

  13. On the Nature of Speech Science.

    ERIC Educational Resources Information Center

    Peterson, Gordon E.

    In this article the nature of the discipline of speech science is considered, and the various basic and applied areas of the discipline are discussed. The basic areas encompass the various processes of the physiology of speech production, the acoustical characteristics of speech, including the speech wave types and the information-bearing acoustic…

  14. Phonetic Recalibration Only Occurs in Speech Mode

    ERIC Educational Resources Information Center

    Vroomen, Jean; Baart, Martijn

    2009-01-01

    Upon hearing an ambiguous speech sound dubbed onto lipread speech, listeners adjust their phonetic categories in accordance with the lipread information (recalibration) that tells what the phoneme should be. Here we used sine wave speech (SWS) to show that this tuning effect occurs if the SWS sounds are perceived as speech, but not if the sounds…

  15. Is Birdsong More Like Speech or Music?

    PubMed

    Shannon, Robert V

    2016-04-01

    Music and speech share many acoustic cues, but not all are equally important. For example, harmonic pitch is essential for music but not for speech. When birds communicate, is their song more like speech or music? A new study contrasting pitch and spectral patterns shows that birds perceive their song more like humans perceive speech. Copyright © 2016 Elsevier Ltd. All rights reserved.

  1. Brain-inspired speech segmentation for automatic speech recognition using the speech envelope as a temporal reference

    PubMed Central

    Lee, Byeongwook; Cho, Kwang-Hyun

    2016-01-01

    Speech segmentation is a crucial step in automatic speech recognition because additional speech analyses are performed for each framed speech segment. Conventional segmentation techniques primarily segment speech using a fixed frame size for computational simplicity. However, this approach is insufficient for capturing the quasi-regular structure of speech, which causes substantial recognition failure in noisy environments. How does the brain handle quasi-regular structured speech and maintain high recognition performance under any circumstance? Recent neurophysiological studies have suggested that the phase of neuronal oscillations in the auditory cortex contributes to accurate speech recognition by guiding speech segmentation into smaller units at different timescales. A phase-locked relationship between neuronal oscillation and the speech envelope has recently been observed, which suggests that the speech envelope provides a foundation for multi-timescale speech segmental information. In this study, we quantitatively investigated the role of the speech envelope as a potential temporal reference to segment speech using its instantaneous phase information. We evaluated the proposed approach by the achieved information gain and recognition performance in various noisy environments. The results indicate that the proposed segmentation scheme not only extracts more information from speech but also provides greater robustness in a recognition test. PMID:27876875
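
    A rough sketch of using the envelope's instantaneous phase as a temporal reference (illustrative cutoff and toy signal; the paper's scheme is more elaborate): extract a slow amplitude envelope, take its analytic signal via the Hilbert transform, and place segment boundaries once per completed envelope cycle.

      import numpy as np
      from scipy.signal import butter, filtfilt, hilbert

      def envelope_phase_boundaries(x, fs, env_cut_hz=8.0):
          # Amplitude envelope restricted to slow (syllabic-rate) modulations.
          b, a = butter(2, env_cut_hz / (fs / 2), btype="low")
          env = filtfilt(b, a, np.abs(x))
          # Instantaneous phase of the envelope via the analytic signal.
          phase = np.unwrap(np.angle(hilbert(env - env.mean())))
          # One boundary per completed envelope cycle.
          cycles = np.floor(phase / (2 * np.pi))
          return np.nonzero(np.diff(cycles) > 0)[0]

      fs = 16000
      t = np.arange(2 * fs) / fs
      x = np.sin(2 * np.pi * 150 * t) * (0.6 + 0.4 * np.sin(2 * np.pi * 4 * t))
      boundaries = envelope_phase_boundaries(x, fs)  # sample indices of segment edges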

  2. Mapping acoustics to kinematics in speech

    NASA Astrophysics Data System (ADS)

    Bali, Rohan

    An accurate mapping from speech acoustics to speech articulator movements has many practical applications, as well as theoretical implications for the science of speech planning and perception. This work is divided into two parts. In the first part, we show that a simple codebook can be used to map acoustics to speech articulator movements in natural, conversational speech. In the second part, we incorporate cost-optimization principles that have been shown to be relevant in motor control tasks into the codebook approach. These cost optimizations are defined as minimization of the integrals of the magnitudes of velocity, acceleration, and jerk of the speech articulators, and are implemented using a dynamic programming technique. Results show that incorporating cost minimization of speech articulator movements can significantly improve the mapping from acoustics to articulator movements, suggesting underlying physiological or neural planning principles used by the articulators during speech production.
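
    A sketch, under simplifying assumptions, of the codebook-plus-cost-optimization idea: each acoustic frame retrieves its k nearest codebook entries (paired acoustic/articulator vectors), and dynamic programming selects the candidate sequence minimizing acoustic mismatch plus a squared articulator-velocity penalty (acceleration and jerk terms would be handled analogously).

      import numpy as np

      def map_acoustics_to_articulators(acoustics, code_ac, code_art, k=5, lam=1.0):
          # acoustics: (T, Da); codebook: paired code_ac (N, Da), code_art (N, Dr).
          T = len(acoustics)
          dist = ((acoustics[:, None, :] - code_ac[None, :, :]) ** 2).sum(-1)
          cand = np.argsort(dist, axis=1)[:, :k]          # k candidates per frame
          local = np.take_along_axis(dist, cand, axis=1)  # acoustic mismatch cost
          cost, back = local[0].copy(), np.zeros((T, k), dtype=int)
          for t in range(1, T):
              a_now, a_prev = code_art[cand[t]], code_art[cand[t - 1]]
              move = ((a_now[:, None, :] - a_prev[None, :, :]) ** 2).sum(-1)
              step = cost[None, :] + lam * move           # velocity penalty
              back[t] = step.argmin(axis=1)
              cost = local[t] + step.min(axis=1)
          idx = [int(cost.argmin())]                      # trace back best path
          for t in range(T - 1, 0, -1):
              idx.append(int(back[t, idx[-1]]))
          idx.reverse()
          return np.array([code_art[cand[t, i]] for t, i in enumerate(idx)])

      rng = np.random.default_rng(0)                      # toy paired codebook
      code_ac, code_art = rng.normal(size=(200, 12)), rng.normal(size=(200, 4))
      path = map_acoustics_to_articulators(rng.normal(size=(50, 12)), code_ac, code_art)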

  3. Pronunciation models for conversational speech

    NASA Astrophysics Data System (ADS)

    Johnson, Keith

    2005-09-01

    Using a pronunciation dictionary of clear-speech citation forms, a segment deletion rate of nearly 12% is found in a corpus of conversational speech. The number of apparent segment deletions can be reduced by constructing a pronunciation dictionary that records one or more of the actual pronunciations found in conversational speech; however, the resulting empirical pronunciation dictionary often fails to include the citation pronunciation form. Issues involved in selecting pronunciations for a dictionary for linguistic, psycholinguistic, and ASR research will be discussed. One conclusion is that Ladefoged may have been the wiser for avoiding the business of producing pronunciation dictionaries. [Supported by NIDCD Grant No. R01 DC04330-03.]
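
    One way such a deletion rate can be computed (a sketch, not necessarily the author's procedure): align each citation-form phone string with its conversational realization by edit distance and count citation phones with no counterpart in the realization.

      def deletions(citation, realized):
          # Levenshtein table between the two phone strings.
          n, m = len(citation), len(realized)
          D = [[0] * (m + 1) for _ in range(n + 1)]
          for i in range(n + 1):
              D[i][0] = i
          for j in range(m + 1):
              D[0][j] = j
          for i in range(1, n + 1):
              for j in range(1, m + 1):
                  sub = D[i - 1][j - 1] + (citation[i - 1] != realized[j - 1])
                  D[i][j] = min(sub, D[i - 1][j] + 1, D[i][j - 1] + 1)
          # Trace back, counting steps that consume a citation phone only.
          i, j, dels = n, m, 0
          while i > 0 and j > 0:
              if D[i][j] == D[i - 1][j - 1] + (citation[i - 1] != realized[j - 1]):
                  i, j = i - 1, j - 1
              elif D[i][j] == D[i - 1][j] + 1:
                  i, dels = i - 1, dels + 1
              else:
                  j -= 1
          return dels + i  # leftover citation phones are deletions too

      # "probably" reduced in conversation: three citation phones deleted.
      print(deletions("p r aa b ax b l iy".split(), "p r aa l iy".split()))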

  4. Visual speech influences speech perception immediately but not automatically.

    PubMed

    Mitterer, Holger; Reinisch, Eva

    2017-02-01

    Two experiments examined the time course of the use of auditory and visual speech cues to spoken word recognition using an eye-tracking paradigm. Results of the first experiment showed that the use of visual speech cues from lipreading is reduced if concurrently presented pictures require a division of attentional resources. This reduction was evident even when listeners' eye gaze was on the speaker rather than the (static) pictures. Experiment 2 used a deictic hand gesture to foster attention to the speaker. At the same time, the visual processing load was reduced by keeping the visual display constant over a fixed number of successive trials. Under these conditions, the visual speech cues from lipreading were used. Moreover, the eye-tracking data indicated that visual information was used immediately and even earlier than auditory information. In combination, these data indicate that visual speech cues are not used automatically, but if they are used, they are used immediately.

  5. Applications of Hilbert Spectral Analysis for Speech and Sound Signals

    NASA Technical Reports Server (NTRS)

    Huang, Norden E.

    2003-01-01

    A new method for analyzing nonlinear and nonstationary data has been developed, and its natural applications are to speech and sound signals. The key part of the method is the Empirical Mode Decomposition, with which any complicated data set can be decomposed into a finite and often small number of Intrinsic Mode Functions (IMF). An IMF is defined as any function having the same numbers of zero crossings and extrema, and having symmetric envelopes defined by the local maxima and minima, respectively. An IMF also admits a well-behaved Hilbert transform. This decomposition method is adaptive and therefore highly efficient. Since the decomposition is based on the local characteristic time scale of the data, it is applicable to nonlinear and nonstationary processes. With the Hilbert transform, the Intrinsic Mode Functions yield instantaneous frequencies as functions of time, which give sharp identifications of embedded structures. The method can be used to process all acoustic signals: specifically, speech signals for speech synthesis, speaker identification and verification, speech recognition, and sound signal enhancement and filtering. Additionally, the acoustical signals from machinery are essentially the way machines talk to us; whether carried through the air as sound or as vibration on the machine itself, they can reveal the operating condition of the machine. Thus, acoustic signals can be used to diagnose machine problems.
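
    A minimal sketch of the two core steps (one sifting pass rather than the full iterative EMD, and a toy chirp signal): extract an IMF candidate by subtracting the mean of cubic-spline envelopes through the local extrema, then read instantaneous frequency off the phase of its analytic signal.

      import numpy as np
      from scipy.interpolate import CubicSpline
      from scipy.signal import hilbert

      def sift_once(x, t):
          # One sifting pass of Empirical Mode Decomposition: subtract the mean
          # of the upper and lower envelopes (splines through local extrema).
          up = np.nonzero((x[1:-1] > x[:-2]) & (x[1:-1] > x[2:]))[0] + 1
          lo = np.nonzero((x[1:-1] < x[:-2]) & (x[1:-1] < x[2:]))[0] + 1
          env_hi = CubicSpline(t[up], x[up])(t)
          env_lo = CubicSpline(t[lo], x[lo])(t)
          return x - (env_hi + env_lo) / 2.0

      def instantaneous_frequency(imf, fs):
          # The phase derivative of the analytic signal, in Hz.
          phase = np.unwrap(np.angle(hilbert(imf)))
          return np.diff(phase) * fs / (2 * np.pi)

      fs = 1000
      t = np.arange(fs) / fs
      x = np.sin(2 * np.pi * (40 * t + 20 * t ** 2)) + 0.3 * np.sin(2 * np.pi * 5 * t)
      imf = sift_once(x, t)            # in practice sifting iterates to convergence
      f_inst = instantaneous_frequency(imf, fs)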

  6. Speech recovery device

    SciTech Connect

    Frankle, Christen M.

    2000-10-19

    There is provided an apparatus and method for assisting speech recovery in people unable to speak due to aphasia, apraxia, or another condition with similar effect. A hollow, rigid, thin-walled tube with semi-circular or semi-elliptical cut-out shapes at each open end is positioned such that one end mates with the throat/voice box area of the neck of the assistor and the other end mates with the throat/voice box area of the assisted. The speaking person (assistor) makes sounds that produce standing wave vibrations at the same frequency in the vocal cords of the assisted person. Driving the assisted person's vocal cords, with the assisted person able to hear the correct tone, enables the assisted person to speak by simply amplifying the vibration of membranes in their throat.

  7. Steganalysis of recorded speech

    NASA Astrophysics Data System (ADS)

    Johnson, Micah K.; Lyu, Siwei; Farid, Hany

    2005-03-01

    Digital audio provides a suitable cover for high-throughput steganography. At 16 bits per sample and sampled at a rate of 44,100 Hz, digital audio has the bit-rate to support large messages. In addition, audio is often transient and unpredictable, facilitating the hiding of messages. Using an approach similar to our universal image steganalysis, we show that hidden messages alter the underlying statistics of audio signals. Our statistical model begins by building a linear basis that captures certain statistical properties of audio signals. A low-dimensional statistical feature vector is extracted from this basis representation and used by a non-linear support vector machine for classification. We show the efficacy of this approach on LSB embedding and Hide4PGP. While no explicit assumptions about the content of the audio are made, our technique has been developed and tested on high-quality recorded speech.
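
    The statistical model above is beyond a short example, but the premise it rests on is easy to demonstrate (a toy sketch; Hide4PGP and the authors' feature set are not reproduced here): LSB embedding equalizes the counts of the sample-value pairs (2k, 2k+1), which a simple imbalance statistic exposes.

      import numpy as np

      def embed_lsb(samples, bits):
          # Replace the least significant bit of 16-bit samples with message bits.
          out = samples.copy()
          out[: len(bits)] = (out[: len(bits)] & ~1) | bits
          return out

      def pair_imbalance(samples):
          # Aggregate imbalance between counts of value pairs (2k, 2k+1);
          # LSB embedding drives this statistic toward zero.
          vals = samples.astype(np.int64) + 2 ** 15  # even offset keeps pairs aligned
          counts = np.bincount(vals)
          if len(counts) % 2:
              counts = np.append(counts, 0)
          even, odd = counts[0::2], counts[1::2]
          return np.abs(even - odd).sum() / (even + odd).sum()

      rng = np.random.default_rng(1)
      base = rng.normal(scale=300, size=44100).astype(np.int16)
      # Synthetic cover whose LSBs are biased (real audio is likewise non-uniform).
      cover = (base & ~1) | (rng.random(44100) < 0.3).astype(np.int16)
      stego = embed_lsb(cover, rng.integers(0, 2, size=44100).astype(np.int16))
      print(pair_imbalance(cover), pair_imbalance(stego))  # stego is much closer to 0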

  8. Silog: Speech Input Logon

    NASA Astrophysics Data System (ADS)

    Grau, Sergio; Allen, Tony; Sherkat, Nasser

    Silog is a biometric authentication system that extends the conventional PC logon process using voice verification. Users enter their ID and password using a conventional Windows logon procedure but then the biometric authentication stage makes a Voice over IP (VoIP) call to a VoiceXML (VXML) server. User interaction with this speech-enabled component then allows the user's voice characteristics to be extracted as part of a simple user/system spoken dialogue. If the captured voice characteristics match those of a previously registered voice profile, then network access is granted. If no match is possible, then a potential unauthorised system access has been detected and the logon process is aborted.

  9. Anxiety and ritualized speech.

    PubMed

    Lalljee, M; Cook, M

    1975-08-01

    The experiment examines the effects of anxiety on the use of a number of words that seem irrelevant to semantic communication. The Units of Ritualized Speech (URSs) considered are: 'I mean', 'in fact', 'really', 'sort of', 'well', and 'you know'. Two hypotheses are tested: (i) that URS rate will increase with anxiety; and (ii) that the speaker's preferred URS will increase with anxiety. Subjects were interviewed on topics they had previously rated as anxiety-provoking and non-anxiety-provoking. Hypothesis (i) was supported, but hypothesis (ii) was not. More specifically, the use of 'I mean' and 'well' increases when the speaker is anxious. An explanation for this is sought in the grammatical location of these two units. Sex differences in the use of URSs, correlations between URSs, and their relationship to other forms of disfluency are also considered.

  10. Deep Ensemble Learning for Monaural Speech Separation

    DTIC Science & Technology

    2015-02-01

    Monaural speech separation is a fundamental problem in robust speech processing. Recently, deep neural network (DNN) based speech separation methods, which predict either clean speech or an ideal time-frequency mask, have…

  11. Statistical Analysis of Spectral Properties and Prosodic Parameters of Emotional Speech

    NASA Astrophysics Data System (ADS)

    Přibil, J.; Přibilová, A.

    2009-01-01

    The paper addresses the reflection of microintonation and spectral properties in male and female acted emotional speech. The microintonation component of speech melody is analyzed with regard to its spectral and statistical parameters. According to psychological research on emotional speech, different emotions are accompanied by different spectral noise. We control its amount by spectral flatness, according to which high-frequency noise is mixed into voiced frames during cepstral speech synthesis. Our experiments are aimed at statistical analysis of cepstral coefficient values and ranges of spectral flatness in three emotions (joy, sadness, anger) and a neutral state for comparison. Calculated histograms of spectral flatness distribution are visually compared and modelled by a Gamma probability distribution. Histograms of cepstral coefficient distribution are evaluated and compared using skewness and kurtosis. The achieved statistical results show good correlation between male and female voices for all emotional states, portrayed by several Czech and Slovak professional actors.
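
    The spectral flatness measure that controls the noise mixing above is simple to state (a sketch; frame length and windowing are illustrative): the ratio of the geometric to the arithmetic mean of the power spectrum, near 1 for white noise and near 0 for tonal frames.

      import numpy as np

      def spectral_flatness(frame):
          # Geometric mean / arithmetic mean of the power spectrum.
          p = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2 + 1e-12
          return np.exp(np.mean(np.log(p))) / np.mean(p)

      fs = 16000
      t = np.arange(512) / fs
      print(spectral_flatness(np.sin(2 * np.pi * 200 * t)))  # near 0 (tonal)
      print(spectral_flatness(np.random.randn(512)))         # closer to 1 (noisy)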

  12. Analysis of the glottal excitation of emotionally styled and stressed speech.

    PubMed

    Cummings, K E; Clements, M A

    1995-07-01

    The problems of automatic recognition and synthesis of multistyle speech have become important topics of research in recent years. This paper reports an extensive investigation of the variations that occur in the glottal excitation of eleven commonly encountered speech styles. Glottal waveforms were extracted from utterances of non-nasalized vowels by two speakers for each of the eleven speaking styles, and parametrized into four duration-related and two slope-related values. Using these six parameters, the glottal waveforms from the eleven styles were analyzed both qualitatively and quantitatively. The glottal waveforms of each style have been shown to be significantly and identifiably different from those of all other styles, confirming the importance of the glottal waveform in conveying speech style information and in causing speech waveform variations. The degree of variation in styled glottal waveforms has been shown to be consistent when trained on one speaker and compared with another.

  13. A maximum likelihood approach to estimating articulator positions from speech acoustics

    SciTech Connect

    Hogden, J.

    1996-09-23

    This proposal presents an algorithm called maximum likelihood continuity mapping (MALCOM), which recovers the positions of the tongue, jaw, lips, and other speech articulators from measurements of the sound-pressure waveform of speech. MALCOM differs from other techniques for recovering articulator positions from speech in three critical respects: it does not require training on measured or modeled articulator positions, it does not rely on any particular model of sound propagation through the vocal tract, and it recovers a mapping from acoustics to articulator positions that is linearly, not topographically, related to the actual mapping from acoustics to articulation. The approach categorizes short-time windows of speech into a finite number of sound types and assumes the probability of using any articulator position to produce a given sound type can be described by a parameterized probability density function. MALCOM then uses maximum likelihood estimation techniques to (1) find the most likely smooth articulator path given a speech sample and a set of distribution functions (one per sound type), and (2) change the parameters of the distribution functions to better account for the data. This technique yields more accurate articulator position estimates than continuity mapping, the only other technique that learns the relationship between acoustics and articulation solely from acoustics. It has potential applications to computer speech recognition, speech synthesis and coding, teaching the hearing impaired to speak, improving foreign language instruction, and teaching dyslexics to read. 34 refs., 7 figs.
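
    A heavily simplified sketch of the path-estimation half of MALCOM, under stated assumptions (known sound-type labels per frame, a fixed isotropic Gaussian per sound type over a hypothetical 2-D articulator space; the distribution-updating half is omitted): gradient ascent on data log-likelihood plus a continuity penalty yields a smooth articulator path.

      import numpy as np

      def smooth_path(labels, means, var, lam=5.0, steps=800, lr=0.01):
          # Maximize sum_t log N(p_t; means[labels[t]], var)
          #          - lam * sum_t ||p_t - p_{t-1}||^2 over the path p.
          p = means[labels].copy()                 # start at per-type means
          for _ in range(steps):
              grad = -(p - means[labels]) / var    # pull toward sound-type mean
              grad[1:] -= 2 * lam * (p[1:] - p[:-1])   # continuity with previous
              grad[:-1] -= 2 * lam * (p[:-1] - p[1:])  # continuity with next
              p += lr * grad
          return p

      rng = np.random.default_rng(2)
      means = rng.normal(size=(8, 2))              # 8 sound types, 2-D positions
      path = smooth_path(rng.integers(0, 8, size=40), means, var=0.25)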

  14. Transient noise reduction in speech signal with a modified long-term predictor

    NASA Astrophysics Data System (ADS)

    Choi, Min-Seok; Kang, Hong-Goo

    2011-12-01

    This article proposes an efficient median-filter-based algorithm to remove transient noise in a speech signal. The proposed algorithm adopts a modified long-term predictor (LTP) as the pre-processor of the noise reduction process to reduce the speech distortion caused by the nonlinear nature of the median filter. This article shows that LTP analysis does not modify the characteristics of transient noise during the speech modeling process. By contrast, if a short-term linear prediction (STP) filter is employed as the pre-processor, the enhanced output includes residual noise because the STP analysis and synthesis process preserves and restores transient noise components. To minimize residual noise and speech distortion after transient noise reduction, a modified LTP method is proposed that estimates the characteristics of speech more accurately. By ignoring regions where transient noise is present during the pitch lag detection step, the modified LTP avoids being affected by transient noise. A backward pitch prediction algorithm is also adopted to reduce speech distortion in onset regions. Experimental results verify that the proposed system efficiently eliminates transient noise while preserving the desired speech signal.
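
    A schematic of the pipeline with illustrative parameters (the paper's transient-region exclusion and backward pitch prediction are omitted): a one-tap long-term predictor at the detected pitch lag removes the periodic speech component, the residual (where a click stands out against near-zero prediction error) is median filtered, and speech is resynthesized from the cleaned residual.

      import numpy as np
      from scipy.signal import medfilt

      def pitch_lag(x, lo=40, hi=320):
          # Crude autocorrelation pitch-lag estimate.
          acf = np.correlate(x, x, mode="full")[len(x) - 1 :]
          return lo + int(np.argmax(acf[lo:hi]))

      def ltp_denoise(x, kernel=9):
          lag = pitch_lag(x)
          # One-tap long-term prediction: x[n] ~ g * x[n - lag].
          g = np.dot(x[lag:], x[:-lag]) / (np.dot(x[:-lag], x[:-lag]) + 1e-9)
          resid = x.copy()
          resid[lag:] -= g * x[:-lag]
          # Transients dominate the residual; a median filter removes them.
          resid = medfilt(resid, kernel_size=kernel)
          # LTP synthesis: reinsert the periodic component.
          y = resid.copy()
          for n in range(lag, len(y)):
              y[n] += g * y[n - lag]
          return y

      fs = 8000
      t = np.arange(fs) / fs
      x = np.sin(2 * np.pi * 110 * t)
      x[3000:3003] += 4.0              # synthetic transient click
      y = ltp_denoise(x)               # click suppressed, sinusoid preserved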

  15. Lessons for Speech Pathologists. Using the Initial Teaching Alphabet to Improve Articulation.

    ERIC Educational Resources Information Center

    Goldman, Ronald

    Designed by speech pathologists for use with preschool children, 54 lessons utilize the Initial Teaching Alphabet (ITA). Beginning with the presentation of a single sound and its ITA symbol, lessons progress systematically through all the symbols; synthesis of the elements into syllables, words, sentences, stories, and general conversation is…

  16. Speech recognition by computer. 1964-September 1981 (a bibliography with abstracts)

    SciTech Connect

    Not Available

    1983-02-01

    The cited reports present investigations on the recognition, synthesis, and processing of speech by computer. The research includes the acoustical, phonological, and linguistic processes necessary in the conversion of the various waveforms by computers. (This updated bibliography contains 294 citations, none of which are new entries to the previous edition.)