Sample records for taste voice speech

  1. Voice and Speech after Laryngectomy

    ERIC Educational Resources Information Center

    Stajner-Katusic, Smiljka; Horga, Damir; Musura, Maja; Globlek, Dubravka

    2006-01-01

    The aim of the investigation is to compare voice and speech quality in alaryngeal patients using esophageal speech (ESOP, eight subjects), an electroacoustical speech aid (EACA, six subjects), and a tracheoesophageal voice prosthesis (TEVP, three subjects). The subjects were recorded reading a short story in a sound-proof booth, and the speech samples…

  2. Start/End Delays of Voiced and Unvoiced Speech Signals

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Herrnstein, A

    Recent experiments using low power EM-radar-like sensors (e.g., GEMs) have demonstrated a new method for measuring vocal fold activity and the onset times of voiced speech, as vocal fold contact begins to take place. Similarly, the end time of a voiced speech segment can be measured. Second, it appears that in most normal uses of American English speech, unvoiced-speech segments directly precede or directly follow voiced-speech segments. For many applications, it is useful to know typical duration times of these unvoiced speech segments. A corpus of spoken TIMIT words, phrases, and sentences, assembled earlier and recorded from 16 male speakers using simultaneously measured acoustic and EM-sensor glottal signals, was used for this study. By inspecting the onset (or end) of unvoiced speech using the acoustic signal, and the onset (or end) of voiced speech using the EM-sensor signal, the average duration times for unvoiced segments preceding onset of vocalization were found to be 300 ms, and for following segments, 500 ms. An unvoiced speech period is then defined in time, first by using the onset of the EM-sensed glottal signal as the onset-time marker for the voiced speech segment and the end marker for the unvoiced segment. Then, by subtracting 300 ms from the onset time mark of voicing, the unvoiced speech segment start time is found. Similarly, the times for a following unvoiced speech segment can be found. While data of this nature have proven to be useful for work in our laboratory, a great deal of additional work remains to validate such data for use with general populations of users. These procedures have been useful for applying optimal processing algorithms over time segments of unvoiced, voiced, and non-speech acoustic signals. For example, these data appear to be of use in speaker validation, in vocoding, and in denoising algorithms.
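
    The timing rule above is easy to state in code. Below is a minimal sketch (ours, not the paper's; the single-segment interface and names are illustrative) that derives the adjacent unvoiced spans from one EM-sensed voiced segment using the reported 300 ms and 500 ms averages.

    ```python
    # Sketch: unvoiced-segment boundaries from EM-sensed voicing marks,
    # using the average durations reported above (assumed interface).
    PRE_UNVOICED_S = 0.300    # avg unvoiced duration preceding voicing onset
    POST_UNVOICED_S = 0.500   # avg unvoiced duration following voicing end

    def unvoiced_spans(voiced_onset_s: float, voiced_end_s: float):
        """Return (start, end) spans of the unvoiced segments that precede
        and follow one voiced segment marked by the EM glottal sensor."""
        preceding = (max(0.0, voiced_onset_s - PRE_UNVOICED_S), voiced_onset_s)
        following = (voiced_end_s, voiced_end_s + POST_UNVOICED_S)
        return preceding, following

    print(unvoiced_spans(1.20, 1.85))  # -> ((0.9, 1.2), (1.85, 2.35)), up to float rounding
    ```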

  3. System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech

    DOEpatents

    Burnett, Greg C.; Holzrichter, John F.; Ng, Lawrence C.

    2002-01-01

    Low power EM waves are used to detect motions of vocal tract tissues of the human speech system before, during, and after voiced speech. A voiced excitation function is derived. The excitation function provides speech production information to enhance speech characterization and to enable noise removal from human speech.

  4. Speech enhancement on smartphone voice recording

    NASA Astrophysics Data System (ADS)

    Tris Atmaja, Bagus; Nur Farid, Mifta; Arifianto, Dhany

    2016-11-01

    Speech enhancement is a challenging task in audio signal processing: enhancing the quality of a targeted speech signal while suppressing other noise. Speech enhancement algorithms have grown rapidly, from spectral subtraction, Wiener filtering, and the spectral amplitude MMSE estimator to Non-negative Matrix Factorization (NMF). The smartphone, as a revolutionary device, is now used in all aspects of life, including journalism, both personally and professionally. Although many smartphones have two microphones (main and rear), only the main microphone is widely used for voice recording, which is why the NMF algorithm is widely used for this kind of speech enhancement. This paper evaluates speech enhancement of smartphone voice recordings using the algorithms mentioned previously. We also extend the NMF algorithm to Kullback-Leibler NMF with supervised separation. The last algorithm shows improved results compared with the others, as evaluated by spectrogram inspection and PESQ scores.
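
    The supervised separation step can be sketched compactly. The following is a minimal KL-NMF implementation with multiplicative updates (our illustration, not the paper's code; ranks, iteration counts, and the final masking/iSTFT step are assumptions): train a speech dictionary on clean speech magnitudes and a noise dictionary on noise, then hold both fixed while fitting activations on the mixture.

    ```python
    import numpy as np

    def kl_nmf(V, W=None, k=20, n_iter=200, seed=0):
        """KL-divergence NMF via multiplicative updates; if W is given it stays fixed.
        V: nonnegative magnitude spectrogram, shape (freq_bins, frames)."""
        rng = np.random.default_rng(seed)
        eps = 1e-10
        fixed_W = W is not None
        if not fixed_W:
            W = rng.random((V.shape[0], k)) + eps
        H = rng.random((W.shape[1], V.shape[1])) + eps
        for _ in range(n_iter):
            H *= (W.T @ (V / (W @ H + eps))) / (W.T.sum(axis=1, keepdims=True) + eps)
            if not fixed_W:
                W *= ((V / (W @ H + eps)) @ H.T) / (H.sum(axis=1)[None, :] + eps)
        return W, H

    # Training: V_speech, V_noise are magnitude spectrograms of training material.
    # W_s, _ = kl_nmf(V_speech, k=40, seed=1)
    # W_n, _ = kl_nmf(V_noise, k=20, seed=2)
    # Separation: fix both dictionaries, fit activations on the mixture V_mix.
    # _, H = kl_nmf(V_mix, W=np.hstack([W_s, W_n]))
    # speech_mag = W_s @ H[:W_s.shape[1], :]   # then mask + inverse STFT
    ```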

  5. Emotionally conditioning the target-speech voice enhances recognition of the target speech under "cocktail-party" listening conditions.

    PubMed

    Lu, Lingxi; Bao, Xiaohan; Chen, Jing; Qu, Tianshu; Wu, Xihong; Li, Liang

    2018-05-01

    Under a noisy "cocktail-party" listening condition with multiple people talking, listeners can use various perceptual/cognitive unmasking cues to improve recognition of the target speech against informational speech-on-speech masking. One potential unmasking cue is the emotion expressed in a speech voice, by means of certain acoustical features. However, it was unclear whether emotional conditioning of a target-speech voice that has none of the typical acoustical features of emotion (i.e., an emotionally neutral voice) can be used by listeners to enhance target-speech recognition under speech-on-speech masking conditions. In this study we examined the recognition of target speech against a two-talker speech masker both before and after the emotionally neutral target voice was paired with a loud female screaming sound that has a marked negative emotional valence. The results showed that recognition of the target speech (especially the first keyword in a target sentence) was significantly improved by emotionally conditioning the target speaker's voice. Moreover, the emotional unmasking effect was independent of the unmasking effect of the perceived spatial separation between the target speech and the masker. Also, electrodermal (skin-conductance) responses became stronger after emotional learning when the target speech and masker were perceptually co-located, suggesting increased listening effort when the target speech was informationally masked. These results indicate that emotionally conditioning the target speaker's voice does not change the acoustical parameters of the target-speech stimuli, but the emotionally conditioned vocal features can be used as cues for unmasking target speech.

  6. Acoustic Analysis of the Voiced-Voiceless Distinction in Dutch Tracheoesophageal Speech

    ERIC Educational Resources Information Center

    Jongmans, Petra; Wempe, Ton G.; van Tinteren, Harm; Hilgers, Frans J. M.; Pols, Louis C. W.; van As-Brooks, Corina J.

    2010-01-01

    Purpose: Confusions between voiced and voiceless plosives and voiced and voiceless fricatives are common in Dutch tracheoesophageal (TE) speech. This study investigates (a) which acoustic measures are found to convey a correct voicing contrast in TE speech and (b) whether different measures are found in TE speech than in normal laryngeal (NL)…

  7. System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech

    DOEpatents

    Burnett, Greg C [Livermore, CA; Holzrichter, John F [Berkeley, CA; Ng, Lawrence C [Danville, CA

    2006-08-08

    The present invention is a system and method for characterizing human (or animate) speech voiced excitation functions and acoustic signals, for removing unwanted acoustic noise which often occurs when a speaker uses a microphone in common environments, and for synthesizing personalized or modified human (or other animate) speech upon command from a controller. A low power EM sensor is used to detect the motions of windpipe tissues in the glottal region of the human speech system before, during, and after voiced speech is produced by a user. From these tissue motion measurements, a voiced excitation function can be derived. Further, the excitation function provides speech production information to enhance noise removal from human speech and it enables accurate transfer functions of speech to be obtained. Previously stored excitation and transfer functions can be used for synthesizing personalized or modified human speech. Configurations of EM sensor and acoustic microphone systems are described to enhance noise cancellation and to enable multiple articulator measurements.

  8. System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech

    DOEpatents

    Burnett, Greg C.; Holzrichter, John F.; Ng, Lawrence C.

    2004-03-23

    The present invention is a system and method for characterizing human (or animate) speech voiced excitation functions and acoustic signals, for removing unwanted acoustic noise which often occurs when a speaker uses a microphone in common environments, and for synthesizing personalized or modified human (or other animate) speech upon command from a controller. A low power EM sensor is used to detect the motions of windpipe tissues in the glottal region of the human speech system before, during, and after voiced speech is produced by a user. From these tissue motion measurements, a voiced excitation function can be derived. Further, the excitation function provides speech production information to enhance noise removal from human speech and it enables accurate transfer functions of speech to be obtained. Previously stored excitation and transfer functions can be used for synthesizing personalized or modified human speech. Configurations of EM sensor and acoustic microphone systems are described to enhance noise cancellation and to enable multiple articulator measurements.

  9. System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech

    DOEpatents

    Burnett, Greg C.; Holzrichter, John F.; Ng, Lawrence C.

    2006-02-14

    The present invention is a system and method for characterizing human (or animate) speech voiced excitation functions and acoustic signals, for removing unwanted acoustic noise which often occurs when a speaker uses a microphone in common environments, and for synthesizing personalized or modified human (or other animate) speech upon command from a controller. A low power EM sensor is used to detect the motions of windpipe tissues in the glottal region of the human speech system before, during, and after voiced speech is produced by a user. From these tissue motion measurements, a voiced excitation function can be derived. Further, the excitation function provides speech production information to enhance noise removal from human speech and it enables accurate transfer functions of speech to be obtained. Previously stored excitation and transfer functions can be used for synthesizing personalized or modified human speech. Configurations of EM sensor and acoustic microphone systems are described to enhance noise cancellation and to enable multiple articulator measurements.

  10. Intra-oral pressure-based voicing control of electrolaryngeal speech with intra-oral vibrator.

    PubMed

    Takahashi, Hirokazu; Nakao, Masayuki; Kikuchi, Yataro; Kaga, Kimitaka

    2008-07-01

    In normal speech, coordinated activities of the intrinsic laryngeal muscles suspend the glottal sound during utterance of voiceless consonants, automatically realizing voicing control. In electrolaryngeal speech, however, the lack of voicing control is one cause of unclear voice, with voiceless consonants tending to be misheard as the corresponding voiced consonants. In the present work, we developed an intra-oral vibrator with an intra-oral pressure sensor that detected utterance of voiceless phonemes during intra-oral electrolaryngeal speech, and demonstrated that intra-oral pressure-based voicing control could improve the intelligibility of the speech. The test voices were obtained from one electrolaryngeal speaker and one normal speaker. We first investigated, using speech analysis software, how voice onset time (VOT) and the first-formant (F1) transition of the test consonant-vowel syllables contributed to voiceless/voiced contrasts, and developed an adequate voicing control strategy. We then compared the intelligibility of consonant-vowel syllables in intra-oral electrolaryngeal speech with and without online voicing control. The increase of intra-oral pressure, typically with a peak ranging from 10 to 50 gf/cm², could reliably identify utterance of voiceless consonants. The speech analysis and intelligibility test then demonstrated that a short VOT caused misidentification of the voiced consonants due to a clear F1 transition. Finally, taking these results together, the online voicing control, which suspended the prosthetic tone while the intra-oral pressure exceeded 2.5 gf/cm² and during the 35 milliseconds that followed, proved effective in improving the voiceless/voiced contrast.
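
    The reported rule reduces to a threshold plus a hold timer. A minimal sketch, assuming a sampled pressure signal and millisecond timestamps (the interface is ours, not the paper's):

    ```python
    class VoicingController:
        """Suspend the electrolarynx tone while intra-oral pressure exceeds the
        threshold, and for a hold period afterwards, per the rule reported above."""
        THRESHOLD_GF_CM2 = 2.5   # pressure threshold from the abstract
        HOLD_MS = 35.0           # post-burst suspension from the abstract

        def __init__(self):
            self._hold_until_ms = float("-inf")

        def tone_on(self, pressure_gf_cm2: float, t_ms: float) -> bool:
            if pressure_gf_cm2 > self.THRESHOLD_GF_CM2:
                self._hold_until_ms = t_ms + self.HOLD_MS
            return t_ms >= self._hold_until_ms

    ctrl = VoicingController()
    # Sampled every millisecond: tone stays off during a burst and for 35 ms after.
    print([ctrl.tone_on(p, t) for t, p in enumerate([0.1, 3.0, 0.2, 0.2])])
    # -> [True, False, False, False]
    ```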

  11. Objective voice and speech analysis of persons with chronic hoarseness by prosodic analysis of speech samples.

    PubMed

    Haderlein, Tino; Döllinger, Michael; Matoušek, Václav; Nöth, Elmar

    2016-10-01

    Automatic voice assessment is often performed using sustained vowels. In contrast, speech analysis of read-out texts can be applied to voice and speech assessment. Automatic speech recognition and prosodic analysis were used to find regression formulae between automatic and perceptual assessment of four voice and four speech criteria. The regression was trained with 21 men and 62 women (average age 49.2 years) and tested with another set of 24 men and 49 women (48.3 years), all suffering from chronic hoarseness. They read the text 'Der Nordwind und die Sonne' ('The North Wind and the Sun'). Five voice and speech therapists evaluated the data on 5-point Likert scales. Ten prosodic and recognition accuracy measures (features) were identified which describe all the examined criteria. Inter-rater correlation within the expert group was between r = 0.63 for the criterion 'match of breath and sense units' and r = 0.87 for the overall voice quality. Human-machine correlation was between r = 0.40 for the match of breath and sense units and r = 0.82 for intelligibility. The perceptual ratings of different criteria were highly correlated with each other. Likewise, the feature sets modeling the criteria were very similar. The automatic method is suitable for assessing chronic hoarseness in general and for subgroups of functional and organic dysphonia. In its current version, it is almost as reliable as a randomly picked rater from a group of voice and speech therapists.
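
    At its core, the human-machine comparison is a regression from automatic features to mean listener ratings, scored by Pearson correlation. A self-contained sketch with placeholder data (the study used 10 features from 83 training and 73 test speakers; the values below are synthetic):

    ```python
    import numpy as np
    from scipy.stats import pearsonr
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    X_train = rng.random((83, 10))      # 10 automatic features, 83 training speakers
    y_train = rng.uniform(1, 5, 83)     # mean perceptual rating (5-point Likert scale)
    X_test, y_test = rng.random((73, 10)), rng.uniform(1, 5, 73)

    model = LinearRegression().fit(X_train, y_train)
    r, p = pearsonr(model.predict(X_test), y_test)
    print(f"human-machine correlation: r = {r:.2f} (p = {p:.3f})")
    ```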

  12. Intentional Voice Command Detection for Trigger-Free Speech Interface

    NASA Astrophysics Data System (ADS)

    Obuchi, Yasunari; Sumiyoshi, Takashi

    In this paper we introduce a new framework of audio processing, which is essential to achieve a trigger-free speech interface for home appliances. If the speech interface works continually in real environments, it must extract occasional voice commands and reject everything else. It is extremely important to reduce the number of false alarms because the number of irrelevant inputs is much larger than the number of voice commands even for heavy users of appliances. The framework, called Intentional Voice Command Detection, is based on voice activity detection, but enhanced by various speech/audio processing techniques such as emotion recognition. The effectiveness of the proposed framework is evaluated using a newly-collected large-scale corpus. The advantages of combining various features were tested and confirmed, and the simple LDA-based classifier demonstrated acceptable performance. The effectiveness of various methods of user adaptation is also discussed.
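
    As a sketch of the classification stage only (feature extraction omitted; the feature count and data are assumptions), a linear discriminant separates intentional commands from everything else:

    ```python
    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    rng = np.random.default_rng(0)
    X = rng.random((500, 6))     # per-segment features: VAD score, prosody, emotion, ...
    y = rng.integers(0, 2, 500)  # 1 = intentional voice command, 0 = everything else

    lda = LinearDiscriminantAnalysis().fit(X, y)
    accept = lda.predict(rng.random((1, 6)))[0] == 1  # decision for a new audio segment
    ```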

  13. Double Fourier analysis for Emotion Identification in Voiced Speech

    NASA Astrophysics Data System (ADS)

    Sierra-Sosa, D.; Bastidas, M.; Ortiz P., D.; Quintero, O. L.

    2016-04-01

    We propose a novel analysis alternative, based on two Fourier transforms, for emotion recognition from speech. Fourier analysis allows different signals to be displayed and synthesized in terms of power spectral density distributions. A spectrogram of the voice signal is obtained by performing a short-time Fourier transform with Gaussian windows; this spectrogram portrays frequency-related features, such as vocal tract resonances and quasi-periodic excitations during voiced sounds. Emotions induce such characteristics in speech, which become apparent in the spectrogram's time-frequency distribution. The time-frequency representation from the spectrogram is then treated as an image and processed through a two-dimensional Fourier transform in order to perform spatial Fourier analysis on it. Finally, features related to emotions in voiced speech are extracted and presented.
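
    The two-transform pipeline can be sketched as follows (window length, overlap, and the synthetic input are assumptions, not the paper's settings):

    ```python
    import numpy as np
    from scipy.signal import stft

    fs = 16000
    x = np.random.default_rng(0).standard_normal(fs)  # stand-in for one second of speech

    # 1) STFT with a Gaussian window -> spectrogram (time-frequency image)
    f, t, Z = stft(x, fs=fs, window=("gaussian", 64), nperseg=512, noverlap=384)
    S = np.abs(Z)

    # 2) 2-D Fourier transform of the spectrogram, treated as an image
    F2 = np.fft.fftshift(np.fft.fft2(S))
    features = np.abs(F2)       # emotion-related descriptors are drawn from here
    print(S.shape, features.shape)
    ```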

  14. Telephony-based voice pathology assessment using automated speech analysis.

    PubMed

    Moran, Rosalyn J; Reilly, Richard B; de Chazal, Philip; Lacy, Peter D

    2006-03-01

    A system for remotely detecting vocal fold pathologies using telephone-quality speech is presented. The system uses a linear classifier, processing measurements of pitch perturbation, amplitude perturbation, and harmonic-to-noise ratio derived from digitized speech recordings. Voice recordings from the Disordered Voice Database Model 4337 system were used to develop and validate the system. Results show that while a sustained phonation, recorded in a controlled environment, can be classified as normal or pathologic with an accuracy of 89.1%, telephone-quality speech can be classified as normal or pathologic with an accuracy of 74.2% using the same scheme. Amplitude perturbation features prove most robust for telephone-quality speech. The pathologic recordings were then subcategorized into four groups, comprising normal, neuromuscular pathologic, physical pathologic, and mixed (neuromuscular with physical) pathologic. A separate classifier was developed for classifying the normal group from each pathologic subcategory. Results show that neuromuscular disorders could be detected remotely with an accuracy of 87%, physical abnormalities with an accuracy of 78%, and mixed-pathology voice with an accuracy of 61%. This study highlights the real possibility of remote detection and diagnosis of voice pathology.
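
    The normal-versus-pathologic stage reduces to a linear decision over three perturbation features. A sketch with synthetic placeholder data (linear discriminant analysis stands in for the unspecified linear classifier):

    ```python
    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    rng = np.random.default_rng(0)
    jitter = rng.uniform(0.1, 3.0, 200)    # pitch perturbation (%)
    shimmer = rng.uniform(1.0, 10.0, 200)  # amplitude perturbation (%)
    hnr = rng.normal(18, 6, 200)           # harmonic-to-noise ratio (dB)
    X = np.column_stack([jitter, shimmer, hnr])
    y = rng.integers(0, 2, 200)            # 1 = pathologic, 0 = normal

    clf = LinearDiscriminantAnalysis().fit(X, y)
    print(clf.predict([[0.8, 4.2, 12.0]])) # screen one telephone-quality recording
    ```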

  15. Speech perception in individuals with auditory dys-synchrony: effect of lengthening of voice onset time and burst duration of speech segments.

    PubMed

    Kumar, U A; Jayaram, M

    2013-07-01

    The purpose of this study was to evaluate the effect of lengthening the voice onset time and burst duration of selected speech stimuli on perception by individuals with auditory dys-synchrony. This is the second of a series of articles reporting the effect of signal-enhancing strategies on speech perception by such individuals. Two experiments were conducted: (1) assessment of the just-noticeable difference for voice onset time and burst duration of speech sounds; and (2) assessment of speech identification scores when speech sounds were modified by lengthening the voice onset time and the burst duration in units of one just-noticeable difference, both in isolation and in combination with each other plus transition-duration modification. Lengthening of voice onset time as well as burst duration improved perception of voicing. However, the effect of voice onset time modification was greater than that of burst duration modification. Although combined lengthening of voice onset time, burst duration, and transition duration resulted in improved speech perception, the improvement was less than that due to lengthening of transition duration alone. These results suggest that innovative speech processing strategies that enhance temporal cues may benefit individuals with auditory dys-synchrony.

  16. Assessment of voice and speech symptoms in early Parkinson's disease by the Robertson dysarthria profile.

    PubMed

    Defazio, Giovanni; Guerrieri, Marta; Liuzzi, Daniele; Gigante, Angelo Fabio; di Nicola, Vincenzo

    2016-03-01

    Changes in voice and speech are thought to involve 75-90% of people with PD, but the impact of PD progression on voice/speech parameters is not well defined. In this study, we assessed voice/speech symptoms in 48 parkinsonian patients staging <3 on the modified Hoehn and Yahr scale and 37 healthy subjects, using the Robertson dysarthria profile (a clinical-perceptual method exploring all components potentially involved in speech difficulties), the Voice Handicap Index (a validated measure of the impact of voice symptoms on quality of life), and the speech evaluation item contained in the Unified Parkinson's Disease Rating Scale part III (UPDRS-III). Accuracy and metric properties of the Robertson dysarthria profile were also measured. On the Robertson dysarthria profile, all parkinsonian patients yielded lower scores than healthy control subjects. By contrast, the Voice Handicap Index and the speech evaluation item of the UPDRS-III detected speech/voice disturbances in only 10% and 75% of PD patients, respectively. The validation procedure in Parkinson's disease patients showed that the Robertson dysarthria profile has acceptable reliability, satisfactory internal consistency and scaling assumptions, lack of floor and ceiling effects, and partial correlations with the UPDRS-III and Voice Handicap Index. We concluded that speech/voice disturbances are widely identified by the Robertson dysarthria profile in early parkinsonian patients, even when the disturbances do not carry a significant level of disability. The Robertson dysarthria profile may be a valuable tool to detect speech/voice disturbances in Parkinson's disease.

  17. Exploring expressivity and emotion with artificial voice and speech technologies.

    PubMed

    Pauletto, Sandra; Balentine, Bruce; Pidcock, Chris; Jones, Kevin; Bottaci, Leonardo; Aretoulaki, Maria; Wells, Jez; Mundy, Darren P; Balentine, James

    2013-10-01

    Emotion in audio-voice signals, as synthesized by text-to-speech (TTS) technologies, was investigated to formulate a theory of expression for user interface design. Emotional parameters were specified with markup tags, and the resulting audio was further modulated with post-processing techniques. Software was then developed to link a selected TTS synthesizer with an automatic speech recognition (ASR) engine, producing a chatbot that could speak and listen. Using these two artificial voice subsystems, investigators explored both artistic and psychological implications of artificial speech emotion. Goals of the investigation were interdisciplinary, with interest in musical composition, augmentative and alternative communication (AAC), commercial voice announcement applications, human-computer interaction (HCI), and artificial intelligence (AI). The work-in-progress points towards an emerging interdisciplinary ontology for artificial voices. As one study output, HCI tools are proposed for future collaboration.

  18. Comparing Measures of Voice Quality From Sustained Phonation and Continuous Speech.

    PubMed

    Gerratt, Bruce R; Kreiman, Jody; Garellek, Marc

    2016-10-01

    The question of what type of utterance (a sustained vowel or continuous speech) is best for voice quality analysis has been extensively studied, but with equivocal results. This study examines whether previously reported differences derive from the articulatory and prosodic factors occurring in continuous speech versus sustained phonation. Speakers with voice disorders sustained vowels and read sentences. Vowel samples were excerpted from the steadiest portion of each vowel in the sentences. In addition to sustained and excerpted vowels, a third set of stimuli was created by shortening sustained vowel productions to match the duration of vowels excerpted from continuous speech. Acoustic measures were made on the stimuli, and listeners judged the severity of vocal quality deviation. Sustained vowels and those extracted from continuous speech contain essentially the same acoustic and perceptual information about vocal quality deviation. Perceived and/or measured differences between continuous speech and sustained vowels derive largely from voice source variability across segmental and prosodic contexts, and not from variations in vocal fold vibration in the quasi-steady portion of the vowels. Approaches to voice quality assessment using continuous speech samples average across utterances and may not adequately quantify the variability they are intended to assess.

  19. Perception of the Voicing Distinction in Speech Produced during Simultaneous Communication

    ERIC Educational Resources Information Center

    MacKenzie, Douglas J.; Schiavetti, Nicholas; Whitehead, Robert L.; Metz, Dale Evan

    2006-01-01

    This study investigated the perception of voice onset time (VOT) in speech produced during simultaneous communication (SC). Four normally hearing, experienced sign language users were recorded under SC and speech alone (SA) conditions speaking stimulus words with voiced and voiceless initial consonants embedded in a sentence. Twelve…

  20. A voice-input voice-output communication aid for people with severe speech impairment.

    PubMed

    Hawley, Mark S; Cunningham, Stuart P; Green, Phil D; Enderby, Pam; Palmer, Rebecca; Sehgal, Siddharth; O'Neill, Peter

    2013-01-01

    A new form of augmentative and alternative communication (AAC) device for people with severe speech impairment, the voice-input voice-output communication aid (VIVOCA), is described. The VIVOCA recognizes the disordered speech of the user and builds messages, which are converted into synthetic speech. System development was carried out employing user-centered design and development methods, which identified and refined key requirements for the device. A novel methodology for building small-vocabulary, speaker-dependent automatic speech recognizers with reduced amounts of training data was applied. Experiments showed that this method is successful in generating good recognition performance (mean accuracy 96%) on highly disordered speech, even when recognition perplexity is increased. The selected message-building technique traded off various factors, including speed of message construction and range of available message outputs. The VIVOCA was evaluated in a field trial by individuals with moderate to severe dysarthria, which confirmed that they can make use of the device to produce intelligible speech output from disordered speech input. The trial highlighted some issues that limit the performance and usability of the device in real usage situations, with mean recognition accuracy of 67% in these circumstances. These limitations will be addressed in future work.

  21. Speech masking and cancelling and voice obscuration

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Holzrichter, John F.

    A non-acoustic sensor is used to measure a user's speech, and an obscuring acoustic signal is then broadcast, diminishing the intensity of the user's vocal acoustic output and/or distorting the voice sounds so that they are unintelligible to persons nearby. The non-acoustic sensor is positioned proximate to, or contacting, the user's neck or head skin tissue to sense speech production information.

  22. Speech therapy and voice recognition instrument

    NASA Technical Reports Server (NTRS)

    Cohen, J.; Babcock, M. L.

    1972-01-01

    Characteristics of an electronic circuit for examining variations in vocal excitation for diagnostic purposes, and in speech recognition for determining voice patterns and pitch changes, are described. Operation of the circuit is discussed, and a circuit diagram is provided.

  23. System And Method For Characterizing Voiced Excitations Of Speech And Acoustic Signals, Removing Acoustic Noise From Speech, And Synthesizing Speech

    DOEpatents

    Burnett, Greg C.; Holzrichter, John F.; Ng, Lawrence C.

    2006-04-25

    The present invention is a system and method for characterizing human (or animate) speech voiced excitation functions and acoustic signals, for removing unwanted acoustic noise which often occurs when a speaker uses a microphone in common environments, and for synthesizing personalized or modified human (or other animate) speech upon command from a controller. A low power EM sensor is used to detect the motions of windpipe tissues in the glottal region of the human speech system before, during, and after voiced speech is produced by a user. From these tissue motion measurements, a voiced excitation function can be derived. Further, the excitation function provides speech production information to enhance noise removal from human speech and it enables accurate transfer functions of speech to be obtained. Previously stored excitation and transfer functions can be used for synthesizing personalized or modified human speech. Configurations of EM sensor and acoustic microphone systems are described to enhance noise cancellation and to enable multiple articulator measurements.

  24. Speech and Voice Response to a Levodopa Challenge in Late-Stage Parkinson's Disease.

    PubMed

    Fabbri, Margherita; Guimarães, Isabel; Cardoso, Rita; Coelho, Miguel; Guedes, Leonor Correia; Rosa, Mario M; Godinho, Catarina; Abreu, Daisy; Gonçalves, Nilza; Antonini, Angelo; Ferreira, Joaquim J

    2017-01-01

    Parkinson's disease (PD) patients are affected by hypokinetic dysarthria, characterized by hypophonia and dysprosody, which worsens with disease progression. Levodopa's (l-dopa) effect on quality of speech is inconclusive, and no data are currently available for late-stage PD (LSPD). The aim was to assess the modifications of speech and voice in LSPD following an acute l-dopa challenge. LSPD patients [Schwab and England score <50/Hoehn and Yahr stage >3 (MED ON)] performed several vocal tasks before and after an acute l-dopa challenge. The following were assessed: respiratory support for speech, voice quality, stability and variability, speech rate, and motor performance (MDS-UPDRS-III). All voice samples were recorded and analyzed by a speech and language therapist blinded to the patients' therapeutic condition using Praat 5.1 software. 24/27 (14 men) LSPD patients succeeded in performing the voice tasks. Median age and disease duration of patients were 79 [IQR: 71.5-81.7] and 14.5 [IQR: 11-15.7] years, respectively. In MED OFF, respiratory breath support and pitch break time of LSPD patients were worse than non-parkinsonian normative values. A correlation was found between disease duration and voice quality (R = 0.51; p = 0.013) and speech rate (R = -0.55; p = 0.008). l-Dopa significantly improved the MDS-UPDRS-III score (20%), with no effect on speech as assessed by clinical rating scales and automated analysis. Speech is severely affected in LSPD. Although l-dopa had some effect on motor performance, including axial signs, speech and voice did not improve. The applicability and efficacy of non-pharmacological treatment for speech impairment should be considered for speech disorder management in PD.

  25. Fluid-acoustic interactions and their impact on pathological voiced speech

    NASA Astrophysics Data System (ADS)

    Erath, Byron D.; Zanartu, Matias; Peterson, Sean D.; Plesniak, Michael W.

    2011-11-01

    Voiced speech is produced by vibration of the vocal fold structures. Vocal fold dynamics arise from aerodynamic pressure loadings, tissue properties, and acoustic modulation of the driving pressures. Recent speech science advancements have produced a physiologically realistic fluid flow solver (BLEAP) capable of prescribing asymmetric intraglottal flow attachment that can be easily assimilated into reduced-order models of speech. The BLEAP flow solver is extended to incorporate acoustic loading and sound propagation in the vocal tract by implementing a wave-reflection-analog approach for sound propagation based on the governing BLEAP equations. This enhanced physiological description of the physics of voiced speech is implemented in a two-mass model of speech. The impact of fluid-acoustic interactions on vocal fold dynamics is elucidated for both normal and pathological speech through linear and nonlinear analysis techniques. Supported by NSF Grant CBET-1036280.

  26. Influence of voice focus on tongue movement in speech.

    PubMed

    Bressmann, Tim; de Boer, Gillian; Marino, Viviane Cristina de Castro; Fabron, Eliana Maria Gradim; Berti, Larissa Cristina

    2017-01-01

    The present study evaluated global aspects of lingual movement during sentence production with backward and forward voice focus. Nine female participants read a sentence containing a variety of consonants in a normal condition and with backward and forward voice focus. Midsagittal tongue movement was recorded with ultrasound, and tongue height over time was measured at an anterior, a central, and a posterior measurement angle. The outcome measures were speech rate, cumulative distance travelled, and average movement speed of the tongue. There were no differences in speech rate between the conditions. The cumulative distance travelled by the tongue and the average speed indicated that the posterior tongue travelled a smaller cumulative distance, and at a slower speed, in the forward focus condition. The central tongue moved a larger cumulative distance, and at a higher speed, in the backward focus condition. The study offers first insights into how tongue movement is affected by different voice focus settings and illustrates the plasticity of tongue movement in speech.

  27. Voice Quality Modelling for Expressive Speech Synthesis

    PubMed Central

    Socoró, Joan Claudi

    2014-01-01

    This paper presents the perceptual experiments that were carried out in order to validate the methodology of transforming expressive speech styles using voice quality (VoQ) parameter modelling, along with the well-known prosodic parameters (F0, duration, and energy), from a neutral style into a number of expressive ones. The main goal was to validate the usefulness of VoQ in the enhancement of expressive synthetic speech in terms of speech quality and style identification. A harmonic plus noise model (HNM) was used to modify VoQ and prosodic parameters that were extracted from an expressive speech corpus. Perception test results indicated the improvement of the obtained expressive speech styles using VoQ modelling along with prosodic characteristics. PMID:24587738

  28. Influence of musical training on understanding voiced and whispered speech in noise.

    PubMed

    Ruggles, Dorea R; Freyman, Richard L; Oxenham, Andrew J

    2014-01-01

    This study tested the hypothesis that the previously reported advantage of musicians over non-musicians in understanding speech in noise arises from more efficient or robust coding of periodic voiced speech, particularly in fluctuating backgrounds. Speech intelligibility was measured in listeners with extensive musical training, and in those with very little musical training or experience, using normal (voiced) or whispered (unvoiced) grammatically correct nonsense sentences in noise that was spectrally shaped to match the long-term spectrum of the speech, and was either continuous or gated with a 16-Hz square wave. Performance was also measured in clinical speech-in-noise tests and in pitch discrimination. Musicians exhibited enhanced pitch discrimination, as expected. However, no systematic or statistically significant advantage for musicians over non-musicians was found in understanding either voiced or whispered sentences in either continuous or gated noise. Musicians also showed no statistically significant advantage in the clinical speech-in-noise tests. Overall, the results provide no evidence for a significant difference between young adult musicians and non-musicians in their ability to understand speech in noise.

  29. Audiovisual speech facilitates voice learning.

    PubMed

    Sheffert, Sonya M; Olson, Elizabeth

    2004-02-01

    In this research, we investigated the effects of voice and face information on the perceptual learning of talkers and on long-term memory for spoken words. In the first phase, listeners were trained over several days to identify voices from words presented auditorily or audiovisually. The training data showed that visual information about speakers enhanced voice learning, revealing cross-modal connections in talker processing akin to those observed in speech processing. In the second phase, the listeners completed an auditory or audiovisual word recognition memory test in which equal numbers of words were spoken by familiar and unfamiliar talkers. The data showed that words presented by familiar talkers were more likely to be retrieved from episodic memory, regardless of modality. Together, these findings provide new information about the representational code underlying familiar talker recognition and the role of stimulus familiarity in episodic word recognition.

  30. Relationship between quality of life instruments and phonatory function in tracheoesophageal speech with voice prosthesis.

    PubMed

    Miyoshi, Masayuki; Fukuhara, Takahiro; Kataoka, Hideyuki; Hagino, Hiroshi

    2016-04-01

    The use of tracheoesophageal speech with voice prosthesis (T-E speech) after total laryngectomy has increased recently as a method of vocalization following laryngeal cancer. Previous research has not investigated the relationship between quality of life (QOL) and phonatory function in those using T-E speech. This study aimed to demonstrate the relationship between phonatory function and both comprehensive health-related QOL and QOL related to speech in people using T-E speech. The subjects of the study were 20 male patients using T-E speech after total laryngectomy. At a visit to our clinic, the subjects underwent a phonatory function test and completed three questionnaires: the MOS 8-Item Short-Form Health Survey (SF-8), the Voice Handicap Index-10 (VHI-10), and the Voice-Related Quality of Life (V-RQOL) Measure. A significant correlation was observed between the physical component summary (PCS), a summary score of SF-8, and VHI-10. Additionally, a significant correlation was observed between the SF-8 mental component summary (MCS) and both VHI-10 and VRQOL. Significant correlations were also observed between voice intensity in the phonatory function test and both VHI-10 and V-RQOL. Finally, voice intensity was significantly correlated with the SF-8 PCS. QOL questionnaires and phonatory function tests showed that, in people using T-E speech after total laryngectomy, voice intensity was correlated with comprehensive QOL, including physical and mental health. This finding suggests that voice intensity can be used as a performance index for speech rehabilitation.

  31. Patient-reported symptom questionnaires in laryngeal cancer: voice, speech and swallowing.

    PubMed

    Rinkel, R N P M; Verdonck-de Leeuw, I M; van den Brakel, N; de Bree, R; Eerenstein, S E J; Aaronson, N; Leemans, C R

    2014-08-01

    The aims were to validate questionnaires on voice, speech, and swallowing among laryngeal cancer patients, to assess the need for and use of rehabilitation services, and to determine the association between voice, speech, and swallowing problems and quality of life and distress. Laryngeal cancer patients at least three months post-treatment completed the VHI (voice), SHI (speech), SWAL-QOL (swallowing), EORTC QLQ-C30, QLQ-HN35, HADS, and study-specific questions on rehabilitation. Eighty-eight patients and 110 healthy controls participated. Cut-off scores of 15, 6, and 14 were defined for the VHI, SHI, and SWAL-QOL (sensitivity > 90%; specificity > 80%). Based on these scores, 56% of the patients reported voice, 63% speech, and 54% swallowing problems. VHI, SHI, and SWAL-QOL scores were associated significantly with quality of life (EORTC QLQ-C30 global quality of life scale) (r = .43 (VHI and SHI) and r = .46 (SWAL-QOL)) and distress (r = .50 (VHI and SHI) and r = .58 (SWAL-QOL)). In retrospect, 32% of the patients indicated the need for rehabilitation at time of treatment, and 81% of these patients availed themselves of such services. Post-treatment, 8% of the patients expressed a need for rehabilitation, and 20% of these patients actually made use of such services. Psychometric characteristics of the VHI, SHI, and SWAL-QOL in laryngeal cancer patients are good. The prevalence of voice, speech, and swallowing problems is high, and clearly related to quality of life and distress. Although higher during than after treatment, the perceived need for and use of rehabilitation services is limited.

  32. Comparison of Voice Handicap Index Scores Between Female Students of Speech Therapy and Other Health Professions.

    PubMed

    Tafiadis, Dionysios; Chronopoulos, Spyridon K; Siafaka, Vassiliki; Drosos, Konstantinos; Kosma, Evangelia I; Toki, Eugenia I; Ziavra, Nausica

    2017-09-01

    Student groups (e.g., teachers, speech-language pathologists) are presumably at risk of developing a voice disorder due to misuse of their voice, which will affect their way of living. Multidisciplinary voice assessment of student populations is now widespread, along with the use of self-report questionnaires. This study compared Voice Handicap Index domain and item scores between female students of speech and language therapy and female students of other health professions in Greece. We also examined the probability of speech-language therapy students developing any vocal symptom. Two hundred female non-dysphonic students (aged 18-31) were recruited. Participants answered the Voice Evaluation Form and the Greek adaptation of the Voice Handicap Index. Significant differences were observed between the two groups on the Voice Handicap Index (total score, functional and physical domains), excluding the emotional domain. Furthermore, significant differences between subgroups were observed for specific Voice Handicap Index items. In conclusion, speech-language therapy students had higher Voice Handicap Index scores, which could serve as an early indicator useful for avoiding profession-related dysphonia at a later stage. The Voice Handicap Index could also serve, at first glance, as an assessment tool for recognizing potential voice disorder development in students. In turn, the results could be used for indirect therapy approaches, such as providing methods for maintaining vocal health in different student populations.

  33. Stop consonant voicing in young children's speech: Evidence from a cross-sectional study

    NASA Astrophysics Data System (ADS)

    Ganser, Emily

    There are intuitive reasons to believe that speech-sound acquisition and language acquisition should be related in development. Surprisingly, only recently has research begun to parse just how the two might be related. This study investigated possible correlations between speech-sound acquisition and language acquisition, as part of a large-scale, longitudinal study of the relationship between different types of phonological development and vocabulary growth in the preschool years. Productions of voiced and voiceless stop-initial words were recorded from 96 children aged 28-39 months. Voice onset time (VOT, in ms) was calculated for each token context. A mixed-model logistic regression was calculated which predicted whether the sound was intended to be voiced or voiceless based on its VOT. This model estimated the slope of the logistic function for each child, referred to as Robustness of Contrast (based on Holliday, Reidy, Beckman, and Edwards, 2015) and defined as the degree of categorical differentiation between the production of two speech sounds or classes of sounds, in this case voiced and voiceless stops. Results showed a wide range of slopes for individual children, suggesting that slope-derived Robustness of Contrast could be a viable means of measuring a child's acquisition of the voicing contrast. Robustness of Contrast was then compared to traditional measures of speech and language skills to investigate whether there was any correlation between the production of stop voicing and broader measures of speech and language development. The Robustness of Contrast measure was found to correlate with all individual measures of speech and language, suggesting that it might indeed be predictive of later language skills.
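
    Reduced to a single child (the study fit a mixed-effects model across all 96 children), the slope measure can be sketched as follows; the VOT distributions are synthetic placeholders:

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(1)
    vot_voiced = rng.normal(15, 8, 40)      # short-lag VOTs (ms), intended voiced stops
    vot_voiceless = rng.normal(70, 20, 40)  # long-lag VOTs, intended voiceless stops

    X = np.concatenate([vot_voiced, vot_voiceless]).reshape(-1, 1)
    y = np.concatenate([np.zeros(40), np.ones(40)])  # 0 = voiced target, 1 = voiceless

    clf = LogisticRegression().fit(X, y)
    robustness_of_contrast = clf.coef_[0][0]  # steeper slope = sharper voicing contrast
    print(f"slope (per ms of VOT): {robustness_of_contrast:.3f}")
    ```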

  34. The Human Voice in Speech and Singing

    NASA Astrophysics Data System (ADS)

    Lindblom, Björn; Sundberg, Johan

    This chapter describes various aspects of the human voice as a means of communication in speech and singing. From the point of view of function, vocal sounds can be regarded as the end result of a three stage process: (1) the compression of air in the respiratory system, which produces an exhalatory airstream, (2) the vibrating vocal folds' transformation of this air stream to an intermittent or pulsating air stream, which is a complex tone, referred to as the voice source, and (3) the filtering of this complex tone in the vocal tract resonator. The main function of the respiratory system is to generate an overpressure of air under the glottis, or a subglottal pressure. Section 16.1 describes different aspects of the respiratory system of significance to speech and singing, including lung volume ranges, subglottal pressures, and how this pressure is affected by the ever-varying recoil forces. The complex tone generated when the air stream from the lungs passes the vibrating vocal folds can be varied in at least three dimensions: fundamental frequency, amplitude and spectrum. Section 16.2 describes how these properties of the voice source are affected by the subglottal pressure, the length and stiffness of the vocal folds and how firmly the vocal folds are adducted. Section 16.3 gives an account of the vocal tract filter, how its form determines the frequencies of its resonances, and Sect. 16.4 gives an account for how these resonance frequencies or formants shape the vocal sounds by imposing spectrum peaks separated by spectrum valleys, and how the frequencies of these peaks determine vowel and voice qualities. The remaining sections of the chapter describe various aspects of the acoustic signals used for vocal communication in speech and singing. The syllable structure is discussed in Sect. 16.5, the closely related aspects of rhythmicity and timing in speech and singing is described in Sect. 16.6, and pitch and rhythm aspects in Sect. 16.7. The impressive control

  35. Assessment of voice, speech and communication changes associated with cervical spinal cord injury.

    PubMed

    Johansson, Kerstin; Seiger, Åke; Forsén, Malin; Holmgren Nilsson, Jeanette; Hartelius, Lena; Schalling, Ellika

    2018-02-24

    Respiratory muscle impairment following cervical spinal cord injury (CSCI) may lead to reduced voice function, although the individual variation is large. Voice problems in this population may not always receive attention since individuals with CSCI face other, more acute and life-threatening issues that need/receive attention. Currently there is no consensus on the tasks suitable to identify the specific voice impairments and functional voice changes experienced by individuals with CSCI. To examine which voice/speech tasks identify the specific voice and communication changes associated with CSCI, habitual and maximum speech performance of a group with CSCI was compared with that of a healthy control group (CG), and the findings were related to respiratory function and to self-reported voice problems. Respiratory, aerodynamic, acoustic and self-reported voice data from 19 individuals (nine women and 10 men, aged 23-59 years, heights = 153-192 cm) with CSCI (levels C3-C7) were compared with data from a CG consisting of 19 carefully matched non-injured people (nine women and 10 men, aged 19-59 years, heights = 152-187 cm). Despite considerable variability of performance, highly significant differences between the group with CSCI and the CG were found in maximum phonation time, maximum duration of breath phrases, maximum sound pressure level and maximum voice area in voice-range profiles (all p = .000). Subglottal pressure was lower and phonatory stability was reduced in some of the individuals with CSCI, but differences between the groups were not statistically significant. Six of 19 had voice handicap index (VHI) scores above 20 (the cut-off for voice disorder). Individuals with a vital capacity below 50% of the expected for an equivalent reference individual performed significantly worse than participants with more normal vital capacity. Completeness and level of injury seemed to impact vocal function in some individuals. A combination of maximum performance

  36. Should singing activities be included in speech and voice therapy for prepubertal children?

    PubMed

    Rinta, Tiija; Welch, Graham F

    2008-01-01

    Customarily, speaking and singing have tended to be regarded as two completely separate sets of behaviors in clinical and educational settings. The treatment of speech and voice disorders has focused on the client's speaking ability, as this is perceived to be the main vocal behavior of concern. However, according to a broader voice-science perspective, given that the same vocal structure is used for speaking and singing, it may be possible to include singing in speech and voice therapy. In this article, a theoretical framework is proposed that indicates possible benefits from the inclusion of singing in such therapeutic settings. Based on a literature review, it is demonstrated theoretically why singing activities can potentially be exploited in the treatment of prepubertal children suffering from speech and voice disorders. Based on this theoretical framework, implications for further empirical research and practice are suggested.

  37. Effects of cross-language voice training on speech perception: Whose familiar voices are more intelligible?

    PubMed Central

    Levi, Susannah V.; Winters, Stephen J.; Pisoni, David B.

    2011-01-01

    Previous research has shown that familiarity with a talker’s voice can improve linguistic processing (herein, “Familiar Talker Advantage”), but this benefit is constrained by the context in which the talker’s voice is familiar. The current study examined how familiarity affects intelligibility by manipulating the type of talker information available to listeners. One group of listeners learned to identify bilingual talkers’ voices from English words, where they learned language-specific talker information. A second group of listeners learned the same talkers from German words, and thus only learned language-independent talker information. After voice training, both groups of listeners completed a word recognition task with English words produced by both familiar and unfamiliar talkers. Results revealed that English-trained listeners perceived more phonemes correct for familiar than unfamiliar talkers, while German-trained listeners did not show improved intelligibility for familiar talkers. The absence of a processing advantage in speech intelligibility for the German-trained listeners demonstrates limitations on the Familiar Talker Advantage, which crucially depends on the language context in which the talkers’ voices were learned; knowledge of how a talker produces linguistically relevant contrasts in a particular language is necessary to increase speech intelligibility for words produced by familiar talkers. PMID:22225059

  38. Utility and accuracy of perceptual voice and speech distinctions in the diagnosis of Parkinson's disease, PSP and MSA-P.

    PubMed

    Miller, Nick; Nath, Uma; Noble, Emma; Burn, David

    2017-06-01

    To determine if perceptual speech measures distinguish people with Parkinson's disease (PD), multiple system atrophy with predominant parkinsonism (MSA-P) and progressive supranuclear palsy (PSP). Speech-language therapists blind to patient characteristics employed clinical rating scales to evaluate speech/voice in 24 people with clinically diagnosed PD, 17 with PSP and 9 with MSA-P, matched for disease duration (mean 4.9 years, standard deviation 2.2). No consistent intergroup differences appeared on specific speech/voice variables. People with PD were significantly less impaired on overall speech/voice severity. Analyses by severity suggested further investigation around laryngeal, resonance and fluency changes may characterize individual groups. MSA-P and PSP compared with PD were distinguished by severity of speech/voice deterioration, but individual speech/voice parameters failed to consistently differentiate groups.

  39. Tutorial and Guidelines on Measurement of Sound Pressure Level in Voice and Speech.

    PubMed

    Švec, Jan G; Granqvist, Svante

    2018-03-15

    Sound pressure level (SPL) measurement of voice and speech is often considered a trivial matter, but the measured levels are often reported incorrectly or incompletely, making them difficult to compare among various studies. This article aims at explaining the fundamental principles behind these measurements and providing guidelines to improve their accuracy and reproducibility. Basic information is put together from standards, technical, voice and speech literature, and practical experience of the authors and is explained for nontechnical readers. Variation of SPL with distance, sound level meters and their accuracy, frequency and time weightings, and background noise topics are reviewed. Several calibration procedures for SPL measurements are described for stand-mounted and head-mounted microphones. SPL of voice and speech should be reported together with the mouth-to-microphone distance so that the levels can be related to vocal power. Sound level measurement settings (i.e., frequency weighting and time weighting/averaging) should always be specified. Classified sound level meters should be used to assure measurement accuracy. Head-mounted microphones placed at the proximity of the mouth improve signal-to-noise ratio and can be taken advantage of for voice SPL measurements when calibrated. Background noise levels should be reported besides the sound levels of voice and speech.
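
    As a worked example of the distance dependence reviewed in the tutorial: under a free-field assumption (ignoring room reflections), SPL falls by 20*log10(d2/d1) dB between distances d1 and d2. The 5 cm head-mounted placement and 1 m reporting distance below are illustrative choices, not prescriptions from the article.

    ```python
    import math

    def spl_at_distance(spl_ref_db: float, d_ref_m: float, d_m: float) -> float:
        """Free-field conversion of SPL measured at d_ref_m to distance d_m."""
        return spl_ref_db - 20.0 * math.log10(d_m / d_ref_m)

    # 80 dB SPL at a 5 cm head-mounted microphone -> about 54 dB SPL at 1 m
    print(round(spl_at_distance(80.0, 0.05, 1.0), 1))   # 54.0
    ```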

  40. The Impact of Dysphonic Voices on Healthy Listeners: Listener Reaction Times, Speech Intelligibility, and Listener Comprehension.

    PubMed

    Evitts, Paul M; Starmer, Heather; Teets, Kristine; Montgomery, Christen; Calhoun, Lauren; Schulze, Allison; MacKenzie, Jenna; Adams, Lauren

    2016-11-01

    There is currently minimal information on the impact of dysphonia secondary to phonotrauma on listeners. Considering the high incidence of voice disorders among professional voice users, it is important to understand the impact of a dysphonic voice on their audiences. Ninety-one healthy listeners (39 men, 52 women; mean age = 23.62 years) were presented with speech stimuli from 5 healthy speakers and 5 speakers diagnosed with dysphonia secondary to phonotrauma. Dependent variables included processing speed (reaction time [RT] ratio), speech intelligibility, and listener comprehension. Voice quality ratings were also obtained for all speakers by 3 expert listeners. Statistical results showed significant differences in RT ratio and in the number of speech intelligibility errors between healthy and dysphonic voices. There was not a significant difference in listener comprehension errors. Multiple regression analyses showed that voice quality ratings from the Consensus Auditory-Perceptual Evaluation of Voice (Kempster, Gerratt, Verdolini Abbott, Barkmeier-Kraemer, & Hillman, 2009) were able to predict RT ratio and speech intelligibility but not listener comprehension. Results of the study suggest that although listeners require more time to process and make more intelligibility errors when presented with speech stimuli from speakers with dysphonia secondary to phonotrauma, listener comprehension may not be affected.

  2. "Who" is saying "what"? Brain-based decoding of human voice and speech.

    PubMed

    Formisano, Elia; De Martino, Federico; Bonte, Milene; Goebel, Rainer

    2008-11-07

    Can we decipher speech content ("what" is being said) and speaker identity ("who" is saying it) from observations of brain activity of a listener? Here, we combine functional magnetic resonance imaging with a data-mining algorithm and retrieve what and whom a person is listening to from the neural fingerprints that speech and voice signals elicit in the listener's auditory cortex. These cortical fingerprints are spatially distributed and insensitive to acoustic variations of the input so as to permit the brain-based recognition of learned speech from unknown speakers and of learned voices from previously unheard utterances. Our findings unravel the detailed cortical layout and computational properties of the neural populations at the basis of human speech recognition and speaker identification.

  3. 76 FR 66734 - National Institute on Deafness and Other Communication Disorders Draft 2012-2016 Strategic Plan

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-10-27

    The National Institute on Deafness and Other Communication Disorders supports research and research training in the normal and disordered processes of hearing, balance, smell, taste, voice, speech, and language. The draft 2012-2016 Strategic Plan organizes this research into three program areas: hearing and balance; smell and taste; and voice, speech, and language.

  4. 17 Ways to Say Yes: Toward Nuanced Tone of Voice in AAC and Speech Technology

    PubMed Central

    Pullin, Graham; Hennig, Shannon

    2015-01-01

    People with complex communication needs who use speech-generating devices have very little expressive control over their tone of voice. However, despite its importance in human interaction, the issue of tone of voice remains all but absent from AAC research and development. In this paper, we describe three interdisciplinary projects, past, present and future: the critical design collection Six Speaking Chairs has provoked deeper discussion and inspired a social model of tone of voice; the speculative concept Speech Hedge illustrates challenges and opportunities in designing more expressive user interfaces; and the pilot project Tonetable could enable participatory research and seed a research network around tone of voice. We speculate that more radical interactions might expand the frontiers of AAC and disrupt speech technology as a whole. PMID:25965913

  5. Changes in Voice Onset Time and Motor Speech Skills in Children following Motor Speech Therapy: Evidence from /pa/ productions

    PubMed Central

    Yu, Vickie Y.; Kadis, Darren S.; Oh, Anna; Goshulak, Debra; Namasivayam, Aravind; Pukonen, Margit; Kroll, Robert; De Nil, Luc F.; Pang, Elizabeth W.

    2016-01-01

    This study evaluated changes in motor speech control and inter-gestural coordination for children with speech sound disorders (SSD) subsequent to PROMPT (Prompts for Restructuring Oral Muscular Phonetic Targets) intervention. We measured the distribution patterns of voice onset time (VOT) for a voiceless stop (/p/) to examine the changes in inter-gestural coordination. Two standardized tests were used (VMPAC, GFTA-2) to assess the changes in motor speech skills and articulation. Data showed positive changes in patterns of VOT with a lower pattern of variability. All children showed significantly higher scores for VMPAC, but only some children showed higher scores for GFTA-2. Results suggest that the proprioceptive feedback provided through PROMPT had a positive influence on motor speech control and inter-gestural coordination in voicing behavior. This set of VOT data for children with SSD adds to our understanding of the speech characteristics underlying motor speech control. Directions for future studies are discussed. PMID:24446799

  6. Effects of an Extended Version of the Lee Silverman Voice Treatment on Voice and Speech in Parkinson's Disease

    ERIC Educational Resources Information Center

    Spielman, Jennifer; Ramig, Lorraine O.; Mahler, Leslie; Halpern, Angela; Gavin, William J.

    2007-01-01

    Purpose: The present study examined vocal SPL, voice handicap, and speech characteristics in Parkinson's disease (PD) following an extended version of the Lee Silverman Voice Treatment (LSVT), to help determine whether current treatment dosages can be altered without compromising clinical outcomes. Method: Twelve participants with idiopathic PD…

  7. Noise Robust Speech Recognition Applied to Voice-Driven Wheelchair

    NASA Astrophysics Data System (ADS)

    Sasou, Akira; Kojima, Hiroaki

    2009-12-01

    Conventional voice-driven wheelchairs usually employ headset microphones that are capable of achieving sufficient recognition accuracy, even in the presence of surrounding noise. However, such interfaces require users to wear sensors such as a headset microphone, which can be an impediment, especially for the hand disabled. It is also well known that speech recognition accuracy degrades drastically when the microphone is placed far from the user. In this paper, we develop a noise robust speech recognition system for a voice-driven wheelchair that achieves almost the same recognition accuracy as a headset microphone without requiring the user to wear any sensors. We verified the effectiveness of our system in experiments in different environments.

  8. A National Test of Taste and Smell

    MedlinePlus

    Feature article in the series "Taste, Smell, Hearing, Language, Voice, Balance": At Last: A National Test of Taste and Smell.

  9. Speech transport for packet telephony and voice over IP

    NASA Astrophysics Data System (ADS)

    Baker, Maurice R.

    1999-11-01

    Recent advances in packet switching, internetworking, and digital signal processing technologies have converged to allow realizable practical implementations of packet telephony systems. This paper provides a tutorial on transmission engineering for packet telephony covering the topics of speech coding/decoding, speech packetization, packet data network transport, and impairments which may negatively impact end-to-end system quality. Particular emphasis is placed upon Voice over Internet Protocol given the current popularity and ubiquity of IP transport.
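
    To make the transport arithmetic concrete, here is a short Python sketch of the per-stream bandwidth calculation; the figures assume the standard G.711 codec (64 kbit/s) with 20 ms packets and 40 bytes of IPv4/UDP/RTP headers, and ignore link-layer framing.

        def voip_bandwidth_kbps(codec_rate_kbps=64.0, frame_ms=20.0, header_bytes=40):
            """IP-layer bandwidth for one voice stream.

            header_bytes = 20 (IPv4) + 8 (UDP) + 12 (RTP); link-layer
            framing (e.g., Ethernet) would add more overhead.
            """
            payload_bytes = codec_rate_kbps * 1000 / 8 * (frame_ms / 1000)  # per packet
            packets_per_second = 1000 / frame_ms
            total_bits_per_second = (payload_bytes + header_bytes) * 8 * packets_per_second
            return total_bits_per_second / 1000

        # G.711 at 20 ms: 160-byte payload + 40-byte headers -> 80 kbit/s.
        print(voip_bandwidth_kbps())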

  10. Silent reading of direct versus indirect speech activates voice-selective areas in the auditory cortex.

    PubMed

    Yao, Bo; Belin, Pascal; Scheepers, Christoph

    2011-10-01

    In human communication, direct speech (e.g., Mary said: "I'm hungry") is perceived to be more vivid than indirect speech (e.g., Mary said [that] she was hungry). However, for silent reading, the representational consequences of this distinction are still unclear. Although many of us share the intuition of an "inner voice," particularly during silent reading of direct speech statements in text, there has been little direct empirical confirmation of this experience so far. Combining fMRI with eye tracking in human volunteers, we show that silent reading of direct versus indirect speech engenders differential brain activation in voice-selective areas of the auditory cortex. This suggests that readers are indeed more likely to engage in perceptual simulations (or spontaneous imagery) of the reported speaker's voice when reading direct speech as opposed to meaning-equivalent indirect speech statements as part of a more vivid representation of the former. Our results may be interpreted in line with embodied cognition and form a starting point for more sophisticated interdisciplinary research on the nature of auditory mental simulation during reading.

  11. The stop voicing contrast in French: From citation speech to sentential speech

    NASA Astrophysics Data System (ADS)

    Abdelli-Beruh, Nassima; Demaio, Eileen; Hisagi, Miwako

    2004-05-01

    This study explores the influence of speaking style on the salience of the acoustic correlates of the stop voicing distinction in French. Monolingual French speakers produced twenty-one CVC syllables in citation speech, in minimal pairs, and in sentence-length utterances (/pa/-/a/ context: /il a di pa CVC a lui/; /pas/-/s/ context: /il a di pas CVC sa~ lui/). Prominent stress was on the CVC. Voicing-related differences in percentage of closure voicing and in durations of aspiration, closure, and vowel were analyzed as a function of these three speaking styles. Results show that the salience of the acoustic-phonetic segments present when the syllables are uttered in isolation or in minimal pairs differs from when the syllables are spoken in a sentence. These results are in agreement with findings in English.

  12. Comparing Measures of Voice Quality from Sustained Phonation and Continuous Speech

    ERIC Educational Resources Information Center

    Gerratt, Bruce R.; Kreiman, Jody; Garellek, Marc

    2016-01-01

    Purpose: The question of what type of utterance--a sustained vowel or continuous speech--is best for voice quality analysis has been extensively studied but with equivocal results. This study examines whether previously reported differences derive from the articulatory and prosodic factors occurring in continuous speech versus sustained phonation.…

  13. Assessment of voice, speech, and related quality of life in advanced head and neck cancer patients 10-years+ after chemoradiotherapy.

    PubMed

    Kraaijenga, S A C; Oskam, I M; van Son, R J J H; Hamming-Vrieze, O; Hilgers, F J M; van den Brekel, M W M; van der Molen, L

    2016-04-01

    Assessment of long-term objective and subjective voice, speech, articulation, and quality of life in patients with head and neck cancer (HNC) treated with concurrent chemoradiotherapy (CRT) for advanced, stage IV disease. Twenty-two disease-free survivors, treated with cisplatin-based CRT for inoperable HNC (1999-2004), were evaluated at 10 years post-treatment. A standard Dutch text was recorded. Perceptual analysis of voice, speech, and articulation was conducted by two expert listeners (SLPs). An experimental expert system based on automatic speech recognition was also used. Patients' perception of voice and speech and related quality of life was assessed with the Voice Handicap Index (VHI) and Speech Handicap Index (SHI) questionnaires. At a median follow-up of 11 years, perceptual evaluation showed abnormal scores in up to 64% of cases, depending on the outcome parameter analyzed. Automatic assessment of voice and speech parameters correlated moderately to strongly with perceptual outcome scores. Patient-reported problems with voice (VHI>15) and speech (SHI>6) in daily life were present in 68% and 77% of patients, respectively. Patients treated with IMRT showed significantly less impairment compared to those treated with conventional radiotherapy. More than 10 years after organ-preservation treatment, voice and speech problems are common in this patient cohort, as assessed with perceptual evaluation, automatic speech recognition, and validated structured questionnaires. There were fewer complaints in patients treated with IMRT than with conventional radiotherapy. Copyright © 2016 Elsevier Ltd. All rights reserved.

  14. Detecting Abnormal Word Utterances in Children With Autism Spectrum Disorders: Machine-Learning-Based Voice Analysis Versus Speech Therapists.

    PubMed

    Nakai, Yasushi; Takiguchi, Tetsuya; Matsui, Gakuyo; Yamaoka, Noriko; Takada, Satoshi

    2017-10-01

    Abnormal prosody is often evident in the voice intonations of individuals with autism spectrum disorders. We compared a machine-learning-based voice analysis with human hearing judgments made by 10 speech therapists for classifying children with autism spectrum disorders (n = 30) and typical development (n = 51). Using stimuli limited to single-word utterances, machine-learning-based voice analysis was superior to speech therapist judgments. There was a significantly higher true-positive than false-negative rate for machine-learning-based voice analysis but not for speech therapists. Results are discussed in terms of some artificiality of clinician judgments based on single-word utterances, and the objectivity machine-learning-based voice analysis adds to judging abnormal prosody.

  15. Singing in groups for Parkinson's disease (SING-PD): a pilot study of group singing therapy for PD-related voice/speech disorders.

    PubMed

    Shih, Ludy C; Piel, Jordan; Warren, Amanda; Kraics, Lauren; Silver, Althea; Vanderhorst, Veronique; Simon, David K; Tarsy, Daniel

    2012-06-01

    Parkinson's disease-related speech and voice impairment has a significant impact on quality-of-life measures. LSVT® LOUD voice and speech therapy (Lee Silverman Voice Treatment) has demonstrated scientific efficacy and clinical effectiveness, but musically based voice and speech therapy has been underexplored as a potentially useful method of rehabilitation. We undertook a pilot, open-label study of a group-based singing intervention, consisting of twelve 90-min weekly sessions led by a voice and speech therapist/singing instructor. The primary outcome measure of vocal loudness, as measured by sound pressure level (SPL) at 50 cm during connected speech, was not significantly different one week after the intervention or at 13 weeks after the intervention. A number of secondary measures reflecting pitch range, phonation time and maximum loudness were also unchanged. Voice-related quality of life (VRQOL) and voice handicap index (VHI) were also unchanged. This study suggests that a group singing therapy intervention at this intensity and frequency does not result in significant improvement in objective and subject-rated measures of voice and speech impairment. Copyright © 2012 Elsevier Ltd. All rights reserved.

  16. Predicting Voice Disorder Status From Smoothed Measures of Cepstral Peak Prominence Using Praat and Analysis of Dysphonia in Speech and Voice (ADSV).

    PubMed

    Sauder, Cara; Bretl, Michelle; Eadie, Tanya

    2017-09-01

    The purposes of this study were to (1) determine and compare the diagnostic accuracy of a single acoustic measure, smoothed cepstral peak prominence (CPPS), for predicting voice disorder status from connected speech samples using two software systems, Analysis of Dysphonia in Speech and Voice (ADSV) and Praat; and (2) determine the relationship between measures of CPPS generated by these programs. This is a retrospective cross-sectional study. Measures of CPPS were obtained from connected speech recordings of 100 subjects with voice disorders and 70 nondysphonic subjects without vocal complaints using the commercially available ADSV and the freely downloadable Praat software programs. Logistic regression and receiver operating characteristic (ROC) analyses were used to evaluate and compare the diagnostic accuracy of CPPS measures. Relationships between CPPS measures from the programs were determined. Results showed acceptable overall accuracy rates (75% accuracy, ADSV; 82% accuracy, Praat) and areas under the ROC curves (area under the curve [AUC] = 0.81, ADSV; AUC = 0.91, Praat) for predicting voice disorder status, with slight differences in sensitivity and specificity. CPPS measures derived from Praat were uniquely predictive of disorder status above and beyond CPPS measures from ADSV (χ²(1) = 40.71, P < 0.001). CPPS measures from both programs were significantly and highly correlated (r = 0.88, P < 0.001). A single acoustic measure of CPPS was highly predictive of voice disorder status using either program. Clinicians may consider using CPPS to complement clinical voice evaluation and screening protocols. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
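
    The analysis pipeline reported here, a single-predictor logistic regression evaluated with ROC curves, is easy to reproduce in outline. The Python sketch below uses scikit-learn on synthetic CPPS values; the group sizes mirror the study, but the numbers themselves are simulated for illustration only, not the study's data.

        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import roc_auc_score, accuracy_score

        # Synthetic data: one smoothed cepstral peak prominence value (dB)
        # per speaker; y is 1 for disordered, 0 for nondysphonic.
        rng = np.random.default_rng(0)
        cpps = np.r_[rng.normal(4.0, 1.0, 100), rng.normal(6.5, 1.0, 70)]
        y = np.r_[np.ones(100, dtype=int), np.zeros(70, dtype=int)]

        X = cpps.reshape(-1, 1)
        model = LogisticRegression().fit(X, y)

        print("accuracy:", accuracy_score(y, model.predict(X)))
        print("AUC:", roc_auc_score(y, model.predict_proba(X)[:, 1]))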

  17. Brain 'talks over' boring quotes: top-down activation of voice-selective areas while listening to monotonous direct speech quotations.

    PubMed

    Yao, Bo; Belin, Pascal; Scheepers, Christoph

    2012-04-15

    In human communication, direct speech (e.g., Mary said, "I'm hungry") is perceived as more vivid than indirect speech (e.g., Mary said that she was hungry). This vividness distinction has previously been found to underlie silent reading of quotations: Using functional magnetic resonance imaging (fMRI), we found that direct speech elicited higher brain activity in the temporal voice areas (TVA) of the auditory cortex than indirect speech, consistent with an "inner voice" experience in reading direct speech. Here we show that listening to monotonously spoken direct versus indirect speech quotations also engenders differential TVA activity. This suggests that individuals engage in top-down simulations or imagery of enriched supra-segmental acoustic representations while listening to monotonous direct speech. The findings shed new light on the acoustic nature of the "inner voice" in understanding direct speech. Copyright © 2012 Elsevier Inc. All rights reserved.

  18. [Acoustic voice analysis using the Praat program: comparative study with the Dr. Speech program].

    PubMed

    Núñez Batalla, Faustino; González Márquez, Rocío; Peláez González, M Belén; González Laborda, Irene; Fernández Fernández, María; Morato Galán, Marta

    2014-01-01

    The European Laryngological Society (ELS) basic protocol for functional assessment of voice pathology includes 5 different approaches: perception, videostroboscopy, acoustics, aerodynamics and subjective rating by the patient. In this study we focused on acoustic voice analysis. The purpose of the present study was to correlate the results obtained by the commercial software Dr. Speech and the free software Praat in 2 fields: 1. Narrow-band spectrogram (the presence of noise according to Yanagihara, and the presence of subharmonics) (semi-quantitative). 2. Voice acoustic parameters (jitter, shimmer, harmonics-to-noise ratio, fundamental frequency) (quantitative). We studied a total of 99 voice samples from individuals with Reinke's oedema diagnosed using videostroboscopy. One independent observer used Dr. Speech 3.0 and a second one used the Praat program (Phonetic Sciences, University of Amsterdam). The spectrographic analysis consisted of obtaining a narrow-band spectrogram from the previous digitalised voice samples by the 2 independent observers. They then determined the presence of noise in the spectrogram, using the Yanagihara grades, as well as the presence of subharmonics. As a final result, the acoustic parameters of jitter, shimmer, harmonics-to-noise ratio and fundamental frequency were obtained from the 2 acoustic analysis programs. The results indicated that the sound spectrogram and the numerical values obtained for shimmer and jitter were similar for both computer programs, even though types 1, 2 and 3 voice samples were analysed. The Praat and Dr. Speech programs provide similar results in the acoustic analysis of pathological voices. Copyright © 2013 Elsevier España, S.L. All rights reserved.
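
    Acoustic measures of this kind can be scripted. Below is a minimal sketch using the praat-parselmouth Python interface to Praat; the file name is hypothetical and the analysis parameters are common illustrative defaults, not necessarily the settings used in this study.

        import parselmouth
        from parselmouth.praat import call

        snd = parselmouth.Sound("voice_sample.wav")   # hypothetical file

        pitch = call(snd, "To Pitch", 0.0, 75, 500)   # pitch floor/ceiling in Hz
        f0_mean = call(pitch, "Get mean", 0, 0, "Hertz")

        point_process = call(snd, "To PointProcess (periodic, cc)", 75, 500)
        jitter_local = call(point_process, "Get jitter (local)",
                            0, 0, 0.0001, 0.02, 1.3)
        shimmer_local = call([snd, point_process], "Get shimmer (local)",
                             0, 0, 0.0001, 0.02, 1.3, 1.6)

        harmonicity = call(snd, "To Harmonicity (cc)", 0.01, 75, 0.1, 1.0)
        hnr = call(harmonicity, "Get mean", 0, 0)

        print(f0_mean, jitter_local, shimmer_local, hnr)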

  19. Accelerometer-based automatic voice onset detection in speech mapping with navigated repetitive transcranial magnetic stimulation.

    PubMed

    Vitikainen, Anne-Mari; Mäkelä, Elina; Lioumis, Pantelis; Jousmäki, Veikko; Mäkelä, Jyrki P

    2015-09-30

    The use of navigated repetitive transcranial magnetic stimulation (rTMS) in mapping of speech-related brain areas has recently been shown to be useful in the preoperative workup of epilepsy and tumor patients. However, substantial inter- and intraobserver variability and non-optimal replicability of the rTMS results have been reported, and a need for additional development of the methodology is recognized. In TMS motor cortex mappings the evoked responses can be quantitatively monitored by electromyographic recordings; however, no such easily available setup exists for speech mappings. We present an accelerometer-based setup for detection of vocalization-related larynx vibrations, combined with an automatic routine for voice onset detection, for rTMS speech mapping using a naming task. The results produced by the automatic routine were compared with manually reviewed video recordings. The new method was applied in routine navigated rTMS speech mapping for 12 consecutive patients during preoperative workup for epilepsy or tumor surgery. The automatic routine correctly detected 96% of the voice onsets, resulting in 96% sensitivity and 71% specificity. The majority (63%) of the misdetections were related to visible throat movements, extra voices before the response, or delayed naming of the previous stimuli. The no-response errors were correctly detected in 88% of events. The proposed setup for automatic detection of voice onsets provides quantitative additional data for analysis of the rTMS-induced speech response modifications. The objectively defined speech response latencies increase the repeatability, reliability and stratification of the rTMS results. Copyright © 2015 Elsevier B.V. All rights reserved.
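
    The authors' automatic routine is not public, but the general idea of short-time-energy onset detection on an accelerometer trace can be sketched in a few lines of Python; the frame length, threshold ratio, and baseline estimate below are illustrative assumptions, not the paper's parameters.

        import numpy as np

        def detect_voice_onset(signal, fs, frame_ms=10.0, threshold_ratio=5.0):
            """Return the time (s) of the first frame whose short-time RMS
            exceeds threshold_ratio times the baseline RMS, or None.

            A toy stand-in for the paper's routine: real use would need
            band-pass filtering around the larynx-vibration frequencies
            and rejection of movement artefacts.
            """
            frame_len = int(fs * frame_ms / 1000)
            n_frames = len(signal) // frame_len
            frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
            rms = np.sqrt((frames ** 2).mean(axis=1))
            baseline = np.median(rms[:10])          # assumes a silent lead-in
            above = np.nonzero(rms > threshold_ratio * baseline)[0]
            return above[0] * frame_ms / 1000 if above.size else None

        # e.g., with a 2 kHz accelerometer trace `acc`:
        # onset = detect_voice_onset(acc, fs=2000)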

  20. Noise on, voicing off: Speech perception deficits in children with specific language impairment.

    PubMed

    Ziegler, Johannes C; Pech-Georgel, Catherine; George, Florence; Lorenzi, Christian

    2011-11-01

    Speech perception of four phonetic categories (voicing, place, manner, and nasality) was investigated in children with specific language impairment (SLI) (n=20) and age-matched controls (n=19) in quiet and various noise conditions using an AXB two-alternative forced-choice paradigm. Children with SLI exhibited robust speech perception deficits in silence, stationary noise, and amplitude-modulated noise. Comparable deficits were obtained for fast, intermediate, and slow modulation rates, and this speaks against the various temporal processing accounts of SLI. Children with SLI exhibited normal "masking release" effects (i.e., better performance in fluctuating noise than in stationary noise), again suggesting relatively spared spectral and temporal auditory resolution. In terms of phonetic categories, voicing was more affected than place, manner, or nasality. The specific nature of this voicing deficit is hard to explain with general processing impairments in attention or memory. Finally, speech perception in noise correlated with an oral language component but not with either a memory or IQ component, and it accounted for unique variance beyond IQ and low-level auditory perception. In sum, poor speech perception seems to be one of the primary deficits in children with SLI that might explain poor phonological development, impaired word production, and poor word comprehension. Copyright © 2011 Elsevier Inc. All rights reserved.

  1. [Comparative studies of the quality of the esophageal voice following laryngectomy: the insufflation test and reverse speech audiometry].

    PubMed

    Böhme, G; Clasen, B

    1989-09-01

    We carried out a transnasal insufflation test according to Blom and Singer on 27 laryngectomy patients, as well as a speech communication test using reverse speech audiometry, i.e., the post-laryngectomy telephone test according to Zenner and Pfrang. The combined evaluation of both tests provided basic information on the quality of the esophageal voice and the functionality of the speech organs. Both tests can be carried out quickly and easily and allow a differentiated statement on the suitability of an esophageal voice, electronic speech aids, and voice prostheses. Three groups could be identified from our results: 1. The insufflation test and the reverse speech test provided concordant good or very good results; the esophageal voice was well understood. 2. Complete failure in the insufflation and telephone tests calls for further examinations to exclude spasm, stricture, diverticula, and scarred membranous stenosis, as well as tumor recurrence in the region of the pharyngo-esophageal segment. 3. In the case of normal insufflation but considerably reduced speech communication in the telephone test, organic causes must be sought in the vocal tract, along with cranial nerve deficits and socially determined causes.

  2. Speech assessment of patients using three types of indwelling tracheo-oesophageal voice prostheses.

    PubMed

    Heaton, J M; Sanderson, D; Dunsmore, I R; Parker, A J

    1996-04-01

    A multidisciplinary prospective study compared speech acceptability between three types of indwelling tracheo-oesophageal voice prostheses. Twenty male laryngectomees took part over five years, using 42 prostheses. Speech was assessed on a discrete scale by trained and untrained personnel. The majority scored in the mid-range for each assessor. The kappa coefficient was used to test similarity between assessors, and for all pairings agreement was significant (p < 0.05). The speech and language therapist tended to give higher scores and the patient lower. A relationship was found between patients' ages categorized by decade and the surgeon's score alone. This relationship also held for the Groningen high resistance and Provox prostheses individually (p < 0.05). The untrained personnel assessed similarly to the professionals: all humans are voice listeners. The analysis suggests surgeons find tracheo-oesophageal speech in older patients better than in younger ones, or make more allowances for the elderly. There was a trend for Provox prostheses to produce the best scores.

  3. A comparison of recordings of sentences and spontaneous speech: perceptual and acoustic measures in preschool children's voices.

    PubMed

    McAllister, Anita; Brandt, Signe Kofoed

    2012-09-01

    A well-controlled recording in a studio is fundamental in most voice rehabilitation. However, this laboratory-like recording method has been questioned because voice use in a natural environment may be quite different. In children's natural environment, high background noise levels are common and are an important factor contributing to voice problems. The primary noise source in day-care centers is the children themselves. The aim of the present study was to compare perceptual evaluations of voice quality and acoustic measures from a controlled recording with recordings of spontaneous speech in children's natural environment in a day-care setting. Eleven 5-year-old children were recorded three times during a day at the day care. The controlled speech material consisted of repeated sentences. Matching sentences were selected from the spontaneous speech. All sentences were repeated three times. Recordings were randomized and analyzed acoustically and perceptually. Statistical analyses showed that fundamental frequency was significantly higher in spontaneous speech (P<0.01), as was hyperfunction (P<0.001). The only characteristic the controlled sentences shared with spontaneous speech was degree of hoarseness (Spearman's rho=0.564). When data for boys and girls were analyzed separately, a correlation was found for the parameter breathiness (rho=0.551) for boys, and for girls the correlation for hoarseness remained (rho=0.752). Regarding acoustic data, none of the measures correlated across recording conditions for the whole group. Copyright © 2012 The Voice Foundation. Published by Mosby, Inc. All rights reserved.

  4. Speech-Language Pathology production regarding voice in popular singing.

    PubMed

    Drumond, Lorena Badaró; Vieira, Naymme Barbosa; Oliveira, Domingos Sávio Ferreira de

    2011-12-01

    To present a literature review of the Brazilian scientific production in Speech-Language Pathology and Audiology regarding voice in popular singing in the last decade, with respect to number of publications, musical styles studied, focus of the research, and instruments used for data collection. Cross-sectional descriptive study carried out in two stages: a search in databases and publications encompassing the last decade of research in this area in Brazil, and a reading of the material obtained for subsequent categorization. The LILACS and SciELO databases, the CAPES database of dissertations and theses, the online version of Acta ORL, and the online version of OPUS were searched using the following terms: voice, professional voice, singing voice, dysphonia, voice disorders, voice training, music, dysodia. Articles published between the years 2000 and 2010 were selected. The studies found were classified and categorized after reading their abstracts and, when necessary, the whole study. Twenty studies within the proposed theme were selected, all of which were descriptive and involved several musical styles. Twelve studies focused on the evaluation of the popular singer's voice, and the most frequently used data collection instrument was auditory-perceptual evaluation. The results of the publications found corroborate the objectives proposed by the authors and the different methodologies. The number of studies published is still small compared with the diversity of musical genres and the uniqueness of the popular singer.

  5. Noise on, Voicing off: Speech Perception Deficits in Children with Specific Language Impairment

    ERIC Educational Resources Information Center

    Ziegler, Johannes C.; Pech-Georgel, Catherine; George, Florence; Lorenzi, Christian

    2011-01-01

    Speech perception of four phonetic categories (voicing, place, manner, and nasality) was investigated in children with specific language impairment (SLI) (n=20) and age-matched controls (n=19) in quiet and various noise conditions using an AXB two-alternative forced-choice paradigm. Children with SLI exhibited robust speech perception deficits in…

  6. The Prevalence of Stuttering, Voice, and Speech-Sound Disorders in Primary School Students in Australia

    ERIC Educational Resources Information Center

    McKinnon, David H.; McLeod, Sharynne; Reilly, Sheena

    2007-01-01

    Purpose: The aims of this study were threefold: to report teachers' estimates of the prevalence of speech disorders (specifically, stuttering, voice, and speech-sound disorders); to consider correspondence between the prevalence of speech disorders and gender, grade level, and socioeconomic status; and to describe the level of support provided to…

  7. The prevalence of stuttering, voice, and speech-sound disorders in primary school students in Australia.

    PubMed

    McKinnon, David H; McLeod, Sharynne; Reilly, Sheena

    2007-01-01

    The aims of this study were threefold: to report teachers' estimates of the prevalence of speech disorders (specifically, stuttering, voice, and speech-sound disorders); to consider correspondence between the prevalence of speech disorders and gender, grade level, and socioeconomic status; and to describe the level of support provided to schoolchildren with speech disorders. Students with speech disorders were identified from 10,425 students in Australia using a 4-stage process: training in the data collection process, teacher identification, confirmation by a speech-language pathologist, and consultation with district special needs advisors. The prevalence of students with speech disorders was estimated; specifically, 0.33% of students were identified as stuttering, 0.12% as having a voice disorder, and 1.06% as having a speech-sound disorder. There was a higher prevalence of speech disorders in males than in females. As grade level increased, the prevalence of speech disorders decreased. There was no significant difference in the pattern of prevalence across the three speech disorders and four socioeconomic groups; however, students who were identified with a speech disorder were more likely to be in the higher socioeconomic groups. Finally, there was a difference between the perceived and actual level of support that was provided to these students. These prevalence figures are lower than those using initial identification by speech-language pathologists and similar to those using parent report.

  8. Vocal effectiveness of speech-language pathology students: Before and after voice use during service delivery.

    PubMed

    Couch, Stephanie; Zieba, Dominique; Van der Linde, Jeannie; Van der Merwe, Anita

    2015-03-26

    As a professional voice user, it is imperative that a speech-language pathologist's (SLP) vocal effectiveness remain consistent throughout the day. Many factors may contribute to reduced vocal effectiveness, including prolonged voice use, vocally abusive behaviours, poor vocal hygiene and environmental factors. To determine the effect of service delivery on the perceptual and acoustic features of voice. A quasi-experimental, pre-test-post-test research design was used. Participants included third- and final-year speech-language pathology students at the University of Pretoria (South Africa). Voice parameters were evaluated in a pre-test measurement, after which the participants provided two consecutive hours of therapy. A post-test measurement was then completed. Data analysis consisted of an instrumental analysis in which the multidimensional voice programme (MDVP) and the voice range profile (VRP) were used to measure vocal parameters and then calculate the dysphonia severity index (DSI). The GRBASI scale was used to conduct a perceptual analysis of voice quality. Data were processed using descriptive statistics to determine change in each measured parameter after service delivery. A change of clinical significance was observed in the acoustic and perceptual parameters of voice. Guidelines for SLPs in order to maintain optimal vocal effectiveness were suggested.

  9. Vocal effectiveness of speech-language pathology students: Before and after voice use during service delivery

    PubMed Central

    Couch, Stephanie; Zieba, Dominique; van der Merwe, Anita

    2015-01-01

    Background As a professional voice user, it is imperative that a speech-language pathologist's (SLP) vocal effectiveness remain consistent throughout the day. Many factors may contribute to reduced vocal effectiveness, including prolonged voice use, vocally abusive behaviours, poor vocal hygiene and environmental factors. Objectives To determine the effect of service delivery on the perceptual and acoustic features of voice. Method A quasi-experimental, pre-test–post-test research design was used. Participants included third- and final-year speech-language pathology students at the University of Pretoria (South Africa). Voice parameters were evaluated in a pre-test measurement, after which the participants provided two consecutive hours of therapy. A post-test measurement was then completed. Data analysis consisted of an instrumental analysis in which the multidimensional voice programme (MDVP) and the voice range profile (VRP) were used to measure vocal parameters and then calculate the dysphonia severity index (DSI). The GRBASI scale was used to conduct a perceptual analysis of voice quality. Data were processed using descriptive statistics to determine change in each measured parameter after service delivery. Results A change of clinical significance was observed in the acoustic and perceptual parameters of voice. Conclusion Guidelines for SLPs in order to maintain optimal vocal effectiveness were suggested. PMID:26304213

  10. Thermal welding vs. cold knife tonsillectomy: a comparison of voice and speech.

    PubMed

    Celebi, Saban; Yelken, Kursat; Celik, Oner; Taskin, Umit; Topak, Murat

    2011-01-01

    To compare acoustic, aerodynamic and perceptual voice and speech parameters in thermal welding system tonsillectomy and cold knife tonsillectomy patients, in order to determine the impact of the surgical technique on voice and speech. Thirty tonsillectomy patients (22 children, 8 adults) participated in this study. The preferred technique was cold knife tonsillectomy in 15 patients and thermal welding system tonsillectomy in the remaining 15 patients. One week before and 1 month after surgery the following parameters were estimated: average fundamental frequency, jitter, shimmer, harmonic-to-noise ratio, and formant frequency analyses of sustained vowels. Perceptual speech analysis and aerodynamic measurements (maximum phonation time and s/z ratio) were also conducted. There was no significant difference in any of the parameters between the cold knife tonsillectomy and thermal welding system tonsillectomy groups (p>0.05). When the groups were compared with regard to preoperative and postoperative values, fundamental frequency was found to be significantly decreased after tonsillectomy in both groups (p<0.001). The first formant for the vowel /a/ in the cold knife tonsillectomy group and for the vowel /i/ in the thermal welding system tonsillectomy group, the second formant for the vowel /u/ in the thermal welding system tonsillectomy group, and the third formant for the vowel /u/ in the cold knife tonsillectomy group were found to be significantly decreased (p<0.05). The surgical technique, whether cold knife or thermal welding system, does not appear to affect voice and speech in tonsillectomy patients. Copyright © 2010 Elsevier Ireland Ltd. All rights reserved.

  11. The persuasiveness of synthetic speech versus human speech.

    PubMed

    Stern, S E; Mullennix, J W; Dyson, C; Wilson, S J

    1999-12-01

    Is computer-synthesized speech as persuasive as the human voice when presenting an argument? After completing an attitude pretest, 193 participants were randomly assigned to listen to a persuasive appeal under three conditions: a high-quality synthesized speech system (DECtalk Express), a low-quality synthesized speech system (Monologue), and a tape recording of a human voice. Following the appeal, participants completed a posttest attitude survey and a series of questionnaires designed to assess perceptions of speech qualities, perceptions of the speaker, and perceptions of the message. The human voice was generally perceived more favorably than the computer-synthesized voice, and the speaker was perceived more favorably when the voice was a human voice than when it was computer synthesized. There was, however, no evidence that computerized speech, as compared with the human voice, affected persuasion or perceptions of the message. Actual or potential applications of this research include issues that should be considered when designing synthetic speech systems.

  12. It doesn't matter what you say: FMRI correlates of voice learning and recognition independent of speech content.

    PubMed

    Zäske, Romi; Awwad Shiekh Hasan, Bashar; Belin, Pascal

    2017-09-01

    Listeners can recognize newly learned voices from previously unheard utterances, suggesting the acquisition of high-level speech-invariant voice representations during learning. Using functional magnetic resonance imaging (fMRI) we investigated the anatomical basis underlying the acquisition of voice representations for unfamiliar speakers independent of speech, and their subsequent recognition among novel voices. Specifically, listeners studied voices of unfamiliar speakers uttering short sentences and subsequently classified studied and novel voices as "old" or "new" in a recognition test. To investigate "pure" voice learning, i.e., independent of sentence meaning, we presented German sentence stimuli to non-German speaking listeners. To disentangle stimulus-invariant and stimulus-dependent learning, during the test phase we contrasted a "same sentence" condition in which listeners heard speakers repeating the sentences from the preceding study phase, with a "different sentence" condition. Voice recognition performance was above chance in both conditions although, as expected, performance was higher for same than for different sentences. During study phases activity in the left inferior frontal gyrus (IFG) was related to subsequent voice recognition performance and same versus different sentence condition, suggesting an involvement of the left IFG in the interactive processing of speaker and speech information during learning. Importantly, at test reduced activation for voices correctly classified as "old" compared to "new" emerged in a network of brain areas including temporal voice areas (TVAs) of the right posterior superior temporal gyrus (pSTG), as well as the right inferior/middle frontal gyrus (IFG/MFG), the right medial frontal gyrus, and the left caudate. This effect of voice novelty did not interact with sentence condition, suggesting a role of temporal voice-selective areas and extra-temporal areas in the explicit recognition of learned voice identity.

  13. Patient-reported voice and speech outcomes after whole-neck intensity modulated radiation therapy and chemotherapy for oropharyngeal cancer: prospective longitudinal study.

    PubMed

    Vainshtein, Jeffrey M; Griffith, Kent A; Feng, Felix Y; Vineberg, Karen A; Chepeha, Douglas B; Eisbruch, Avraham

    2014-08-01

    To describe voice and speech quality changes and their predictors in patients with locally advanced oropharyngeal cancer treated on prospective clinical studies of organ-preserving chemotherapy-intensity modulated radiation therapy (chemo-IMRT). Ninety-one patients with stage III/IV oropharyngeal cancer were treated on 2 consecutive prospective studies of definitive chemoradiation using whole-field IMRT from 2003 to 2011. Patient-reported voice and speech quality were longitudinally assessed from before treatment through 24 months using the Communication Domain of the Head and Neck Quality of Life (HNQOL-C) instrument and the Speech question of the University of Washington Quality of Life (UWQOL-S) instrument, respectively. Factors associated with patient-reported voice quality worsening from baseline and speech impairment were assessed. Voice quality decreased maximally at 1 month, with 68% and 41% of patients reporting worse HNQOL-C and UWQOL-S scores compared with before treatment, and improved thereafter, recovering to baseline by 12-18 months on average. In contrast, observer-rated larynx toxicity was rare (7% at 3 months; 5% at 6 months). Among patients with mean glottic larynx (GL) dose ≤20 Gy, >20-30 Gy, >30-40 Gy, >40-50 Gy, and >50 Gy, 10%, 32%, 25%, 30%, and 63%, respectively, reported worse voice quality at 12 months compared with before treatment (P=.011). Results for speech impairment were similar. Glottic larynx dose, N stage, neck dissection, oral cavity dose, and time since chemo-IMRT were univariately associated with either voice worsening or speech impairment. On multivariate analysis, mean GL dose remained independently predictive for both voice quality worsening (8.1%/Gy) and speech impairment (4.3%/Gy). Voice quality worsening and speech impairment after chemo-IMRT for locally advanced oropharyngeal cancer were frequently reported by patients, underrecognized by clinicians, and independently associated with GL dose. These findings support

  14. Patient-Reported Voice and Speech Outcomes After Whole-Neck Intensity Modulated Radiation Therapy and Chemotherapy for Oropharyngeal Cancer: Prospective Longitudinal Study

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Vainshtein, Jeffrey M.; Griffith, Kent A.; Feng, Felix Y.

    Purpose: To describe voice and speech quality changes and their predictors in patients with locally advanced oropharyngeal cancer treated on prospective clinical studies of organ-preserving chemotherapy–intensity modulated radiation therapy (chemo-IMRT). Methods and Materials: Ninety-one patients with stage III/IV oropharyngeal cancer were treated on 2 consecutive prospective studies of definitive chemoradiation using whole-field IMRT from 2003 to 2011. Patient-reported voice and speech quality were longitudinally assessed from before treatment through 24 months using the Communication Domain of the Head and Neck Quality of Life (HNQOL-C) instrument and the Speech question of the University of Washington Quality of Life (UWQOL-S) instrument, respectively. Factors associated with patient-reported voice quality worsening from baseline and speech impairment were assessed. Results: Voice quality decreased maximally at 1 month, with 68% and 41% of patients reporting worse HNQOL-C and UWQOL-S scores compared with before treatment, and improved thereafter, recovering to baseline by 12-18 months on average. In contrast, observer-rated larynx toxicity was rare (7% at 3 months; 5% at 6 months). Among patients with mean glottic larynx (GL) dose ≤20 Gy, >20-30 Gy, >30-40 Gy, >40-50 Gy, and >50 Gy, 10%, 32%, 25%, 30%, and 63%, respectively, reported worse voice quality at 12 months compared with before treatment (P=.011). Results for speech impairment were similar. Glottic larynx dose, N stage, neck dissection, oral cavity dose, and time since chemo-IMRT were univariately associated with either voice worsening or speech impairment. On multivariate analysis, mean GL dose remained independently predictive for both voice quality worsening (8.1%/Gy) and speech impairment (4.3%/Gy). Conclusions: Voice quality worsening and speech impairment after chemo-IMRT for locally advanced oropharyngeal cancer were frequently reported by patients, underrecognized by clinicians, and

  15. Voice and Fluency Changes as a Function of Speech Task and Deep Brain Stimulation

    ERIC Educational Resources Information Center

    Van Lancker Sidtis, Diana; Rogers, Tiffany; Godier, Violette; Tagliati, Michele; Sidtis, John J.

    2010-01-01

    Purpose: Speaking, which naturally occurs in different modes or "tasks" such as conversation and repetition, relies on intact basal ganglia nuclei. Recent studies suggest that voice and fluency parameters are differentially affected by speech task. In this study, the authors examine the effects of subcortical functionality on voice and fluency,…

  16. Provision of surgical voice restoration in England: questionnaire survey of speech and language therapists.

    PubMed

    Bradley, P J; Counter, P; Hurren, A; Cocks, H C

    2013-08-01

    To conduct a questionnaire survey of speech and language therapists providing and managing surgical voice restoration in England. National Health Service Trusts registering more than 10 new laryngeal cancer patients during any one year, from November 2009 to October 2010, were identified, and a list of speech and language therapists compiled. A questionnaire was developed, peer reviewed and revised. The final questionnaire was e-mailed with a covering letter to 82 units. Eighty-two questionnaires were distributed and 72 were returned and analysed, giving a response rate of 87.8 per cent. Forty-four per cent (38/59) of the units performed more than 10 laryngectomies per year. An in-hours surgical voice restoration service was provided by speech and language therapists in 45.8 per cent (33/72) and assisted by nurses in 34.7 per cent (25/72). An out of hours service was provided directly by ENT staff in 35.5 per cent (21/59). Eighty-eight per cent (63/72) of units reported less than 10 (emergency) out of hours calls per month. Surgical voice restoration service provision varies within and between cancer networks. There is a need for a national management and care protocol, an educational programme for out of hours service providers, and a review of current speech and language therapist staffing levels in England.

  17. Clinical Characteristics of Voice, Speech, and Swallowing Disorders in Oromandibular Dystonia

    ERIC Educational Resources Information Center

    Kreisler, Alexandre; Vepraet, Anne Caroline; Veit, Solène; Pennel-Ployart, Odile; Béhal, Hélène; Duhamel, Alain; Destée, Alain

    2016-01-01

    Purpose: To better define the clinical characteristics of idiopathic oromandibular dystonia, we studied voice, speech, and swallowing disorders and their impact on activities of daily living. Method: Fourteen consecutive patients with idiopathic oromandibular dystonia and 14 matched, healthy control subjects were included in the study. Results:…

  18. Pre- and posttreatment voice and speech outcomes in patients with advanced head and neck cancer treated with chemoradiotherapy: expert listeners' and patient's perception.

    PubMed

    van der Molen, Lisette; van Rossum, Maya A; Jacobi, Irene; van Son, Rob J J H; Smeele, Ludi E; Rasch, Coen R N; Hilgers, Frans J M

    2012-09-01

    Perceptual judgments and patients' perception of voice and speech after concurrent chemoradiotherapy (CCRT) for advanced head and neck cancer. Prospective clinical trial. A standard Dutch text and a diadochokinetic task were recorded. Expert listeners rated voice and speech quality (based on Grade, Roughness, Breathiness, Asthenia, and Strain) and articulation (overall, [p], [t], [k]), and comparative mean opinion scores of voice and speech were calculated at three assessment points. A structured study-specific questionnaire evaluated patients' perception pretreatment (N=55), at 10 weeks (N=49) and at 1 year posttreatment (N=37). At 10 weeks, perceptual voice quality was significantly affected. The parameters overall voice quality (mean, -0.24; P=0.008), strain (mean, -0.12; P=0.012), nasality (mean, -0.08; P=0.009), roughness (mean, -0.22; P=0.001), and pitch (mean, -0.03; P=0.041) improved over time but not beyond baseline levels, except for asthenia at 1 year posttreatment (voice is less asthenic than at baseline; mean, +0.20; P=0.03). Perceptual analyses of articulation showed no significant differences. Patients judge their voice quality as good (score, 18/20) at all assessment points, but at 1 year posttreatment, most of them (70%) judge their "voice not as it used to be." In the 1-year versus 10-week posttreatment comparison, the larynx-hypopharynx tumor group was rated as more strained, whereas nonlarynx tumor voices were judged less strained (mean, -0.33 and +0.07, respectively; P=0.031). Patients' perceived changes in voice and speech quality at 10 weeks post- versus pretreatment correlate weakly with expert judgments. Overall, perceptual CCRT effects on voice and speech seem to peak at 10 weeks posttreatment but level off at 1 year posttreatment. However, at that assessment point, most patients still perceive their voice as different from baseline. Copyright © 2012 The Voice Foundation. Published by Mosby, Inc. All rights reserved.

  19. [Voice assessment and demographic data of applicants for a school of speech therapists].

    PubMed

    Reiter, R; Brosch, S

    2008-05-01

    Demographic data, subjective and objective voice analysis, as well as self-assessment of voice quality from applicants to a school for speech therapists were investigated. Demographic data from 116 applicants were collected and their voice quality assessed by three independent judges. Objective evaluation comprised maximum phonation time, average fundamental frequency, dynamic range, and percent jitter and shimmer by means of the Goettingen Hoarseness Diagram. Self-assessment of voice quality was done with the Voice Handicap Index questionnaire. Of the twenty successful applicants, 95% had a physiological voice; they were all musical and had university entrance qualifications. Subjective voice assessment showed a hoarse voice in 16% of the applicants. In this subgroup, unphysiological vocal use was observed in 72% and reduced articulation in 45%. The objective voice parameters did not show a significant difference between the 3 groups. Self-assessment of the voice was inconspicuous in all applicants. Applicants with a general qualification for university entrance, musicality and a physiological voice were more likely to be successful. There were marked differences between self-assessment of voice and quantitative analysis or subjective assessment by the three independent judges.

  20. The Effect of Anchors and Training on the Reliability of Voice Quality Ratings for Different Types of Speech Stimuli.

    PubMed

    Brinca, Lilia; Batista, Ana Paula; Tavares, Ana Inês; Pinto, Patrícia N; Araújo, Lara

    2015-11-01

    The main objective of the present study was to investigate whether the type of voice stimulus (sustained vowel, oral reading, or connected speech) results in good intrarater and interrater agreement/reliability. A short-term panel study was performed. Voice samples from 30 native European Portuguese speakers were used in the present study. The speech materials used were (1) the sustained vowel /a/, (2) oral reading of the European Portuguese version of "The Story of Arthur the Rat," and (3) connected speech. After extensive training with textual and auditory anchors, the judges were asked to rate the severity of dysphonic voice stimuli using the phonation dimensions G, R, and B from the GRBAS scale. The voice samples were judged 6 months and 1 year after the training. Intrarater agreement and reliability were generally very good for all the phonation dimensions and voice stimuli. The highest interrater reliability was obtained using the oral reading stimulus, particularly for the phonation dimensions grade (G) and breathiness (B). Roughness (R) was the voice quality that was the most difficult to evaluate, leading to interrater unreliability in all voice quality ratings. Extensive training using textual and auditory anchors, and the use of anchors during the voice evaluations, appear to be good methods for auditory-perceptual evaluation of dysphonic voices. The best results for interrater reliability were obtained when the oral reading stimulus was used. Breathiness appears to be a voice quality that is easier to evaluate than roughness. Copyright © 2015 The Voice Foundation. Published by Elsevier Inc. All rights reserved.

  1. Can a computer-generated voice be sincere? A case study combining music and synthetic speech.

    PubMed

    Barker, Paul; Newell, Christopher; Newell, George

    2013-10-01

    This article explores enhancing sincerity, honesty, or truthfulness in computer-generated synthetic speech by accompanying it with music. Sincerity is important if we are to respond positively to any voice, whether human or artificial. What is sincerity in the artificial disembodied voice? Studies in musical expression and performance may illuminate aspects of the 'musically spoken' or sung voice in rendering deeper levels of expression that may include sincerity. We consider one response to this notion in an especially composed melodrama (music accompanying a (synthetic) spoken voice) designed to convey sincerity.

  2. A study of voice production characteristics of astronaut speech during Apollo 11 for speaker modeling in space.

    PubMed

    Yu, Chengzhu; Hansen, John H L

    2017-03-01

    Human physiology has evolved to accommodate environmental conditions, including temperature, pressure, and air chemistry unique to Earth. However, the environment in space varies significantly compared to that on Earth and, therefore, variability is expected in astronauts' speech production mechanism. In this study, the variations of astronaut voice characteristics during the NASA Apollo 11 mission are analyzed. Specifically, acoustical features such as fundamental frequency and phoneme formant structure that are closely related to the speech production system are studied. For a further understanding of astronauts' vocal tract spectrum variation in space, a maximum likelihood frequency warping based analysis is proposed to detect the vocal tract spectrum displacement during space conditions. The results from fundamental frequency, formant structure, as well as vocal spectrum displacement indicate that astronauts change their speech production mechanism when in space. Moreover, the experimental results for astronaut voice identification tasks indicate that current speaker recognition solutions are highly vulnerable to astronaut voice production variations in space conditions. Future recommendations from this study suggest that successful applications of speaker recognition during extended space missions require robust speaker modeling techniques that could effectively adapt to voice production variation caused by diverse space conditions.
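
    F0 and formant measurements of the kind analyzed here can be scripted with the praat-parselmouth interface to Praat. The sketch below is illustrative only: the file name is hypothetical, the formant settings are common defaults, and it does not implement the authors' maximum likelihood frequency-warping analysis.

        import numpy as np
        import parselmouth
        from parselmouth.praat import call

        snd = parselmouth.Sound("apollo11_utterance.wav")  # hypothetical file

        # Burg formant tracking: up to 5 formants below 5000 Hz (a common
        # male-voice setting), 25 ms analysis windows.
        formant = call(snd, "To Formant (burg)", 0.0, 5, 5000.0, 0.025, 50.0)
        t_mid = snd.duration / 2
        f1 = call(formant, "Get value at time", 1, t_mid, "Hertz", "Linear")
        f2 = call(formant, "Get value at time", 2, t_mid, "Hertz", "Linear")

        # Mean F0 over voiced frames only (unvoiced frames come back as 0).
        f0 = snd.to_pitch().selected_array["frequency"]
        f0_mean = np.mean(f0[f0 > 0]) if np.any(f0 > 0) else float("nan")

        print(f0_mean, f1, f2)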

  3. Feasibility of automated speech sample collection with stuttering children using interactive voice response (IVR) technology.

    PubMed

    Vogel, Adam P; Block, Susan; Kefalianos, Elaina; Onslow, Mark; Eadie, Patricia; Barth, Ben; Conway, Laura; Mundt, James C; Reilly, Sheena

    2015-04-01

    To investigate the feasibility of adopting automated interactive voice response (IVR) technology for remotely capturing standardized speech samples from stuttering children. Participants were ten 6-year-old stuttering children. Their parents called a toll-free number from their homes and were prompted to elicit speech from their children using a standard protocol involving conversation, picture description, and games. The automated IVR system was implemented using an off-the-shelf telephony software program and delivered by a standard desktop computer; the software infrastructure uses voice over internet protocol. Speech samples were automatically recorded during the calls. Video recordings were simultaneously acquired in the home at the time of the call to evaluate the fidelity of the telephone-collected samples. Key outcome measures included syllables spoken, percentage of syllables stuttered, and an overall rating of stuttering severity on a 10-point scale. The data revealed a high level of relative reliability, in terms of the intra-class correlation between the video- and telephone-acquired samples, on all outcome measures during the conversation task. Findings were less consistent for speech samples from picture description and games. The results suggest that IVR technology can be used successfully to automate remote capture of child speech samples.
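
    The reliability statistic used here, the intra-class correlation, can be computed directly from one-way ANOVA mean squares. A numpy sketch with fabricated paired scores (not the study's data):

    ```python
    # Sketch: one-way random-effects ICC(1,1) for paired measurements, e.g.
    # percent syllables stuttered from video vs. IVR telephone samples.
    import numpy as np

    video = np.array([4.1, 6.3, 2.8, 5.0, 7.2, 3.3, 4.9, 5.8, 2.1, 6.7])
    phone = np.array([4.4, 6.0, 3.1, 4.7, 7.5, 3.0, 5.2, 5.5, 2.4, 6.9])

    scores = np.stack([video, phone], axis=1)     # n subjects x k methods
    n, k = scores.shape
    grand = scores.mean()
    ms_between = k * np.sum((scores.mean(axis=1) - grand) ** 2) / (n - 1)
    ms_within = np.sum((scores - scores.mean(axis=1, keepdims=True)) ** 2) / (n * (k - 1))
    icc = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    print(f"ICC(1,1) = {icc:.3f}")
    ```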

  4. Speech, Voice, and Communication.

    PubMed

    Johnson, Julia A

    2017-01-01

    Communication changes are an important feature of Parkinson's and include both motor and nonmotor features. This chapter briefly covers the motor features affecting speech production and voice function before focusing on the nonmotor aspects. A description of the difficulties experienced by people with Parkinson's when trying to communicate effectively is presented, along with some of the assessment tools and therapists' treatment options. The idea of the clinical heterogeneity of PD and of subtyping patients with different communication problems is explored, and suggestions are made on how this may influence clinicians' treatment methods and choices so as to provide personalized therapy programmes. The importance of encouraging and supporting people to maintain social networks, employment, and leisure activities is presented as key to achieving sustainability. Finally, looking to the future, the emergence of new technologies is seen as providing further possibilities to support therapists in the goal of helping people with Parkinson's to maintain good communication skills throughout the course of the disease. © 2017 Elsevier Inc. All rights reserved.

  5. Multitalker Speech Perception with Ideal Time-Frequency Segregation: Effects of Voice Characteristics and Number of Talkers

    DTIC Science & Technology

    2009-03-23

    Douglas S. Brungart, Air Force Research Laboratory; report dated 06 March 2009. Recoverable abstract fragment: "Speech perception in multitalker listening environments is limited by two very different types of masking. The first is energetic..."

  6. Intra- and Inter-database Study for Arabic, English, and German Databases: Do Conventional Speech Features Detect Voice Pathology?

    PubMed

    Ali, Zulfiqar; Alsulaiman, Mansour; Muhammad, Ghulam; Elamvazuthi, Irraivan; Al-Nasheri, Ahmed; Mesallam, Tamer A; Farahat, Mohamed; Malki, Khalid H

    2017-05-01

    A large population around the world has voice complications. Various approaches for subjective and objective evaluation have been suggested in the literature. The subjective approach depends strongly on the experience and area of expertise of the clinician, and human error cannot be neglected. The objective, or automatic, approach is noninvasive, and automatic systems can provide complementary information that may help a clinician in the early screening of a voice disorder. Automatic systems can also be deployed in remote areas, where a general practitioner can use them and refer the patient to a specialist, avoiding complications that may be life threatening. Many automatic systems for disorder detection have been developed using conventional speech features such as linear prediction coefficients, linear prediction cepstral coefficients, and Mel-frequency cepstral coefficients (MFCCs). This study aims to ascertain whether conventional speech features detect voice pathology reliably and whether they correlate with voice quality. To investigate this, an automatic detection system based on MFCCs was developed, and three different voice disorder databases were used. The experimental results suggest that the accuracy of the MFCC-based system varies from database to database: the detection rate ranges from 72% to 95% intra-database and from 47% to 82% inter-database. The results indicate that conventional speech features are not correlated with voice quality and hence are not reliable for pathology detection. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
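
    A minimal version of such an MFCC-based detector takes only a few lines with librosa and scikit-learn. The sketch below follows the general recipe only; file names and labels are placeholders, and the authors' actual system and databases are not reproduced.

    ```python
    # Sketch: MFCC features + SVM for normal-vs-pathological classification.
    # File list and labels are illustrative placeholders.
    import numpy as np
    import librosa
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    def mfcc_features(path, sr=16000, n_mfcc=13):
        y, sr = librosa.load(path, sr=sr, mono=True)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
        # Mean and std over frames give one fixed-length vector per recording.
        return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

    paths = ["normal_01.wav", "patho_01.wav"]     # placeholder file list
    labels = np.array([0, 1])                     # 0 = normal, 1 = pathological

    X = np.stack([mfcc_features(p) for p in paths])
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    clf.fit(X, labels)          # with a real corpus, cross-validate instead
    print(clf.predict(X))
    ```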

  7. Age-related changes to spectral voice characteristics affect judgments of prosodic, segmental, and talker attributes for child and adult speech.

    PubMed

    Dilley, Laura C; Wieland, Elizabeth A; Gamache, Jessica L; McAuley, J Devin; Redford, Melissa A

    2013-02-01

    As children mature, changes in voice spectral characteristics co-vary with changes in speech, language, and behavior. In this study, spectral characteristics were manipulated to alter the perceived ages of talkers' voices while leaving critical acoustic-prosodic correlates intact, to determine whether perceived age differences were associated with differences in judgments of prosodic, segmental, and talker attributes. Speech was modified by lowering formants and fundamental frequency, for 5-year-old children's utterances, or raising them, for adult caregivers' utterances. Next, participants differing in awareness of the manipulation (Experiment 1A) or amount of speech-language training (Experiment 1B) made judgments of prosodic, segmental, and talker attributes. Experiment 2 investigated the effects of spectral modification on intelligibility. Finally, in Experiment 3, trained analysts used formal prosody coding to assess prosodic characteristics of spectrally modified and unmodified speech. Differences in perceived age were associated with differences in ratings of speech rate, fluency, intelligibility, likeability, anxiety, cognitive impairment, and speech-language disorder/delay; effects of training and awareness of the manipulation on ratings were limited. There were no significant effects of the manipulation on intelligibility or formally coded prosody judgments. Age-related voice characteristics can greatly affect judgments of speech and talker characteristics, raising cautionary notes for developmental research and clinical work.
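
    Manipulations of this kind (jointly shifting formants and fundamental frequency) can be approximated with Praat's "Change gender" command through the parselmouth bridge. The sketch below is an illustration under assumed parameter values, not the stimulus-preparation pipeline used in the study; the file name is hypothetical.

    ```python
    # Sketch: shift formants and median pitch of an utterance with Praat's
    # "Change gender" command via parselmouth. Parameter values are
    # illustrative, not the study's settings.
    import parselmouth
    from parselmouth.praat import call

    snd = parselmouth.Sound("child_utterance.wav")

    # Arguments: pitch floor (Hz), pitch ceiling (Hz), formant shift ratio,
    # new pitch median (Hz; 0 keeps the original), pitch range factor,
    # duration factor.
    adult_like = call(snd, "Change gender", 75, 600, 0.85, 120, 1.0, 1.0)
    adult_like.save("child_as_adult.wav", "WAV")
    ```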

  8. Age-related changes to spectral voice characteristics affect judgments of prosodic, segmental, and talker attributes for child and adult speech

    PubMed Central

    Dilley, Laura C.; Wieland, Elizabeth A.; Gamache, Jessica L.; McAuley, J. Devin; Redford, Melissa A.

    2013-01-01

    Purpose As children mature, changes in voice spectral characteristics covary with changes in speech, language, and behavior. Spectral characteristics were manipulated to alter the perceived ages of talkers’ voices while leaving critical acoustic-prosodic correlates intact, to determine whether perceived age differences were associated with differences in judgments of prosodic, segmental, and talker attributes. Method Speech was modified by lowering formants and fundamental frequency, for 5-year-old children’s utterances, or raising them, for adult caregivers’ utterances. Next, participants differing in awareness of the manipulation (Exp. 1a) or amount of speech-language training (Exp. 1b) made judgments of prosodic, segmental, and talker attributes. Exp. 2 investigated the effects of spectral modification on intelligibility. Finally, in Exp. 3 trained analysts used formal prosody coding to assess prosodic characteristics of spectrally-modified and unmodified speech. Results Differences in perceived age were associated with differences in ratings of speech rate, fluency, intelligibility, likeability, anxiety, cognitive impairment, and speech-language disorder/delay; effects of training and awareness of the manipulation on ratings were limited. There were no significant effects of the manipulation on intelligibility or formally coded prosody judgments. Conclusions Age-related voice characteristics can greatly affect judgments of speech and talker characteristics, raising cautionary notes for developmental research and clinical work. PMID:23275414

  9. Speech-language pathology students' self-reports on voice training: easier to understand or to do?

    PubMed

    Lindhe, Christina; Hartelius, Lena

    2009-01-01

    The aim of the study was to describe subjective ratings of the course 'Training of the student's own voice and speech' from a student-centred perspective. A questionnaire was completed after each of six individual sessions. Six speech and language pathology (SLP) students rated how they perceived the practical exercises in terms of doing and understanding. The results showed that five of the six participants rated the exercises as significantly easier to understand than to do. The exercises were also rated as easier to do over time. The results are interpreted within a theoretical framework of approaches to learning. The findings support the importance of both the physical and the reflective aspects of the voice training process.

  10. [Voice and vibration sensations in the speech forming organs: clinical and theoretical aspects of rare symptoms specific for schizophrenia].

    PubMed

    Vilela, W; Lolas, F; Wolpert, E

    1978-01-01

    In a study of 750 psychiatric in-patients with psychoses of various diagnostic groups, the symptoms of voice sensations and vibration feelings were found only among patients with paranoid schizophrenia. Moreover, these symptoms were located exclusively in body areas involved in the peripheral motor production of voice and speech (head, throat, thorax). In 11 of the 15 such cases identified, the sensations of voices and vibrations occurred simultaneously and in identical body parts; in the remaining 4 cases, only voices without vibration sensations were reported. These symptoms can therefore be considered highly specific for schizophrenia. In Bleuler's terminology, the two symptoms are, because of their rarity, to be regarded as accessory symptoms; in Kurt Schneider's terminology, they have the value of first-rank symptoms because of their high diagnostic specificity for schizophrenia. The pathogenesis of these symptoms is discussed, on the one hand, from the perspective of language development and the changing role of language in behaviour control and, on the other hand, from the viewpoint of the cybernetic, or neurophysiological-neuroanatomical, foundations of speech production and speech control. Both explanatory models share the idea that the ideational component of speech is experienced as acoustic hallucinations and the motor-proprioceptive component of speech as vibration sensations, both in a typically schizophrenic manner, that is, dissociated and ego-alienated.

  11. Functional overlap between regions involved in speech perception and in monitoring one's own voice during speech production.

    PubMed

    Zheng, Zane Z; Munhall, Kevin G; Johnsrude, Ingrid S

    2010-08-01

    The fluency and the reliability of speech production suggest a mechanism that links motor commands and sensory feedback. Here, we examined the neural organization supporting such links by using fMRI to identify regions in which activity during speech production is modulated according to whether auditory feedback matches the predicted outcome or not and by examining the overlap with the network recruited during passive listening to speech sounds. We used real-time signal processing to compare brain activity when participants whispered a consonant-vowel-consonant word ("Ted") and either heard this clearly or heard voice-gated masking noise. We compared this to when they listened to yoked stimuli (identical recordings of "Ted" or noise) without speaking. Activity along the STS and superior temporal gyrus bilaterally was significantly greater if the auditory stimulus was (a) processed as the auditory concomitant of speaking and (b) did not match the predicted outcome (noise). The network exhibiting this Feedback Type x Production/Perception interaction includes a superior temporal gyrus/middle temporal gyrus region that is activated more when listening to speech than to noise. This is consistent with speech production and speech perception being linked in a control system that predicts the sensory outcome of speech acts and that processes an error signal in speech-sensitive regions when this and the sensory data do not match.

  12. Bilingual Voicing: A Study of Code-Switching in the Reported Speech of Finnish Immigrants in Estonia

    ERIC Educational Resources Information Center

    Frick, Maria; Riionheimo, Helka

    2013-01-01

    Through a conversation analytic investigation of Finnish-Estonian bilingual (direct) reported speech (i.e., voicing) by Finns who live in Estonia, this study shows how code-switching is used as a double contextualization device. The code-switched voicings are shaped by the on-going interactional situation, serving its needs by opening up a context…

  13. Recognition of voice commands using adaptation of foreign language speech recognizer via selection of phonetic transcriptions

    NASA Astrophysics Data System (ADS)

    Maskeliunas, Rytis; Rudzionis, Vytautas

    2011-06-01

    In recent years various commercial speech recognizers have become available. These recognizers make it possible to develop applications incorporating speech recognition quickly and easily. Commercial recognizers are typically targeted at widely spoken languages with large market potential; however, it may be possible to adapt them for use in environments where less widely spoken languages are used. Since most commercial recognition engines are closed systems, the only avenue for adaptation is to find suitable methods for selecting phonetic transcriptions that map between the two languages. This paper deals with methods for finding phonetic transcriptions of Lithuanian voice commands so that they can be recognized by English speech engines. The experimental evaluation showed that it is possible to find phonetic transcriptions that enable the recognition of Lithuanian voice commands with an accuracy of over 90%.
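
    Practically, the adaptation amounts to registering, for each Lithuanian command, one or more English-like transcriptions that the closed engine can score. A toy sketch of such a command table (all entries hypothetical):

    ```python
    # Sketch: cross-language command grammar via substitute transcriptions.
    # An English recognizer is given English-like spellings that approximate
    # Lithuanian pronunciations; all entries here are illustrative.
    COMMAND_TRANSCRIPTIONS = {
        "labas":   ["lah bus", "lah bahs"],    # hypothetical variants
        "pradėti": ["prah day tee"],
        "sustoti": ["soo stow tee"],
    }

    def build_grammar(table):
        """Flatten to (transcription -> command) for decoding engine output."""
        return {t: cmd for cmd, variants in table.items() for t in variants}

    grammar = build_grammar(COMMAND_TRANSCRIPTIONS)
    print(grammar.get("lah bus"))   # -> "labas"
    ```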

  14. Collaboration and conquest: MTD as viewed by voice teacher (singing voice specialist) and speech-language pathologist.

    PubMed

    Goffi-Fynn, Jeanne C; Carroll, Linda M

    2013-05-01

    This study was designed as a qualitative case study demonstrating the process of diagnosis and treatment by a voice team managing a singer diagnosed with muscular tension dysphonia (MTD). The literature suggests that MTD is challenging to treat, and little of it directly addresses singers with MTD. Data collected included an initial medical screening with a laryngologist, referral to a speech-language pathologist (SLP) specializing in voice disorders among singers, and adjunctive voice training with a voice teacher trained in vocology (a singing voice specialist, or SVS). Initial target goals with the SLP included reducing extrinsic laryngeal tension, using a relaxed laryngeal posture, and establishing effective abdominal-diaphragmatic support for all phonation events. Balancing respiratory forces, laryngeal coordination, and optimal filtering of the source signal through resonance and articulatory awareness was emphasized. Further work with the SVS pursued three main goals: a lowered breathing pattern to aid in decreasing subglottic air pressure, a lowered vertical laryngeal position to allow a relaxed larynx, and a top-down singing approach to encourage an easier, more balanced registration and better resonance. Initial results also emphasize retraining the subject toward a sensory rather than an auditory mode of monitoring. Other areas of consideration include singers' training and vocal use, the psychological effects of MTD, the personalities potentially associated with it, and its relationship with stress. Finally, the results emphasize that a positive rapport with the subject and collaboration among all professionals involved in a singer's care are essential for recovery. Copyright © 2013 The Voice Foundation. Published by Mosby, Inc. All rights reserved.

  15. Voice Outcomes of Adults Diagnosed with Pediatric Vocal Fold Nodules and Impact of Speech Therapy.

    PubMed

    Song, Brian H; Merchant, Maqdooda; Schloegel, Luke

    2017-11-01

    Objective To evaluate the voice outcomes of adults diagnosed with vocal fold nodules (VFNs) as children and to assess the impact of speech therapy on long-term voice outcomes. Study Design Prospective cohort study. Setting Large health care system. Subjects and Methods Subjects diagnosed with VFNs as children between 1996 and 2008 were identified within the medical record database of a large health care system. Included subjects were 3 to 12 years old at the time of diagnosis, had a documented laryngeal examination within 90 days of diagnosis, and were ≥18 years old as of December 31, 2014. Qualified subjects were contacted by telephone and administered the Voice Handicap Index-10 (VHI-10) and a 15-item questionnaire probing for confounding factors. Results A total of 155 subjects were included, with a mean age of 21.4 years (range, 18-29). The male:female ratio was 2.3:1. The mean VHI-10 score for the entire cohort was 5.4. Mean VHI-10 scores did not differ between those who received speech therapy (6.1) and those who did not (4.5; P = .08). Both groups were similar with respect to confounding risk factors that can contribute to dysphonia, although the no-therapy group had a disproportionately higher number of subjects who consumed >10 alcoholic drinks per week (P = .01). Conclusion The majority of children with VFNs will achieve close-to-normal voice quality when they reach adulthood. In our cohort, speech therapy did not appear to have an impact on long-term voice outcomes.

  16. Measuring the rate of change of voice fundamental frequency in fluent speech during mental depression.

    PubMed

    Nilsonne, A; Sundberg, J; Ternström, S; Askenfelt, A

    1988-02-01

    A method of measuring the rate of change of fundamental frequency has been developed in an effort to find acoustic voice parameters that could be useful in psychiatric research. A minicomputer program was used to extract seven parameters from the fundamental frequency contour of tape-recorded speech samples: (1) the average rate of change of the fundamental frequency and (2) its standard deviation, (3) the absolute rate of fundamental frequency change, (4) the total reading time, (5) pause time as a percentage of the total reading time, and (6) the mean and (7) the standard deviation of the fundamental frequency distribution. The method is demonstrated on (a) synthetic speech and (b) voice recordings of depressed patients examined during depression and after improvement.
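
    Most of the seven parameters reduce to simple operations on a sampled F0 contour with pauses marked. A numpy sketch with a fabricated contour:

    ```python
    # Sketch: rate-of-change parameters from an F0 contour sampled at fixed
    # intervals; NaN marks pauses/unvoiced frames. Data are illustrative.
    import numpy as np

    dt = 0.01                                  # 10-ms frames
    f0 = np.array([120, 122, 125, np.nan, np.nan, 118, 115, 117, 121, 119.0])

    voiced = ~np.isnan(f0)
    rate = np.diff(f0) / dt                    # Hz/s; NaN across pause edges
    rate = rate[~np.isnan(rate)]

    params = {
        "mean rate of F0 change (Hz/s)": rate.mean(),
        "sd of rate of change": rate.std(),
        "absolute rate of change": np.abs(rate).mean(),
        "total time (s)": len(f0) * dt,
        "percent pause time": 100 * (~voiced).mean(),
        "mean F0 (Hz)": np.nanmean(f0),
        "sd of F0": np.nanstd(f0),
    }
    print(params)
    ```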

  17. Functional overlap between regions involved in speech perception and in monitoring one’s own voice during speech production

    PubMed Central

    Zheng, Zane Z.; Munhall, Kevin G; Johnsrude, Ingrid S

    2009-01-01

    The fluency and reliability of speech production suggests a mechanism that links motor commands and sensory feedback. Here, we examine the neural organization supporting such links by using fMRI to identify regions in which activity during speech production is modulated according to whether auditory feedback matches the predicted outcome or not, and examining the overlap with the network recruited during passive listening to speech sounds. We use real-time signal processing to compare brain activity when participants whispered a consonant-vowel-consonant word (‘Ted’) and either heard this clearly, or heard voice-gated masking noise. We compare this to when they listened to yoked stimuli (identical recordings of ‘Ted’ or noise) without speaking. Activity along the superior temporal sulcus (STS) and superior temporal gyrus (STG) bilaterally was significantly greater if the auditory stimulus was a) processed as the auditory concomitant of speaking and b) did not match the predicted outcome (noise). The network exhibiting this Feedback type by Production/Perception interaction includes an STG/MTG region that is activated more when listening to speech than to noise. This is consistent with speech production and speech perception being linked in a control system that predicts the sensory outcome of speech acts, and that processes an error signal in speech-sensitive regions when this and the sensory data do not match. PMID:19642886

  18. Tutorial and Guidelines on Measurement of Sound Pressure Level in Voice and Speech

    ERIC Educational Resources Information Center

    Švec, Jan G.; Granqvist, Svante

    2018-01-01

    Purpose: Sound pressure level (SPL) measurement of voice and speech is often considered a trivial matter, but the measured levels are often reported incorrectly or incompletely, making them difficult to compare among various studies. This article aims at explaining the fundamental principles behind these measurements and providing guidelines to…

  19. Speaking Math--A Voice Input, Speech Output Calculator for Students with Visual Impairments

    ERIC Educational Resources Information Center

    Bouck, Emily C.; Flanagan, Sara; Joshi, Gauri S.; Sheikh, Waseem; Schleppenbach, Dave

    2011-01-01

    This project explored a newly developed computer-based voice input, speech output (VISO) calculator. Three high school students with visual impairments educated at a state school for the blind and visually impaired participated in the study. The time they took to complete assessments and the average number of attempts per problem were recorded…

  20. The Effects of Language Experience and Speech Context on the Phonetic Accommodation of English-accented Spanish Voicing.

    PubMed

    Llanos, Fernando; Francis, Alexander L

    2017-03-01

    Native speakers of Spanish with different amounts of experience with English classified stop-consonant voicing (/b/ versus /p/) across different speech accents: English-accented Spanish, native Spanish, and native English. While listeners with little experience with English classified target voicing with an English- or Spanish-like voice onset time (VOT) boundary, predicted by contextual VOT, listeners familiar with English relied on an English-like VOT boundary in an English-accented Spanish context even in the absence of clear contextual cues to English VOT. This indicates that Spanish listeners accommodated English-accented Spanish voicing differently depending on their degree of familiarization with the English norm.
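
    A listener's VOT category boundary is conventionally estimated by fitting a logistic psychometric function to identification responses along the continuum; the boundary is the 50% crossover point. A scipy sketch with fabricated response proportions:

    ```python
    # Sketch: estimate a /b/-/p/ VOT boundary by fitting a logistic curve to
    # identification data; proportions below are fabricated for illustration.
    import numpy as np
    from scipy.optimize import curve_fit

    vot_ms = np.array([0, 10, 20, 30, 40, 50, 60])        # continuum steps
    prop_p = np.array([0.02, 0.05, 0.20, 0.55, 0.85, 0.97, 0.99])

    def logistic(x, boundary, slope):
        return 1.0 / (1.0 + np.exp(-slope * (x - boundary)))

    (boundary, slope), _ = curve_fit(logistic, vot_ms, prop_p, p0=[30, 0.2])
    print(f"estimated VOT boundary: {boundary:.1f} ms")
    ```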

  1. Voice Response Systems Technology.

    ERIC Educational Resources Information Center

    Gerald, Jeanette

    1984-01-01

    Examines two methods of generating synthetic speech in voice response systems, which allow computers to communicate in human terms (speech), using human interface devices (ears): phoneme and reconstructed voice systems. Considerations prior to implementation, current and potential applications, glossary, directory, and introduction to Input Output…

  2. Voice Disorder Management Competencies: A Survey of School-Based Speech-Language Pathologists in Nebraska

    ERIC Educational Resources Information Center

    Teten, Amy F.; DeVeney, Shari L.; Friehe, Mary J.

    2016-01-01

    Purpose: The purpose of this survey was to determine the self-perceived competence levels in voice disorders of practicing school-based speech-language pathologists (SLPs) and identify correlated variables. Method: Participants were 153 master's level, school-based SLPs with a Nebraska teaching certificate and/or licensure who completed a survey,…

  3. Age-Related Changes to Spectral Voice Characteristics Affect Judgments of Prosodic, Segmental, and Talker Attributes for Child and Adult Speech

    ERIC Educational Resources Information Center

    Dilley, Laura C.; Wieland, Elizabeth A.; Gamache, Jessica L.; McAuley, J. Devin; Redford, Melissa A.

    2013-01-01

    Purpose: As children mature, changes in voice spectral characteristics co-vary with changes in speech, language, and behavior. In this study, spectral characteristics were manipulated to alter the perceived ages of talkers' voices while leaving critical acoustic-prosodic correlates intact, to determine whether perceived age differences were…

  4. Smartphone App for Voice Disorders

    MedlinePlus

    Feature: Taste, Smell, Hearing, Language, Voice, Balance (Fall 2013 issue). Excerpt: "...developed a mobile monitoring device that relies on smartphone technology to gather a week's worth of talking..."

  5. Correlational Analysis of Speech Intelligibility Tests and Metrics for Speech Transmission

    DTIC Science & Technology

    2017-12-04

    Recoverable fragments (figure captions and abstract text): "...frequency scale (male voice; normal voice effort)"; "Fig. 2 Diagram of a speech communication system (Letowski...)". "Consonants contain mostly high-frequency (above 1500 Hz) speech energy, but this energy is relatively small in comparison to that of the whole...voices (Letowski et al. 1993). Since the mid-frequency spectral region contains mostly vowel energy while consonants are high-frequency sounds, an..."

  6. Communication in a noisy environment: Perception of one's own voice and speech enhancement

    NASA Astrophysics Data System (ADS)

    Le Cocq, Cecile

    Workers in noisy industrial environments are often confronted with communication problems. Many workers complain that they cannot communicate easily with their coworkers when wearing hearing protectors. Consequently, they tend to remove their protectors, which exposes them to the risk of hearing loss. This communication problem is in fact twofold: first, hearing protectors modify the perception of one's own voice; second, they interfere with understanding the speech of others. This double problem is examined in this thesis. When hearing protectors are worn, the modification of one's own voice perception is partly due to the occlusion effect produced when an earplug is inserted in the ear canal. This occlusion effect has two main consequences: first, low-frequency physiological noises are perceived more strongly; second, the perception of one's own voice is modified. To better understand this phenomenon, results from the literature are analyzed systematically, and a new method to quantify the occlusion effect is developed. Instead of stimulating the skull with a bone vibrator or asking the subject to speak, as is usually done in the literature, the buccal cavity is excited with an acoustic wave. The experiment is designed in such a way that the acoustic wave exciting the buccal cavity does not directly excite the external ear or the rest of the body. The measurement of the hearing threshold with the ear open and occluded is used to quantify the subjective occlusion effect for an acoustic wave in the buccal cavity. These experimental results, together with those reported in the literature, have led to a better understanding of the occlusion effect and an evaluation of the role of each internal path from the acoustic source to the internal ear. The intelligibility of speech from others is degraded both by the high sound levels of noisy industrial environments and by the attenuation of the speech signal due to hearing

  7. Cepstral analysis of normal and pathological voice in Spanish adults. Smoothed cepstral peak prominence in sustained vowels versus connected speech.

    PubMed

    Delgado-Hernández, Jonathan; León-Gómez, Nieves M; Izquierdo-Arteaga, Laura M; Llanos-Fumero, Yanira

    In recent years, the use of cepstral measures for the acoustic evaluation of voice has increased. One of the most investigated parameters is the smoothed cepstral peak prominence (CPPS). The objectives of this paper are to establish the usefulness of this acoustic measure in the objective evaluation of voice alterations in Spanish and to determine which type of voice sample (sustained vowel or connected speech) is the more sensitive for evaluating the severity of dysphonia. Forty subjects participated in this study: 20 controls and 20 with dysphonia. Two voice samples were recorded for each subject (one sustained vowel /a/ and four phonetically balanced sentences), and the CPPS was calculated using the Praat programme. Three raters perceptually evaluated the voice samples with the Grade parameter of the GRBAS scale. Significantly lower CPPS values were found in the dysphonic voices, both for /a/ (t[38] = 4.85, P < .000) and for the sentences (t[38] = 5.75, P < .000). Regarding the type of voice sample best suited to evaluating the severity of voice alterations, a strong correlation with the perceptual ratings was found for CPPS calculated from connected speech (r_s = -0.73) and a moderate correlation for CPPS calculated from the sustained vowel (r_s = -0.56). The results of this preliminary study suggest that CPPS is a good measure for detecting dysphonia and for objectively assessing the severity of voice alterations. Copyright © 2017 Elsevier España, S.L.U. and Sociedad Española de Otorrinolaringología y Cirugía de Cabeza y Cuello. All rights reserved.
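
    The study computed CPPS with Praat. For intuition, here is a simplified, unsmoothed cepstral peak prominence for a single frame in numpy; because it omits the time and quefrency smoothing that defines CPPS, its values are not comparable with Praat's.

    ```python
    # Sketch: unsmoothed cepstral peak prominence for one analysis frame.
    # Simplified relative to Praat's CPPS (no smoothing).
    import numpy as np

    def cpp(frame, sr, f0_min=60.0, f0_max=330.0):
        spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
        log_spec = 20 * np.log10(spec + 1e-12)        # dB spectrum
        ceps = np.fft.irfft(log_spec)                 # real cepstrum (dB units)
        quef = np.arange(len(ceps)) / sr              # quefrency axis (s)

        lo, hi = int(sr / f0_max), int(sr / f0_min)   # plausible F0 region
        peak_idx = lo + np.argmax(ceps[lo:hi])

        # Regression line over the cepstrum; CPP = peak height above the line.
        fit = slice(int(0.001 * sr), hi)
        b, a = np.polyfit(quef[fit], ceps[fit], 1)
        return ceps[peak_idx] - (b * quef[peak_idx] + a)

    sr = 16000
    t = np.arange(1024) / sr
    # Synthetic voiced frame: 150 Hz fundamental with 20 harmonics.
    frame = sum(np.sin(2 * np.pi * 150 * k * t) / k for k in range(1, 21))
    print(f"CPP of synthetic 150 Hz frame: {cpp(frame, sr):.1f} dB")
    ```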

  8. Voice onset time for female trained and untrained singers during speech and singing.

    PubMed

    McCrea, Christopher R; Morris, Richard J

    2007-01-01

    The purpose of this study was to examine the voice onset times (VOTs) of female trained and untrained singers during spoken and sung tasks. Thirty females were digitally recorded speaking and singing short phrases containing the English stop consonants /p/ and /b/ in word-initial position. Voice onset time was measured for each phoneme and statistically analyzed. Mixed ANOVAs revealed significantly longer VOT durations for /p/ in spoken than in sung productions. No significant differences between trained and untrained singers were observed, and no task differences occurred for the /b/ productions. The results indicate that the type of phonatory task influences VOT for voiceless stops in females. As a result of this activity, the reader will be able to (1) understand articulatory and phonatory differences between spoken and sung productions and (2) understand the articulatory and phonatory timing differences between trained and untrained singers during spoken and sung productions.

  9. Speech coding

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ravishankar, C., Hughes Network Systems, Germantown, MD

    Speech is the predominant means of communication between human beings, and since the invention of the telephone by Alexander Graham Bell in 1876, speech services have remained the core service in almost all telecommunication systems. The original analog methods of telephony had the disadvantage that the speech signal could be corrupted by noise, cross-talk, and distortion. Long-haul transmissions, which use repeaters to compensate for the loss in signal strength on transmission links, also increase the associated noise and distortion. Digital transmission, on the other hand, is relatively immune to noise, cross-talk, and distortion, primarily because the digital signal can be faithfully regenerated at each repeater purely on the basis of a binary decision. The end-to-end performance of a digital link therefore becomes essentially independent of the length and operating frequency bands of the link, so from a transmission point of view digital transmission is the preferred approach owing to its higher immunity to noise. Carrying speech digitally has also become extremely important from a service-provision point of view. Modern requirements call for robust, flexible, and secure services that can carry a multitude of signal types (such as voice, data, and video) without a fundamental change in infrastructure. Such requirements could not easily have been met without the advent of digital transmission systems, which in turn require speech to be coded digitally. The term speech coding refers to techniques that represent or code speech signals either directly as a waveform or as a set of parameters obtained by analyzing the speech signal. In either case, the codes are transmitted to the distant end, where speech is reconstructed or synthesized using the received codes. A more generic term, often used interchangeably with speech coding, is voice coding. This term is more generic in the sense that
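
    As a concrete instance of the waveform-coding branch described above, the sketch below implements classic 8-bit mu-law companding (the principle behind G.711-style telephony coding) in numpy; it illustrates the idea and is not a standards-compliant codec.

    ```python
    # Sketch: mu-law companding, the textbook waveform-coding example.
    # Compress amplitudes logarithmically, quantize to 8 bits, then expand.
    import numpy as np

    MU = 255.0

    def mulaw_encode(x):                      # x in [-1, 1]
        y = np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)
        return np.round((y + 1) / 2 * 255).astype(np.uint8)   # 8-bit code

    def mulaw_decode(code):
        y = code.astype(np.float64) / 255 * 2 - 1
        return np.sign(y) * ((1 + MU) ** np.abs(y) - 1) / MU

    x = 0.3 * np.sin(2 * np.pi * np.linspace(0, 10, 800))     # toy "speech"
    x_hat = mulaw_decode(mulaw_encode(x))
    snr = 10 * np.log10(np.sum(x**2) / np.sum((x - x_hat)**2))
    print(f"SNR after 8-bit mu-law: {snr:.1f} dB")
    ```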

  10. Assessment of Severe Apnoea through Voice Analysis, Automatic Speech, and Speaker Recognition Techniques

    NASA Astrophysics Data System (ADS)

    Fernández Pozo, Rubén; Blanco Murillo, Jose Luis; Hernández Gómez, Luis; López Gonzalo, Eduardo; Alcázar Ramírez, José; Toledano, Doroteo T.

    2009-12-01

    This study is part of an ongoing collaborative effort between the medical and the signal processing communities to promote research on applying standard Automatic Speech Recognition (ASR) techniques for the automatic diagnosis of patients with severe obstructive sleep apnoea (OSA). Early detection of severe apnoea cases is important so that patients can receive early treatment. Effective ASR-based detection could dramatically cut medical testing time. Working with a carefully designed speech database of healthy and apnoea subjects, we describe an acoustic search for distinctive apnoea voice characteristics. We also study abnormal nasalization in OSA patients by modelling vowels in nasal and nonnasal phonetic contexts using Gaussian Mixture Model (GMM) pattern recognition on speech spectra. Finally, we present experimental findings regarding the discriminative power of GMMs applied to severe apnoea detection. We have achieved an 81% correct classification rate, which is very promising and underpins the interest in this line of inquiry.
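
    The GMM step reduces to fitting one mixture per class and comparing per-sample log-likelihoods. A scikit-learn sketch on synthetic two-dimensional features (stand-ins for the spectral features described):

    ```python
    # Sketch: GMM-based two-class detection (healthy vs. severe apnoea),
    # trained on synthetic 2-D features purely for illustration.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(1)
    healthy = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(200, 2))
    apnoea = rng.normal(loc=[2.0, 1.5], scale=1.2, size=(200, 2))

    gmm_h = GaussianMixture(n_components=4, random_state=0).fit(healthy)
    gmm_a = GaussianMixture(n_components=4, random_state=0).fit(apnoea)

    test = np.vstack([healthy[:5], apnoea[:5]])
    # Classify by the higher per-sample log-likelihood (equal priors assumed).
    pred = (gmm_a.score_samples(test) > gmm_h.score_samples(test)).astype(int)
    print(pred)        # 0 = healthy, 1 = apnoea
    ```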

  11. Reliability in perceptual analysis of voice quality.

    PubMed

    Bele, Irene Velsvik

    2005-12-01

    This study focuses on speaking voice quality in male teachers (n = 35) and male actors (n = 36), who represent untrained and trained voice users, because we wanted to investigate normal and supranormal voices. In this study, both substantial and methodologic aspects were considered. It includes a method for perceptual voice evaluation, and a basic issue was rater reliability. A listening group of 10 listeners, 7 experienced speech-language therapists, and 3 speech-language therapist students evaluated the voices by 15 vocal characteristics using VA scales. Two sets of voice signals were investigated: text reading (2 loudness levels) and sustained vowel (3 levels). The results indicated a high interrater reliability for most perceptual characteristics. Connected speech was evaluated more reliably, especially at the normal level, but both types of voice signals were evaluated reliably, although the reliability for connected speech was somewhat higher than for vowels. Experienced listeners tended to be more consistent in their ratings than did the student raters. Some vocal characteristics achieved acceptable reliability even with a smaller panel of listeners. The perceptual characteristics grouped in 4 factors reflected perceptual dimensions.

  12. Issues in forensic voice.

    PubMed

    Hollien, Harry; Huntley Bahr, Ruth; Harnsberger, James D

    2014-03-01

    The following article provides a general review of an area that can be referred to as Forensic Voice. Its goals will be outlined and that discussion will be followed by a description of its major elements. Considered are (1) the processing and analysis of spoken utterances, (2) distorted speech, (3) enhancement of speech intelligibility (re: surveillance and other recordings), (4) transcripts, (5) authentication of recordings, (6) speaker identification, and (7) the detection of deception, intoxication, and emotions in speech. Stress in speech and the psychological stress evaluation systems (that some individuals attempt to use as lie detectors) also will be considered. Points of entry will be suggested for individuals with the kinds of backgrounds possessed by professionals already working in the voice area. Copyright © 2014 The Voice Foundation. Published by Mosby, Inc. All rights reserved.

  13. A Study of Multiplexing Schemes for Voice and Data.

    NASA Astrophysics Data System (ADS)

    Sriram, Kotikalapudi

    Voice traffic variations are characterized by on/off transitions of voice calls, and talkspurt/silence transitions of speakers in conversations. A speaker is known to be in silence for more than half the time during a telephone conversation. In this dissertation, we study some schemes which exploit speaker silences for an efficient utilization of the transmission capacity in integrated voice/data multiplexing and in digital speech interpolation. We study two voice/data multiplexing schemes. In each scheme, any time slots momentarily unutilized by the voice traffic are made available to data. In the first scheme, the multiplexer does not use speech activity detectors (SAD), and hence the voice traffic variations are due to call on/off only. In the second scheme, the multiplexer detects speaker silences using SAD and transmits voice only during talkspurts. The multiplexer with SAD performs digital speech interpolation (DSI) as well as dynamic channel allocation to voice and data. The performance of the two schemes is evaluated using discrete-time modeling and analysis. The data delay performance for the case of English speech is compared with that for the case of Japanese speech. A closed form expression for the mean data message delay is derived for the single-channel single-talker case. In a DSI system, occasional speech losses occur whenever the number of speakers in simultaneous talkspurt exceeds the number of TDM voice channels. In a buffered DSI system, speech loss is further reduced at the cost of delay. We propose a novel fixed-delay buffered DSI scheme. In this scheme, speech fill-in/hangover is not required because there are no variable delays. Hence, all silences that naturally occur in speech are fully utilized. Consequently, a substantial improvement in the DSI performance is made possible. The scheme is modeled and analyzed in discrete -time. Its performance is evaluated in terms of the probability of speech clipping, packet rejection ratio, DSI
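
    The speech-loss condition described, more speakers in simultaneous talkspurt than TDM channels, has a standard binomial model when speakers are assumed independent. A scipy sketch with illustrative values:

    ```python
    # Sketch: probability that speech is clipped in a DSI system, i.e. that
    # more than C of N independent speakers are in talkspurt at once.
    from scipy.stats import binom

    N = 48       # speakers multiplexed
    C = 30       # TDM voice channels
    p = 0.4      # P(speaker in talkspurt); < 0.5, per the text

    p_clip = binom.sf(C, N, p)       # P(active speakers > C)
    print(f"P(speech clipping) = {p_clip:.4f}")
    ```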

  14. Functional connectivity between face-movement and speech-intelligibility areas during auditory-only speech perception.

    PubMed

    Schall, Sonja; von Kriegstein, Katharina

    2014-01-01

    It has been proposed that internal simulation of the talking face of visually-known speakers facilitates auditory speech recognition. One prediction of this view is that brain areas involved in auditory-only speech comprehension interact with visual face-movement sensitive areas, even under auditory-only listening conditions. Here, we test this hypothesis using connectivity analyses of functional magnetic resonance imaging (fMRI) data. Participants (17 normal participants, 17 developmental prosopagnosics) first learned six speakers via brief voice-face or voice-occupation training (<2 min/speaker). This was followed by an auditory-only speech recognition task and a control task (voice recognition) involving the learned speakers' voices in the MRI scanner. As hypothesized, we found that, during speech recognition, familiarity with the speaker's face increased the functional connectivity between the face-movement sensitive posterior superior temporal sulcus (STS) and an anterior STS region that supports auditory speech intelligibility. There was no difference between normal participants and prosopagnosics. This was expected because previous findings have shown that both groups use the face-movement sensitive STS to optimize auditory-only speech comprehension. Overall, the present findings indicate that learned visual information is integrated into the analysis of auditory-only speech and that this integration results from the interaction of task-relevant face-movement and auditory speech-sensitive areas.

  15. Speech therapy after thyroidectomy

    PubMed Central

    Wu, Che-Wei

    2017-01-01

    Common complaints of patients who have received thyroidectomy include dysphonia (voice dysfunction) and dysphagia (difficulty swallowing). One cause of these surgical outcomes is recurrent laryngeal nerve paralysis. Many studies have discussed the effectiveness of speech therapy (e.g., voice therapy and dysphagia therapy) for improving dysphonia and dysphagia, but not specifically in patients who have received thyroidectomy. Therefore, the aim of this paper was to discuss issues regarding speech therapy, such as voice therapy and dysphagia therapy, for patients after thyroidectomy. Another aim was to review the literature on speech therapy for patients with recurrent laryngeal nerve paralysis after thyroidectomy. Databases used for the literature review included PubMed, MEDLINE, Academic Search Premier, ERIC, CINAHL Plus, and EBSCO. The articles retrieved by the database searches were classified and screened for relevance using EndNote. Of the 936 articles retrieved, 18 discussed "voice assessment and thyroidectomy", 3 discussed "voice therapy and thyroidectomy", and 11 discussed "surgical interventions for voice restoration after thyroidectomy". Only 3 studies discussed topics related to "swallowing function assessment/treatment and thyroidectomy". Although many studies have investigated voice changes and assessment methods in thyroidectomy patients, few recent studies have investigated speech therapy after thyroidectomy. Additionally, some studies have addressed dysphagia after thyroidectomy, but few have discussed its assessment and treatment. PMID:29142841

  16. Voice-stress measure of mental workload

    NASA Technical Reports Server (NTRS)

    Alpert, Murray; Schneider, Sid J.

    1988-01-01

    In a planned experiment, male subjects between the ages of 18 and 50 will be required to produce speech while performing various tasks. Analysis of the speech produced should reveal which aspects of voice prosody are associated with increased workload. Preliminary results with two female subjects suggest a possible trend for voice frequency and amplitude to be higher, and the variance of the voice frequency to be lower, in the high-workload condition.

  17. Functional Connectivity between Face-Movement and Speech-Intelligibility Areas during Auditory-Only Speech Perception

    PubMed Central

    Schall, Sonja; von Kriegstein, Katharina

    2014-01-01

    It has been proposed that internal simulation of the talking face of visually-known speakers facilitates auditory speech recognition. One prediction of this view is that brain areas involved in auditory-only speech comprehension interact with visual face-movement sensitive areas, even under auditory-only listening conditions. Here, we test this hypothesis using connectivity analyses of functional magnetic resonance imaging (fMRI) data. Participants (17 normal participants, 17 developmental prosopagnosics) first learned six speakers via brief voice-face or voice-occupation training (<2 min/speaker). This was followed by an auditory-only speech recognition task and a control task (voice recognition) involving the learned speakers’ voices in the MRI scanner. As hypothesized, we found that, during speech recognition, familiarity with the speaker’s face increased the functional connectivity between the face-movement sensitive posterior superior temporal sulcus (STS) and an anterior STS region that supports auditory speech intelligibility. There was no difference between normal participants and prosopagnosics. This was expected because previous findings have shown that both groups use the face-movement sensitive STS to optimize auditory-only speech comprehension. Overall, the present findings indicate that learned visual information is integrated into the analysis of auditory-only speech and that this integration results from the interaction of task-relevant face-movement and auditory speech-sensitive areas. PMID:24466026

  18. Cognitive Load in Voice Therapy Carry-Over Exercises

    ERIC Educational Resources Information Center

    Iwarsson, Jenny; Morris, David Jackson; Balling, Laura Winther

    2017-01-01

    Purpose: The cognitive load generated by online speech production may vary with the nature of the speech task. This article examines 3 speech tasks used in voice therapy carry-over exercises, in which a patient is required to adopt and automatize new voice behaviors, ultimately in daily spontaneous communication. Method: Twelve subjects produced…

  19. Voiced Excitations

    DTIC Science & Technology

    2004-12-01

    Report documentation fragments only. Keywords: Radar & EM Speech, Voiced Speech Excitations. Recoverable references: "New Ideas for Speech Recognition and Related Technologies", Lawrence Livermore National Laboratory Report UCRL-UR-120310, 1995; Lawrence Livermore Laboratory report UCRL-JC-134775M; Holzrichter, J.F., Kobler, J.B., Rosowski, J.J., Burke, G.J. (2003), "EM wave..."

  20. High-frequency energy in singing and speech

    NASA Astrophysics Data System (ADS)

    Monson, Brian Bruce

    While human speech and the human voice generate acoustical energy up to (and beyond) 20 kHz, the energy above approximately 5 kHz has been largely neglected. Evidence is accruing that this high-frequency energy contains perceptual information relevant to speech and voice, including percepts of quality, localization, and intelligibility. The present research was an initial step in the long-range goal of characterizing high-frequency energy in singing voice and speech, with particular regard for its perceptual role and its potential for modification during voice and speech production. In this study, a database of high-fidelity recordings of talkers was created and used for a broad acoustical analysis and general characterization of high-frequency energy, as well as specific characterization of phoneme category, voice and speech intensity level, and mode of production (speech versus singing) by high-frequency energy content. Directionality of radiation of high-frequency energy from the mouth was also examined. The recordings were used for perceptual experiments wherein listeners were asked to discriminate between speech and voice samples that differed only in high-frequency energy content. Listeners were also subjected to gender discrimination tasks, mode-of-production discrimination tasks, and transcription tasks with samples of speech and singing that contained only high-frequency content. The combination of these experiments has revealed that (1) human listeners are able to detect very subtle level changes in high-frequency energy, and (2) human listeners are able to extract significant perceptual information from high-frequency energy.
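
    A first-pass characterization of high-frequency energy content is simply the fraction of spectral power above a cutoff. A scipy sketch with a synthetic signal, using the 5-kHz boundary mentioned in the abstract:

    ```python
    # Sketch: fraction of spectral power above 5 kHz, a crude measure of
    # high-frequency energy content; the signal here is a synthetic tone
    # plus noise, standing in for a recorded voice sample.
    import numpy as np
    from scipy.signal import welch

    sr = 44100
    rng = np.random.default_rng(2)
    y = np.sin(2 * np.pi * 220 * np.arange(sr) / sr) + 0.05 * rng.standard_normal(sr)

    freqs, psd = welch(y, fs=sr, nperseg=2048)
    hf_fraction = psd[freqs >= 5000].sum() / psd.sum()
    print(f"fraction of power above 5 kHz: {hf_fraction:.4f}")
    ```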

  1. Perception of a non-native speech contrast: Voiced and voiceless stops as perceived by Tamil speakers

    NASA Astrophysics Data System (ADS)

    Tur, Sylwia

    2004-05-01

    The effect of linguistic experience plays a significant role in how speech sounds are perceived. The findings of many studies imply that the perception of non-native contrasts depends on their status in the native language of the listener. Tamil is a language with a single voicing category. All stop consonants in Tamil are phonemically voiceless, though allophonic voicing has been observed in spoken Tamil. The present study examined how native Tamil speakers and English controls perceived voiced and voiceless bilabial, alveolar, and velar stops in English. Voice onset time (VOT) was manipulated for editing of naturally produced stimuli with increasingly longer continuum. Perceptual data was collected from 16 Tamil and 16 English speakers. Experiment 1 was an AX task in which subjects responded same or different to 162 pairs of stimuli. Experiment 2 was a forced choice ID task in which subjects identified 99 individually presented stimuli as pa, ta, ka or ba, da, ga. Experiments show statistically significant differences between Tamil and English speakers in their perception of English stop consonants. Results of the study imply that the allophonic status of voiced stops in Tamil does not aid the Tamil speakers in perceiving phonemically voiced stops in English.

  2. A high quality voice coder with integrated echo canceller and voice activity detector for mobile satellite applications

    NASA Technical Reports Server (NTRS)

    Kondoz, A. M.; Evans, B. G.

    1993-01-01

    In the last decade, low-bit-rate speech coding research has received much attention, resulting in newly developed, good-quality speech coders operating at rates as low as 4.8 kb/s. Although speech quality at around 8 kb/s is acceptable for a wide variety of applications, at 4.8 kb/s further improvements in quality are necessary to make it acceptable to the majority of applications and users. In addition to a low bit rate with acceptable speech quality, other facilities such as integrated digital echo cancellation and voice activity detection are now becoming necessary to provide a cost-effective and compact solution. In this paper we describe a CELP speech coder with an integrated echo canceller and a voice activity detector, all of which have been implemented on a single DSP32C with 32 KBytes of SRAM. The quality of CELP-coded speech has been improved significantly by a new codebook implementation, which also simplifies the encoder/decoder complexity, making room for the integration of a 64-tap echo canceller together with a voice activity detector.
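
    Of the three integrated components, the voice activity detector is the easiest to illustrate. Below is a toy frame-energy VAD in numpy, far cruder than a VAD suitable for a deployed coder, but it shows the basic frame-level decision:

    ```python
    # Sketch: toy energy-based voice activity detector. Real VADs add noise
    # tracking, hangover smoothing, and spectral features; this is the core.
    import numpy as np

    def energy_vad(y, sr, frame_ms=20, threshold_db=-35):
        n = int(sr * frame_ms / 1000)
        frames = y[: len(y) // n * n].reshape(-1, n)
        energy_db = 10 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
        return energy_db > threshold_db       # True = speech frame

    sr = 8000
    t = np.arange(sr) / sr
    y = np.concatenate([0.001 * np.ones(sr // 2),            # "silence"
                        0.3 * np.sin(2 * np.pi * 200 * t)])  # "speech"
    print(energy_vad(y, sr).astype(int))
    ```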

  3. A ''Voice Inversion Effect?''

    ERIC Educational Resources Information Center

    Bedard, Catherine; Belin, Pascal

    2004-01-01

    Voice is the carrier of speech but is also an ''auditory face'' rich in information on the speaker's identity and affective state. Three experiments explored the possibility of a ''voice inversion effect,'' by analogy to the classical ''face inversion effect,'' which could support the hypothesis of a voice-specific module. Experiment 1 consisted…

  4. Use of Spectral/Cepstral Analyses for Differentiating Normal from Hypofunctional Voices in Sustained Vowel and Continuous Speech Contexts

    ERIC Educational Resources Information Center

    Watts, Christopher R.; Awan, Shaheen N.

    2011-01-01

    Purpose: In this study, the authors evaluated the diagnostic value of spectral/cepstral measures to differentiate dysphonic from nondysphonic voices using sustained vowels and continuous speech samples. Methodology: Thirty-two age- and gender-matched individuals (16 participants with dysphonia and 16 controls) were recorded reading a standard…

  5. Voice parameters and videonasolaryngoscopy in children with vocal nodules: a longitudinal study, before and after voice therapy.

    PubMed

    Valadez, Victor; Ysunza, Antonio; Ocharan-Hernandez, Esther; Garrido-Bustamante, Norma; Sanchez-Valerio, Araceli; Pamplona, Ma C

    2012-09-01

    Vocal Nodules (VN) are a functional voice disorder associated with voice misuse and abuse in children. There are few reports addressing vocal parameters in children with VN, especially after a period of vocal rehabilitation. The purpose of this study is to describe measurements of vocal parameters including Fundamental Frequency (FF), Shimmer (S), and Jitter (J), videonasolaryngoscopy examination and clinical perceptual assessment, before and after voice therapy in children with VN. Voice therapy was provided using visual support through Speech-Viewer software. Twenty patients with VN were studied. An acoustical analysis of voice was performed and compared with data from subjects from a control group matched by age and gender. Also, clinical perceptual assessment of voice and videonasolaryngoscopy were performed to all patients with VN. After a period of voice therapy, provided with visual support using Speech Viewer-III (SV-III-IBM) software, new acoustical analyses, perceptual assessments and videonasolaryngoscopies were performed. Before the onset of voice therapy, there was a significant difference (p<0.05) in mean FF, S and J, between the patients with VN and subjects from the control group. After the voice therapy period, a significant improvement (p<0.05) was found in all acoustic voice parameters. Moreover, perceptual voice analysis demonstrated improvement in all cases. Finally, videonasolaryngoscopy demonstrated that vocal nodules were no longer discernible on the vocal folds in any of the cases. SV-III software seems to be a safe and reliable method for providing voice therapy in children with VN. Acoustic voice parameters, perceptual data and videonasolaryngoscopy were significantly improved after the speech therapy period was completed. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
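
    The acoustic parameters reported here have simple definitions over consecutive glottal cycles. A numpy sketch of local jitter and shimmer from fabricated per-cycle periods and amplitudes:

    ```python
    # Sketch: local jitter (%) and local shimmer (%) from per-cycle values.
    # Period and amplitude sequences below are fabricated for illustration.
    import numpy as np

    periods = np.array([7.9, 8.1, 8.0, 8.3, 7.8, 8.2])     # ms per cycle
    amps = np.array([0.52, 0.55, 0.50, 0.56, 0.51, 0.54])   # peak amplitudes

    jitter_local = np.mean(np.abs(np.diff(periods))) / np.mean(periods) * 100
    shimmer_local = np.mean(np.abs(np.diff(amps))) / np.mean(amps) * 100

    print(f"F0 ≈ {1000 / periods.mean():.1f} Hz")
    print(f"jitter (local) = {jitter_local:.2f} %")
    print(f"shimmer (local) = {shimmer_local:.2f} %")
    ```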

  6. Correlation of VHI-10 to voice laboratory measurements across five common voice disorders.

    PubMed

    Gillespie, Amanda I; Gooding, William; Rosen, Clark; Gartner-Schmidt, Jackie

    2014-07-01

    To correlate change in Voice Handicap Index (VHI)-10 scores with corresponding voice laboratory measures across five voice disorders. Retrospective study. One hundred fifty patients aged >18 years with primary diagnosis of vocal fold lesions, primary muscle tension dysphonia-1, atrophy, unilateral vocal fold paralysis (UVFP), and scar. For each group, participants with the largest change in VHI-10 between two periods (TA and TB) were selected. The dates of the VHI-10 values were linked to corresponding acoustic/aerodynamic and audio-perceptual measures. Change in voice laboratory values were analyzed for correlation with each other and with VHI-10. VHI-10 scores were greater for patients with UVFP than other disorders. The only disorder-specific correlation between voice laboratory measure and VHI-10 was average phonatory airflow in speech for patients with UVFP. Average airflow in repeated phonemes was strongly correlated with average airflow in speech (r=0.75). Acoustic measures did not significantly change between time points. The lack of correlations between the VHI-10 change scores and voice laboratory measures may be due to differing constructs of each measure; namely, handicap versus physiological function. Presuming corroboration between these measures may be faulty. Average airflow in speech may be the most ecologically valid measure for patients with UVFP. Although aerodynamic measures changed between the time points, acoustic measures did not. Correlations to VHI-10 and change between time points may be found with other acoustic measures. Copyright © 2014 The Voice Foundation. Published by Mosby, Inc. All rights reserved.

  7. Evaluation of Speech Recognition of Cochlear Implant Recipients Using Adaptive, Digital Remote Microphone Technology and a Speech Enhancement Sound Processing Algorithm.

    PubMed

    Wolfe, Jace; Morais, Mila; Schafer, Erin; Agrawal, Smita; Koch, Dawn

    2015-05-01

    Cochlear implant recipients often experience difficulty with understanding speech in the presence of noise. Cochlear implant manufacturers have developed sound processing algorithms designed to improve speech recognition in noise, and research has shown these technologies to be effective. Remote microphone technology utilizing adaptive, digital wireless radio transmission has also been shown to provide significant improvement in speech recognition in noise. No studies have examined the potential improvement in speech recognition in noise when these two technologies are used simultaneously. The goal of this study was to evaluate the potential benefits and limitations associated with the simultaneous use of a sound processing algorithm designed to improve performance in noise (Advanced Bionics ClearVoice) and a remote microphone system that incorporates adaptive, digital wireless radio transmission (Phonak Roger). A two-by-two repeated-measures design was used to compare performance without these technologies, with each technology separately, and with both technologies simultaneously. Eleven Advanced Bionics (AB) cochlear implant recipients, ages 11 to 68 yr, participated. AzBio sentence recognition was measured in quiet and in the presence of classroom noise ranging in level from 50 to 80 dBA in 5-dB steps. Performance was evaluated in four conditions: (1) no ClearVoice and no Roger, (2) ClearVoice enabled without Roger, (3) ClearVoice disabled with Roger enabled, and (4) simultaneous use of ClearVoice and Roger. Speech recognition in quiet was better than speech recognition in noise for all conditions. Use of ClearVoice and Roger each provided significant improvement in speech recognition in noise, and the best performance in noise was obtained with the two used together. ClearVoice and Roger technology each improves speech recognition in noise, particularly when used at the same time

  8. The interlanguage speech intelligibility benefit for native speakers of Mandarin: Production and perception of English word-final voicing contrasts

    PubMed Central

    Hayes-Harb, Rachel; Smith, Bruce L.; Bent, Tessa; Bradlow, Ann R.

    2009-01-01

    This study investigated the intelligibility of native and Mandarin-accented English speech for native English and native Mandarin listeners. The word-final voicing contrast was considered (as in minimal pairs such as 'cub' and 'cup') in a forced-choice word identification task. For these particular talkers and listeners, there was evidence of an interlanguage speech intelligibility benefit for listeners (i.e., native Mandarin listeners were more accurate than native English listeners at identifying Mandarin-accented English words). However, there was no evidence of an interlanguage speech intelligibility benefit for talkers (i.e., native Mandarin listeners did not find Mandarin-accented English speech more intelligible than native English speech). When listener and talker phonological proficiency (operationalized as accentedness) was taken into account, it was found that the interlanguage speech intelligibility benefit for listeners held only for the low phonological proficiency listeners and low phonological proficiency speech. The intelligibility data were also considered in relation to various temporal-acoustic properties of native English and Mandarin-accented English speech in an effort to better understand the properties of speech that may contribute to the interlanguage speech intelligibility benefit. PMID:19606271

  9. Low Vocal Pitch Preference Drives First Impressions Irrespective of Context in Male Voices but Not in Female Voices.

    PubMed

    Tsantani, Maria S; Belin, Pascal; Paterson, Helena M; McAleer, Phil

    2016-08-01

    Vocal pitch has been found to influence judgments of perceived trustworthiness and dominance from a novel voice. However, the majority of findings arise from using only male voices and in context-specific scenarios. In two experiments, we first explore the influence of average vocal pitch on first-impression judgments of perceived trustworthiness and dominance, before establishing the existence of an overall preference for high or low pitch across genders. In Experiment 1, pairs of high- and low-pitched temporally reversed recordings of male and female vocal utterances were presented in a two-alternative forced-choice task. Results revealed a tendency to select the low-pitched voice over the high-pitched voice as more trustworthy, for both genders, and more dominant, for male voices only. Experiment 2 tested an overall preference for low-pitched voices, and whether judgments were modulated by speech content, using forward and reversed speech to manipulate context. Results revealed an overall preference for low pitch, irrespective of direction of speech, in male voices only. No such overall preference was found for female voices. We propose that an overall preference for low pitch is a default prior in male voices irrespective of context, whereas pitch preferences in female voices are more context- and situation-dependent. The present study confirms the important role of vocal pitch in the formation of first-impression personality judgments and advances understanding of the impact of context on pitch preferences across genders.

  10. [Design of standard voice sample text for subjective auditory perceptual evaluation of voice disorders].

    PubMed

    Li, Jin-rang; Sun, Yan-yan; Xu, Wen

    2010-09-01

    To design a speech voice sample text containing all phonemes in Mandarin for subjective auditory perceptual evaluation of voice disorders. The design principles for the text were: the short text should include the 21 initials and 39 finals, so as to cover all the phonemes in Mandarin, and it should be meaningful. A short text was composed. It had 155 Chinese words and included 21 initials and 38 finals (the final ê was not included because it is rarely used in Mandarin). The text also covered 17 light tones and one "Erhua". The constituent ratios of the initials and finals presented in this short text were statistically similar to those in Mandarin, according to the method of similarity between sample and population (r = 0.742, P < 0.001 and r = 0.844, P < 0.001, respectively). The constituent ratios of the tones presented in this short text were not statistically similar to those in Mandarin (r = 0.731, P > 0.05). A speech voice sample text with all phonemes in Mandarin was thus produced. The constituent ratios of the initials and finals in this short text are similar to those in Mandarin. Its value for subjective auditory perceptual evaluation of voice disorders needs further study.

  11. Verbal Paradata and Survey Error: Respondent Speech, Voice, and Question-Answering Behavior Can Predict Income Item Nonresponse

    ERIC Educational Resources Information Center

    Jans, Matthew E.

    2010-01-01

    Income nonresponse is a significant problem in survey data, with rates as high as 50%, yet we know little about why it occurs. It is plausible that the way respondents answer survey questions (e.g., their voice and speech characteristics, and their question- answering behavior) can predict whether they will provide income data, and will reflect…

  12. Designing interaction, voice, and inclusion in AAC research.

    PubMed

    Pullin, Graham; Treviranus, Jutta; Patel, Rupal; Higginbotham, Jeff

    2017-09-01

    The ISAAC 2016 Research Symposium included a Design Stream that examined timely issues across augmentative and alternative communication (AAC), framed in terms of designing interaction, designing voice, and designing inclusion. Each is a complex term with multiple meanings; together they represent challenging yet important frontiers of AAC research. The Design Stream was conceived by the four authors, researchers who have been exploring AAC and disability-related design throughout their careers, brought together by a shared conviction that designing for communication implies more than ensuring access to words and utterances. Each of these presenters came to AAC from a different background: interaction design, inclusive design, speech science, and social science. The resulting discussion among 24 symposium participants included controversies about the role of technology, tensions about independence and interdependence, and a provocation about taste. The paper concludes by proposing new directions for AAC research: (a) new interdisciplinary research could combine scientific and design research methods, as distant yet complementary as microanalysis and interaction design, (b) new research tools could seed accessible and engaging contextual research into voice within a social model of disability, and (c) new open research networks could support inclusive, international and interdisciplinary research.

  13. Do What I Say! Voice Recognition Makes Major Advances.

    ERIC Educational Resources Information Center

    Ruley, C. Dorsey

    1994-01-01

    Explains voice recognition technology applications in the workplace, schools, and libraries. Highlights include a voice-controlled work station using the DragonDictate system that can be used with dyslexic students, converting text to speech, and converting speech to text. (LRW)

  14. Lee Silverman Voice Treatment versus standard speech and language therapy versus control in Parkinson's disease: a pilot randomised controlled trial (PD COMM pilot).

    PubMed

    Sackley, Catherine M; Smith, Christina H; Rick, Caroline E; Brady, Marian C; Ives, Natalie; Patel, Smitaa; Woolley, Rebecca; Dowling, Francis; Patel, Ramilla; Roberts, Helen; Jowett, Sue; Wheatley, Keith; Kelly, Debbie; Sands, Gina; Clarke, Carl E

    2018-01-01

    Speech-related problems are common in Parkinson's disease (PD), but there is little evidence for the effectiveness of standard speech and language therapy (SLT) or Lee Silverman Voice Treatment (LSVT LOUD®). The PD COMM pilot was a three-arm, assessor-blinded, randomised controlled trial (RCT) of LSVT LOUD®, SLT and no intervention (1:1:1 ratio) to assess feasibility and to inform the design of a full-scale RCT. Non-demented patients with idiopathic PD and speech problems, and no SLT for speech problems in the past 2 years, were eligible. LSVT LOUD® is a standardised regime (16 sessions over 4 weeks). SLT comprised individualised content per local practice (typically weekly sessions for 6-8 weeks). Outcomes included recruitment and retention, treatment adherence, and data completeness. Outcome data collected at baseline, 3, 6, and 12 months included patient-reported voice and quality of life measures, resource use, and assessor-rated speech recordings. Eighty-nine patients were randomised, with 90% in the therapy groups and 100% in the control group completing the trial. The response rate for the Voice Handicap Index (VHI) in each arm was ≥ 90% at all time-points. VHI was highly correlated with the other speech-related outcome measures. There was a trend to improvement in VHI with LSVT LOUD® (difference at 3 months compared with control: -12.5 points; 95% CI -26.2 to 1.2) and SLT (difference at 3 months compared with control: -9.8 points; 95% CI -23.2 to 3.7) which needs to be confirmed in an adequately powered trial. Randomisation to a three-arm trial of speech therapy including a no-intervention control is feasible and acceptable. Compliance with both interventions was good. VHI and other patient-reported outcomes were relevant measures and provided data to inform the sample size for a substantive trial. International Standard Randomised Controlled Trial Number Register: ISRCTN75223808, registered 22 March 2012.

  15. Scientific bases of human-machine communication by voice.

    PubMed Central

    Schafer, R W

    1995-01-01

    The scientific bases for human-machine communication by voice are in the fields of psychology, linguistics, acoustics, signal processing, computer science, and integrated circuit technology. The purpose of this paper is to highlight the basic scientific and technological issues in human-machine communication by voice and to point out areas of future research opportunity. The discussion is organized around the following major issues in implementing human-machine voice communication systems: (i) hardware/software implementation of the system, (ii) speech synthesis for voice output, (iii) speech recognition and understanding for voice input, and (iv) usability factors related to how humans interact with machines. PMID:7479802

  16. Hemispheric association and dissociation of voice and speech information processing in stroke.

    PubMed

    Jones, Anna B; Farrall, Andrew J; Belin, Pascal; Pernet, Cyril R

    2015-10-01

    As we listen to someone speaking, we extract both linguistic and non-linguistic information. Knowing how these two sets of information are processed in the brain is fundamental to the general understanding of social communication, speech recognition and therapy of language impairments. We investigated the pattern of performances in phoneme versus gender categorization in left and right hemisphere stroke patients, and found an anatomo-functional dissociation in the right frontal cortex, establishing a new syndrome in voice discrimination abilities. In addition, phoneme and gender performances were more often associated than dissociated in the left hemisphere patients, suggesting common neural underpinnings. Copyright © 2015 Elsevier Ltd. All rights reserved.

  17. Hearing Story Characters' Voices: Auditory Imagery during Reading

    ERIC Educational Resources Information Center

    Gunraj, Danielle N.; Klin, Celia M.

    2012-01-01

    Despite the longstanding belief in an inner voice, there is surprisingly little known about the perceptual features of that voice during text processing. This article asked whether readers infer nonlinguistic phonological features, such as speech rate, associated with a character's speech. Previous evidence for this type of auditory imagery has…

  18. A pneumatic Bionic Voice prosthesis-Pre-clinical trials of controlling the voice onset and offset.

    PubMed

    Ahmadi, Farzaneh; Noorian, Farzad; Novakovic, Daniel; van Schaik, André

    2018-01-01

    Despite emergent progress in many fields of bionics, a functional Bionic Voice prosthesis for laryngectomy patients (larynx amputees) has not yet been achieved, leading to a lifetime of vocal disability for these patients. This study introduces a novel framework of Pneumatic Bionic Voice Prostheses as an electronic adaptation of the Pneumatic Artificial Larynx (PAL) device. The PAL is a non-invasive mechanical voice source, driven exclusively by respiration with an exceptionally high voice quality, comparable to the existing gold standard of Tracheoesophageal (TE) voice prosthesis. Following PAL design closely as the reference, Pneumatic Bionic Voice Prostheses seem to have a strong potential to substitute the existing gold standard by generating a similar voice quality while remaining non-invasive and non-surgical. This paper designs the first Pneumatic Bionic Voice prosthesis and evaluates its onset and offset control against the PAL device through pre-clinical trials on one laryngectomy patient. The evaluation on a database of more than five hours of continuous/isolated speech recordings shows a close match between the onset/offset control of the Pneumatic Bionic Voice and the PAL with an accuracy of 98.45 ±0.54%. When implemented in real-time, the Pneumatic Bionic Voice prosthesis controller has an average onset/offset delay of 10 milliseconds compared to the PAL. Hence it addresses a major disadvantage of previous electronic voice prostheses, including myoelectric Bionic Voice, in meeting the short time-frames of controlling the onset/offset of the voice in continuous speech.

  19. A Development of a System Enables Character Input and PC Operation via Voice for a Physically Disabled Person with a Speech Impediment

    NASA Astrophysics Data System (ADS)

    Tanioka, Toshimasa; Egashira, Hiroyuki; Takata, Mayumi; Okazaki, Yasuhisa; Watanabe, Kenzi; Kondo, Hiroki

    We have designed and implemented a voice-controlled PC operation support system for a physically disabled person with a speech impediment. Voice operation is an effective method for a physically disabled person with involuntary movement of the limbs and head. For practical purposes, we built our system on a commercial speech recognition engine; adopting a commercial engine reduces development cost and should make the system useful to other people with speech impediments. We customized the engine so that it can recognize the utterances of a person with a speech impediment. We restricted the vocabulary the recognition engine accepts and separated target words from similar-sounding words to avoid misrecognition: the huge number of words registered in commercial speech recognition engines causes frequent misrecognition of impaired speech, because such utterances are unclear and unstable. We solved this problem by narrowing the choice of inputs down to a small number and by registering ambiguous pronunciations in addition to the original ones. To realize full character input and full PC operation with a small vocabulary, we designed multiple input modes with categorized dictionaries and introduced two-step input in each mode (except numeral input) to enable correct operation with a small number of words. The system is at a practical level. The first author of this paper is physically disabled with a speech impediment. Using this system, he can not only input characters into the PC but also operate the Windows system smoothly, and he uses it in his daily life; this paper was written by him with the system. At present, the speech recognition is customized to him. It is, however, possible to customize the system for other users by changing the vocabulary and registering new pronunciations according to each user's utterances.
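
    A minimal sketch of the two-step, small-vocabulary input scheme described above: each mode exposes only a handful of recognizable words, and a character or command is selected in two steps (group, then item). The mode names and vocabularies below are hypothetical, not the system's actual dictionaries:

    ```python
    # Hedged sketch: categorized dictionaries with two-step selection.
    # Keeping each step's active vocabulary small reduces misrecognition.
    MODES = {
        "kana": {
            "a-row": ["a", "i", "u", "e", "o"],
            "ka-row": ["ka", "ki", "ku", "ke", "ko"],
        },
        "command": {
            "window": ["open", "close", "minimize"],
        },
    }

    def two_step_input(mode: str, group_word: str, item_index: int) -> str:
        """Step 1 selects a small word group; step 2 selects an item in it."""
        return MODES[mode][group_word][item_index]

    print(two_step_input("kana", "ka-row", 2))  # -> "ku"
    ```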

  20. Multimodal Speech Capture System for Speech Rehabilitation and Learning.

    PubMed

    Sebkhi, Nordine; Desai, Dhyey; Islam, Mohammad; Lu, Jun; Wilson, Kimberly; Ghovanloo, Maysam

    2017-11-01

    Speech-language pathologists (SLPs) are trained to correct articulation of people diagnosed with motor speech disorders by analyzing articulators' motion and assessing speech outcome while patients speak. To assist SLPs in this task, we are presenting the multimodal speech capture system (MSCS) that records and displays kinematics of key speech articulators, the tongue and lips, along with voice, using unobtrusive methods. Collected speech modalities, tongue motion, lips gestures, and voice are visualized not only in real-time to provide patients with instant feedback but also offline to allow SLPs to perform post-analysis of articulators' motion, particularly the tongue, with its prominent but hardly visible role in articulation. We describe the MSCS hardware and software components, and demonstrate its basic visualization capabilities by a healthy individual repeating the words "Hello World." A proof-of-concept prototype has been successfully developed for this purpose, and will be used in future clinical studies to evaluate its potential impact on accelerating speech rehabilitation by enabling patients to speak naturally. Pattern matching algorithms to be applied to the collected data can provide patients with quantitative and objective feedback on their speech performance, unlike current methods that are mostly subjective, and may vary from one SLP to another.

  1. Practical applications of interactive voice technologies: Some accomplishments and prospects

    NASA Technical Reports Server (NTRS)

    Grady, Michael W.; Hicklin, M. B.; Porter, J. E.

    1977-01-01

    A technology assessment of the application of computers and electronics to complex systems is presented. Three existing systems which utilize voice technology (speech recognition and speech generation) are described. Future directions in voice technology are also described.

  2. Speech in spinocerebellar ataxia.

    PubMed

    Schalling, Ellika; Hartelius, Lena

    2013-12-01

    Spinocerebellar ataxias (SCAs) are a heterogeneous group of autosomal dominant cerebellar ataxias clinically characterized by progressive ataxia, dysarthria and a range of other concomitant neurological symptoms. Only a few studies include detailed characterization of speech symptoms in SCA. Speech symptoms in SCA resemble ataxic dysarthria, but symptoms related to phonation may be more prominent. One study to date has shown an association between differences in speech and voice symptoms and genotype. More studies of speech and voice phenotypes are warranted, as they may aid clinical diagnosis. In addition, instrumental speech analysis has been demonstrated to be a reliable measure that may be used to monitor disease progression or therapy outcomes in possible future pharmacological treatments. Intervention by speech and language pathologists should go beyond assessment. Clinical guidelines for the management of speech, communication and swallowing need to be developed for individuals with progressive cerebellar ataxia. Copyright © 2013 Elsevier Inc. All rights reserved.

  3. Speech, Prosody, and Voice Characteristics of a Mother and Daughter with a 7;13 Translocation Affecting "FOXP2"

    ERIC Educational Resources Information Center

    Shriberg, Lawrence D.; Ballard, Kirrie J.; Tomblin, J. Bruce; Duffy, Joseph R.; Odell, Katharine H.; Williams, Charles A.

    2006-01-01

    Purpose: The primary goal of this case study was to describe the speech, prosody, and voice characteristics of a mother and daughter with a breakpoint in a balanced 7;13 chromosomal translocation that disrupted the transcription gene, "FOXP2" (cf. J. B. Tomblin et al., 2005). As with affected members of the widely cited KE family, whose…

  4. Micro-Based Speech Recognition: Instructional Innovation for Handicapped Learners.

    ERIC Educational Resources Information Center

    Horn, Carin E.; Scott, Brian L.

    A new voice based learning system (VBLS), which allows the handicapped user to interact with a microcomputer by voice commands, is described. Speech or voice recognition is the computerized process of identifying a spoken word or phrase, including those resulting from speech impediments. This new technology is helpful to the severely physically…

  5. Human voice perception.

    PubMed

    Latinus, Marianne; Belin, Pascal

    2011-02-22

    We are all voice experts. First and foremost, we can produce and understand speech, and this makes us a unique species. But in addition to speech perception, we routinely extract from voices a wealth of socially-relevant information in what constitutes a more primitive, and probably more universal, non-linguistic mode of communication. Consider the following example: you are sitting in a plane, and you can hear a conversation in a foreign language in the row behind you. You do not see the speakers' faces, and you cannot understand the speech content because you do not know the language. Yet, an amazing amount of information is available to you. You can evaluate the physical characteristics of the different protagonists, including their gender, approximate age and size, and associate an identity to the different voices. You can form a good idea of the different speakers' moods and affective states, as well as more subtle cues such as the perceived attractiveness or dominance of the protagonists. In brief, you can form a fairly detailed picture of the type of social interaction unfolding, which a brief glance backwards can on occasion help refine - sometimes surprisingly so. What are the acoustical cues that carry these different types of vocal information? How does our brain process and analyse this information? Here we briefly review an emerging field and the main tools used in voice perception research. Copyright © 2011 Elsevier Ltd. All rights reserved.

  6. Can we perceptually rate alaryngeal voice? Developing the Sunderland Tracheoesophageal Voice Perceptual Scale.

    PubMed

    Hurren, A; Hildreth, A J; Carding, P N

    2009-12-01

    To investigate the inter- and intra-rater reliability (in relation to both profession and expertise) of judgments of two alaryngeal voice parameters: 'Overall Grade' and 'Neoglottal Tonicity'. Reliable perceptual assessment is essential for surgical and therapeutic outcome measurement but has been minimally researched to date. Test of inter- and intra-rater agreement from audio recordings of 55 tracheoesophageal speakers. Cancer Unit. Twelve speech and language therapists and ten Ear, Nose and Throat surgeons. Perceptual voice parameters of 'Overall Grade' rated with a 0-3 equally appearing interval scale and 'Neoglottal Tonicity' with an 11-point bipolar semantic scale. All raters achieved 'good' agreement for 'Overall Grade', with mean weighted kappa coefficients of 0.78 for intra- and 0.70 for inter-rater agreement. All raters achieved 'good' intra-rater agreement for 'Neoglottal Tonicity' (0.64), but inter-rater agreement was only 'moderate' (0.40). However, the expert speech and language therapist sub-group attained 'good' inter-rater agreement for this parameter (0.63). The effect of 'Neoglottal Tonicity' on 'Overall Grade' was examined utilising only the expert speech and language therapists' data. Linear regression analysis resulted in an r-squared coefficient of 0.67. Analysis of the perceptual impression of hypotonicity and hypertonicity in relation to the mean 'Overall Grade' score demonstrated that neither tone was linked to a more favourable grade (P = 0.42). Expert speech and language therapist raters may be the optimal judges for tracheoesophageal voice assessment. Tonicity appears to be a good predictor of 'Overall Grade'. These scales have clinical applicability for investigating techniques that facilitate optotonic neoglottal voice quality.

  7. A pneumatic Bionic Voice prosthesis—Pre-clinical trials of controlling the voice onset and offset

    PubMed Central

    Noorian, Farzad; Novakovic, Daniel; van Schaik, André

    2018-01-01

    Despite emergent progress in many fields of bionics, a functional Bionic Voice prosthesis for laryngectomy patients (larynx amputees) has not yet been achieved, leading to a lifetime of vocal disability for these patients. This study introduces a novel framework of Pneumatic Bionic Voice Prostheses as an electronic adaptation of the Pneumatic Artificial Larynx (PAL) device. The PAL is a non-invasive mechanical voice source, driven exclusively by respiration with an exceptionally high voice quality, comparable to the existing gold standard of Tracheoesophageal (TE) voice prosthesis. Following PAL design closely as the reference, Pneumatic Bionic Voice Prostheses seem to have a strong potential to substitute the existing gold standard by generating a similar voice quality while remaining non-invasive and non-surgical. This paper designs the first Pneumatic Bionic Voice prosthesis and evaluates its onset and offset control against the PAL device through pre-clinical trials on one laryngectomy patient. The evaluation on a database of more than five hours of continuous/isolated speech recordings shows a close match between the onset/offset control of the Pneumatic Bionic Voice and the PAL with an accuracy of 98.45 ±0.54%. When implemented in real-time, the Pneumatic Bionic Voice prosthesis controller has an average onset/offset delay of 10 milliseconds compared to the PAL. Hence it addresses a major disadvantage of previous electronic voice prostheses, including myoelectric Bionic Voice, in meeting the short time-frames of controlling the onset/offset of the voice in continuous speech. PMID:29466455

  8. Sperry Univac speech communications technology

    NASA Technical Reports Server (NTRS)

    Medress, Mark F.

    1977-01-01

    Technology and systems for effective verbal communication with computers were developed. A continuous speech recognition system for verbal input, a word spotting system to locate key words in conversational speech, prosodic tools to aid speech analysis, and a prerecorded voice response system for speech output are described.

  9. Quantitative acoustic measurements for characterization of speech and voice disorders in early untreated Parkinson's disease.

    PubMed

    Rusz, J; Cmejla, R; Ruzickova, H; Ruzicka, E

    2011-01-01

    An assessment of vocal impairment is presented for separating healthy people from persons with early untreated Parkinson's disease (PD). This study's main purpose was to (a) determine whether voice and speech disorders are present in the early stages of PD, before dopaminergic pharmacotherapy is started, (b) ascertain the specific characteristics of the PD-related vocal impairment, (c) identify PD-related acoustic signatures for the major part of traditional clinically used measurement methods with respect to their automatic assessment, and (d) design new automatic measurement methods of articulation. The varied speech data were collected from 46 Czech native speakers, 23 with PD. Subsequently, 19 representative measurements were pre-selected, and Wald sequential analysis was then applied to assess the efficiency of each measure and the extent of vocal impairment of each subject. It was found that measurement of the fundamental frequency variations, applied to two selected tasks, was the best method for separating healthy from PD subjects. On the basis of objective acoustic measures, statistical decision-making theory, and validation from practicing speech therapists, it has been demonstrated that 78% of early untreated PD subjects show some form of vocal impairment. The speech defects thus uncovered differ individually in various characteristics including phonation, articulation, and prosody.
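
    As an illustration of one measure named above, fundamental frequency variation, here is a minimal sketch assuming librosa's pYIN pitch tracker and a hypothetical audio file; the study's own pipeline, tasks, and thresholds are not reproduced:

    ```python
    # Hedged sketch: F0 variation (semitone standard deviation) over the
    # voiced frames of a recording, one common vocal-impairment measure.
    import numpy as np
    import librosa

    y, sr = librosa.load("sustained_vowel.wav", sr=None)  # hypothetical file
    f0, voiced_flag, _ = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"), sr=sr
    )
    f0 = f0[voiced_flag & ~np.isnan(f0)]  # keep voiced, defined frames only

    semitones = 12 * np.log2(f0 / np.median(f0))  # perceptual (log) scale
    print(f"F0 variation (semitone SD): {semitones.std():.2f}")
    ```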

  10. Dissecting choral speech: properties of the accompanist critical to stuttering reduction.

    PubMed

    Kiefte, Michael; Armson, Joy

    2008-01-01

    The effects of choral speech and altered auditory feedback (AAF) on stuttering frequency were compared to identify those properties of choral speech that make it a more effective condition for stuttering reduction. Seventeen adults who stutter (AWS) participated in an experiment consisting of special choral speech conditions that were manipulated to selectively eliminate specific differences between choral speech and AAF. Consistent with previous findings, results showed that both choral speech and AAF reduced stuttering compared to solo reading. Although reductions under AAF were substantial, they were less dramatic than those for choral speech. Stuttering reduction for choral speech was highly robust even when the accompanist's voice temporally lagged that of the AWS, when there was no opportunity for dynamic interplay between the AWS and accompanist, and when the accompanist was replaced by the AWS's own voice, all of which approximate specific features of AAF. Choral speech was also highly effective in reducing stuttering across changes in speech rate and for both familiar and unfamiliar passages. We concluded that differences in properties between choral speech and AAF other than those that were manipulated in this experiment must account for differences in stuttering reduction. The reader will be able to (1) describe differences in stuttering reduction associated with altered auditory feedback compared to choral speech conditions and (2) describe differences between delivery of a second voice signal as an altered rendition of the speakers own voice (altered auditory feedback) and alterations in the voice of an accompanist (choral speech).

  11. Impact of auditory training for perceptual assessment of voice executed by undergraduate students in Speech-Language Pathology.

    PubMed

    Silva, Regiane Serafim Abreu; Simões-Zenari, Marcia; Nemr, Nair Kátia

    2012-01-01

    To analyze the impact of auditory training on the auditory-perceptual assessment carried out by Speech-Language Pathology undergraduate students. Over two semesters, 17 undergraduate students enrolled in theoretical subjects on phonation (Phonation/Phonation Disorders) analyzed samples of altered and unaltered voices (selected for this purpose) using the GRBAS scale. All subjects received auditory training during nine 15-minute meetings. In each meeting, a different parameter was presented using the voice sample, with the trained aspect predominating in each session. Assessment of the sample using the scale was carried out before and after training, and on four other occasions throughout the meetings. The students' assessments were compared with an assessment carried out by three voice-expert speech-language pathologists, who acted as judges. To verify training effectiveness, Friedman's test and the Kappa index were used. The rate of correct answers before training was considered between fair and good. The number of correct answers was maintained throughout the assessments for most of the scale parameters. After training, the students showed improvement in the analysis of asthenia, a parameter that was emphasized during training after the students reported difficulties analyzing it. There was a decrease in the number of correct answers for the roughness parameter after it was approached segmented into hoarseness and harshness, and observed in association with different diagnoses and acoustic parameters. Auditory training enhances students' initial abilities to perform the evaluation, and it guided adjustments to the dynamics of the university subject.

  12. Prototype app for voice therapy: a peer review.

    PubMed

    Lavaissiéri, Paula; Melo, Paulo Eduardo Damasceno

    2017-03-09

    Voice therapy promotes changes in patients' voice-related habits and rehabilitation. Speech-language therapists use a host of materials, ranging from pictures to electronic resources and computer tools, as aids in this process. Mobile technology is attractive, interactive and a nearly constant feature in the daily routine of a large part of the population, and has a growing application in healthcare. To develop a prototype application for voice therapy, submit it to peer assessment, and improve the initial prototype based on these assessments. A prototype of the Q-Voz application was developed based on Apple's Human Interface Guidelines. The prototype was analyzed by seven speech therapists working in the voice area, and improvements to the product were made based on their assessments. All features of the application were considered satisfactory by most evaluators. All evaluators found the application very useful; evaluators reported that patients would find it easier to make changes in voice behavior with the application than without it; the evaluators stated they would use this application with their patients with dysphonia in the process of rehabilitation, and that the application offers useful tools for voice self-management. Based on the suggestions provided, six improvements were made to the prototype. The prototype Q-Voz application was developed, evaluated by seven judges, and subsequently improved. All evaluators stated they would use the application with their patients undergoing rehabilitation, indicating that the Q-Voz application for mobile devices can be considered an auxiliary tool for voice therapy.

  13. Randomized controlled trial of supplemental augmentative and alternative communication versus voice rest alone after phonomicrosurgery.

    PubMed

    Rousseau, Bernard; Gutmann, Michelle L; Mau, Theodore; Francis, David O; Johnson, Jeffrey P; Novaleski, Carolyn K; Vinson, Kimberly N; Garrett, C Gaelyn

    2015-03-01

    This randomized trial investigated voice rest and supplemental text-to-speech communication versus voice rest alone on visual analog scale measures of communication effectiveness and magnitude of voice use. Randomized clinical trial. Multicenter outpatient voice clinics. Thirty-seven patients undergoing phonomicrosurgery. Patients undergoing phonomicrosurgery were randomized to voice rest and supplemental text-to-speech communication or voice rest alone. The primary outcome measure was the impact of voice rest on ability to communicate effectively over a 7-day period. Pre- and postoperative magnitude of voice use was also measured as an observational outcome. Patients randomized to voice rest and supplemental text-to-speech communication reported higher median communication effectiveness on each postoperative day compared to those randomized to voice rest alone, with significantly higher median communication effectiveness on postoperative days 3 (P=.03) and 5 (P=.01). Magnitude of voice use did not differ on any preoperative (P>.05) or postoperative day (P>.05), nor did patients significantly decrease voice use as the surgery date approached (P>.05). However, there was a significant reduction in median voice use pre- to postoperatively across patients (P<.001) with median voice use ranging from 0 to 3 throughout the postoperative week. Supplemental text-to-speech communication increased patient-perceived communication effectiveness on postoperative days 3 and 5 over voice rest alone. With the prevalence of smartphones and the widespread use of text messaging, supplemental text-to-speech communication may provide an accessible and cost-effective communication option for patients on vocal restrictions. © American Academy of Otolaryngology—Head and Neck Surgery Foundation 2015.

  14. Voice quality change in future professional voice users after 9 months of voice training.

    PubMed

    Timmermans, Bernadette; De Bodt, Marc; Wuyts, Floris; Van de Heyning, Paul

    2004-01-01

    Sixty-eight students of a school for audiovisual communication participated in this study. Of these, 49 students received voice training for 9 months (the trained group) and 19 received no specific voice training (the untrained group). A multidimensional test battery containing the GRBAS scale, videolaryngostroboscopy, Maximum Phonation Time (MPT), jitter, lowest intensity (IL), highest frequency (FoH), Dysphonia Severity Index (DSI) and Voice Handicap Index (VHI) was applied before and after training to evaluate the training outcome. The voice training consisted of technical workshops in small groups (five to eight subjects) and vocal coaching in the ateliers. In the technical workshops, basic skills are trained (posture, breathing technique, articulation and diction); in the ateliers, the speech and language pathologist assists the subjects in the practice of their voice work. This study revealed a significant improvement over time for the objective measurements [Dysphonia Severity Index: from 2.3 to 4.5 (P<0.001)] and the self-evaluation [Voice Handicap Index: from 23 to 18.4 (P=0.016)] for the trained group only. This outcome favors the systematic introduction of voice training during the schooling of professional voice users.

  15. Speech and Communication Disorders

    MedlinePlus

    ... to being completely unable to speak or understand speech. Causes include: hearing disorders and deafness; voice problems, ... or those caused by cleft lip or palate; speech problems like stuttering; developmental disabilities; learning disorders; autism ...

  16. A posteriori error estimates in voice source recovery

    NASA Astrophysics Data System (ADS)

    Leonov, A. S.; Sorokin, V. N.

    2017-12-01

    The inverse problem of voice source pulse recovery from a segment of a speech signal is under consideration. A special mathematical model relating these quantities is used for the solution. A variational method for solving the inverse problem of voice source recovery is proposed for a new parametric class of sources, piecewise-linear sources (PWL-sources). A technique for a posteriori numerical error estimation of the obtained solutions is also presented. A computer study of the adequacy of the adopted speech production model with PWL-sources is performed by solving the inverse problem for various types of voice signals, together with a corresponding study of the a posteriori error estimates. Numerical experiments on speech signals show satisfactory properties of the proposed a posteriori error estimates, which represent upper bounds on the possible errors in solving the inverse problem. The estimate of the most probable error in determining the source-pulse shapes is about 7-8% for the investigated speech material. A posteriori error estimates can also be used as a quality criterion for the obtained voice source pulses in application to speaker recognition.

  17. Towards Artificial Speech Therapy: A Neural System for Impaired Speech Segmentation.

    PubMed

    Iliya, Sunday; Neri, Ferrante

    2016-09-01

    This paper presents a neural system-based technique for segmenting short impaired speech utterances into silent, unvoiced, and voiced sections. Moreover, the proposed technique identifies those points of the (voiced) speech where the spectrum becomes steady. The resulting technique thus aims at detecting the limited section of the speech which contains the information about the potential impairment of the speech. This section is of interest to the speech therapist as it corresponds to the possibly incorrect movements of speech organs (lower lip and tongue with respect to the vocal tract). Two segmentation models to detect and identify the various sections of the disordered (impaired) speech signals have been developed and compared. The first makes use of a combination of four artificial neural networks. The second is based on a support vector machine (SVM). The SVM has been trained by means of an ad hoc nested algorithm whose outer layer is a metaheuristic while the inner layer is a convex optimization algorithm. Several metaheuristics have been tested and compared, leading to the conclusion that some variants of the compact differential evolution (CDE) algorithm appear to be well-suited to address this problem. Numerical results show that the SVM model with a radial basis function is capable of effective detection of the portion of speech that is of interest to a therapist. The best performance has been achieved when the system is trained by the nested algorithm whose outer layer is a hybrid population-based/CDE metaheuristic. A population-based approach displays the best performance for the isolation of silence/noise sections and the detection of unvoiced sections. On the other hand, a compact approach appears to be clearly well-suited to detect the beginning of the steady state of the voiced signal. Both proposed segmentation models outperformed two modern segmentation techniques based on Gaussian mixture models and deep learning.
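
    For orientation, a heuristic baseline for the silent/unvoiced/voiced labeling task described above, using short-time energy and zero-crossing rate; the paper's neural-network and SVM models (and their metaheuristic training) are far more elaborate, and the thresholds here are assumptions:

    ```python
    # Hedged sketch: frame-wise silent/unvoiced/voiced labeling.
    # Voiced frames are periodic with a low zero-crossing rate; unvoiced
    # (fricative-like) frames are noisy with a high zero-crossing rate.
    import numpy as np

    def segment_frames(x, sr, frame_ms=25, hop_ms=10,
                       energy_thresh=1e-4, zcr_thresh=0.25):
        frame = int(sr * frame_ms / 1000)
        hop = int(sr * hop_ms / 1000)
        labels = []
        for start in range(0, len(x) - frame, hop):
            w = x[start:start + frame]
            energy = np.mean(w ** 2)
            zcr = np.mean(np.abs(np.diff(np.sign(w)))) / 2
            if energy < energy_thresh:
                labels.append("silent")
            elif zcr > zcr_thresh:
                labels.append("unvoiced")
            else:
                labels.append("voiced")
        return labels
    ```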

  18. Segregation of Whispered Speech Interleaved with Noise or Speech Maskers

    DTIC Science & Technology

    2011-08-01

    range over which the talker can be heard. Whispered speech is produced by modulating the flow of air through partially open vocal folds. Because the...source of excitation is turbulent air flow, the acoustic characteristics of whispered speech differ from those of voiced speech [1, 2]. Despite the acoustic...signals provided by cochlear implants. Two studies investigated the segregation of simultaneously presented whispered vowels [7, 8] in a standard

  19. Reference-free automatic quality assessment of tracheoesophageal speech.

    PubMed

    Huang, Andy; Falk, Tiago H; Chan, Wai-Yip; Parsa, Vijay; Doyle, Philip

    2009-01-01

    Evaluation of the quality of tracheoesophageal (TE) speech using machines instead of human experts can enhance the voice rehabilitation process for patients who have undergone total laryngectomy and voice restoration. Towards the goal of devising a reference-free TE speech quality estimation algorithm, we investigate the efficacy of speech signal features that are used in standard telephone-speech quality assessment algorithms, in conjunction with a recently introduced speech modulation spectrum measure. Tests performed on two TE speech databases demonstrate that the modulation spectral measure and a subset of features in the standard ITU-T P.563 algorithm estimate TE speech quality with better correlation (up to 0.9) than previously proposed features.
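
    A minimal sketch of a modulation-spectrum style feature, assuming a two-stage analysis (an acoustic STFT followed by an FFT of each band's temporal envelope); the exact measure used in the paper is not reproduced here:

    ```python
    # Hedged sketch: modulation spectrum of a speech signal.
    import numpy as np
    from scipy.signal import stft

    def modulation_spectrum(x, sr, frame=512, hop=128):
        # Stage 1: acoustic spectrogram -> per-band temporal envelopes.
        _, _, X = stft(x, fs=sr, nperseg=frame, noverlap=frame - hop)
        env = np.abs(X)                        # shape: (acoustic bands, frames)
        # Stage 2: FFT along the time axis gives modulation frequencies.
        mod = np.abs(np.fft.rfft(env, axis=1))
        return mod                             # (acoustic bands, modulation bins)
    ```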

  20. Randomized Controlled Trial of Supplemental Augmentative and Alternative Communication versus Voice Rest Alone after Phonomicrosurgery

    PubMed Central

    Rousseau, Bernard; Gutmann, Michelle L.; Mau, I-fan Theodore; Francis, David O.; Johnson, Jeffrey P.; Novaleski, Carolyn K.; Vinson, Kimberly N.; Garrett, C. Gaelyn

    2015-01-01

    Objective This randomized trial investigated voice rest and supplemental text-to-speech communication versus voice rest alone on visual analog scale measures of communication effectiveness and magnitude of voice use. Study Design Randomized clinical trial. Setting Multicenter outpatient voice clinics. Subjects Thirty-seven patients undergoing phonomicrosurgery. Methods Patients undergoing phonomicrosurgery were randomized to voice rest and supplemental text-to-speech communication or voice rest alone. The primary outcome measure was the impact of voice rest on ability to communicate effectively over a seven-day period. Pre- and post-operative magnitude of voice use was also measured as an observational outcome. Results Patients randomized to voice rest and supplemental text-to-speech communication reported higher median communication effectiveness on each post-operative day compared to those randomized to voice rest alone, with significantly higher median communication effectiveness on post-operative day 3 (p = 0.03) and 5 (p = 0.01). Magnitude of voice use did not differ on any pre-operative (p > 0.05) or post-operative day (p > 0.05), nor did patients significantly decrease voice use as the surgery date approached (p > 0.05). However, there was a significant reduction in median voice use pre- to post-operatively across patients (p < 0.001) with median voice use ranging from 0–3 throughout the post-operative week. Conclusion Supplemental text-to-speech communication increased patient perceived communication effectiveness on post-operative days 3 and 5 over voice rest alone. With the prevalence of smartphones and the widespread use of text messaging, supplemental text-to-speech communication may provide an accessible and cost-effective communication option for patients on vocal restrictions. PMID:25605690

  1. Speech coding, reconstruction and recognition using acoustics and electromagnetic waves

    DOEpatents

    Holzrichter, John F.; Ng, Lawrence C.

    1998-01-01

    The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced, as well as for combined voiced and unvoiced speech. The methods include how to deconvolve the speech excitation function from the acoustic speech output to describe the transfer function for each time frame. The formation of feature vectors defining all acoustic speech units over well-defined time frames can be used for purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching.
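
    A minimal sketch of the deconvolution step described above: given an EM-sensed excitation frame e[n] and the simultaneous acoustic output s[n], the vocal-tract transfer function can be estimated by division in the frequency domain. The Wiener-style regularization constant is an assumption, not part of the patent text:

    ```python
    # Hedged sketch: per-frame transfer function estimate H = S/E,
    # regularized so the division stays stable where |E| is small.
    import numpy as np

    def transfer_function(excitation, speech, eps=1e-8):
        E = np.fft.rfft(excitation)
        S = np.fft.rfft(speech)
        return S * np.conj(E) / (np.abs(E) ** 2 + eps)
    ```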

  2. Speech coding, reconstruction and recognition using acoustics and electromagnetic waves

    DOEpatents

    Holzrichter, J.F.; Ng, L.C.

    1998-03-17

    The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced, as well as for combined voiced and unvoiced speech. The methods include how to deconvolve the speech excitation function from the acoustic speech output to describe the transfer function for each time frame. The formation of feature vectors defining all acoustic speech units over well-defined time frames can be used for purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching. 35 figs.

  3. Implementation of the Intelligent Voice System for Kazakh

    NASA Astrophysics Data System (ADS)

    Yessenbayev, Zh; Saparkhojayev, N.; Tibeyev, T.

    2014-04-01

    Modern speech technologies are highly advanced and widely used in day-to-day applications. However, this mostly concerns the languages of well-developed countries, such as English, German, Japanese, and Russian. For Kazakh, the situation is less prominent, and research in this field is only starting to evolve. In this research and application-oriented project, we introduce an intelligent voice system for the fast deployment of call-centers and information desks supporting Kazakh speech. The demand for such a system is obvious given the country's large size and small population: landline and cell phones are the only means of communication for distant villages and suburbs. The system features Kazakh speech recognition and synthesis modules as well as a web-GUI for efficient dialog management. For speech recognition we use the CMU Sphinx engine, and for speech synthesis, MaryTTS. The web-GUI is implemented in Java, enabling operators to quickly create and manage dialogs in a user-friendly graphical environment. Call routines are handled by Asterisk PBX and JBoss Application Server. The system supports technologies and protocols such as VoIP, VoiceXML, FastAGI, Java Speech API and J2EE. For the speech recognition experiments we compiled and used the first Kazakh speech corpus, with utterances from 169 native speakers. The performance of the speech recognizer is 4.1% WER on an isolated word recognition task and 6.9% WER on a clean continuous speech recognition task. The speech synthesis experiments include the training of male and female voices.
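
    The word error rate (WER) figures quoted above follow the standard definition: the word-level edit distance between reference and hypothesis transcripts, divided by the reference length. A minimal sketch of the standard metric, not code from the project itself:

    ```python
    # Hedged sketch: word error rate via dynamic-programming edit distance.
    def wer(reference: str, hypothesis: str) -> float:
        ref, hyp = reference.split(), hypothesis.split()
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i
        for j in range(len(hyp) + 1):
            d[0][j] = j
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,         # deletion
                              d[i][j - 1] + 1,         # insertion
                              d[i - 1][j - 1] + cost)  # substitution
        return d[-1][-1] / len(ref)
    ```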

  4. Multidimensional assessment of strongly irregular voices such as in substitution voicing and spasmodic dysphonia: a compilation of own research.

    PubMed

    Moerman, Mieke; Martens, Jean-Pierre; Dejonckere, Philippe

    2015-04-01

    This article is a compilation of the authors' own research performed during the European COoperation in Science and Technology (COST) Action 2103, 'Advanced Voice Function Assessment', an initiative of voice and speech processing teams consisting of physicists, engineers, and clinicians. This manuscript concerns the analysis of largely irregular voicing types, namely substitution voicing (SV) and adductor spasmodic dysphonia (AdSD). A specific perceptual rating scale (IINFVo) was developed, and the Auditory Model Based Pitch Extractor (AMPEX), a piece of software that automatically analyses running speech and generates pitch values in background noise, was applied. The IINFVo perceptual rating scale has been shown to be useful in evaluating SV. The analysis of strongly irregular voices stimulated a modification of the European Laryngological Society's assessment protocol, which was originally designed for the common types of (less severe) dysphonia. Acoustic analysis with AMPEX demonstrates that the most informative features are, for SV, the voicing-related acoustic features and, for AdSD, the perturbation measures. Poor correlations between self-assessment and the acoustic and perceptual dimensions in the assessment of highly irregular voices argue for a multidimensional approach.

  5. The recognition of female voice based on voice registers in singing techniques in real-time using hankel transform method and macdonald function

    NASA Astrophysics Data System (ADS)

    Meiyanti, R.; Subandi, A.; Fuqara, N.; Budiman, M. A.; Siahaan, A. P. U.

    2018-03-01

    A singer does not just recite the lyrics of a song but uses particular vocal techniques to make it more beautiful. In singing technique, female voices have a more diverse set of registers than male voices. The human voice has many registers; those used while singing include chest voice, head voice, falsetto, and vocal fry. A system for recognizing female voice registers in singing technique, in real time, was built using Borland Delphi 7.0. Recognition is performed both on recorded voice samples given as input and in real time. Voice input yields weighted energy values calculated using the Hankel transform method and Macdonald functions. The results showed that the accuracy of the system depends on the accuracy of the vocal technique that is trained and tested; the average recognition rate for voice registers reached 48.75 percent on recordings and 57 percent in real time.

  6. Cognitive Load in Voice Therapy Carry-Over Exercises.

    PubMed

    Iwarsson, Jenny; Morris, David Jackson; Balling, Laura Winther

    2017-01-01

    The cognitive load generated by online speech production may vary with the nature of the speech task. This article examines 3 speech tasks used in voice therapy carry-over exercises, in which a patient is required to adopt and automatize new voice behaviors, ultimately in daily spontaneous communication. Twelve subjects produced speech in 3 conditions: rote speech (weekdays), sentences in a set form, and semispontaneous speech. Subjects simultaneously performed a secondary visual discrimination task for which response times were measured. On completion of each speech task, subjects rated their experience on a questionnaire. Response times from the secondary, visual task were found to be shortest for the rote speech, longer for the semispontaneous speech, and longest for the sentences within the set framework. Principal components derived from the subjective ratings were found to be linked to response times on the secondary visual task. Acoustic measures reflecting fundamental frequency distribution and vocal fold compression varied across the speech tasks. The results indicate that consideration should be given to the selection of speech tasks during the process leading to automation of revised speech behavior and that self-reports may be a reliable index of cognitive load.

  7. Speech coding, reconstruction and recognition using acoustics and electromagnetic waves

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Holzrichter, J.F.; Ng, L.C.

    The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced, as well as for combined voiced and unvoiced speech. The methods include how to deconvolve the speech excitation function from the acoustic speech output to describe the transfer function for each time frame. The formation of feature vectors defining all acoustic speech units over well-defined time frames can be used for purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching. 35 figs.

  8. The taste of music.

    PubMed

    Mesz, Bruno; Trevisan, Marcos A; Sigman, Mariano

    2011-01-01

    Zarlino, one of the most important music theorists of the XVI century, described the minor consonances as 'sweet' (dolci) and 'soft' (soavi) (Zarlino 1558/1983, in On the Modes. New Haven, CT: Yale University Press, 1983). Hector Berlioz, in his Treatise on Modern Instrumentation and Orchestration (London: Novello, 1855), speaks about the 'small acid-sweet voice' of the oboe. In line with this tradition of describing musical concepts in terms of taste words, recent empirical studies have found reliable associations between taste perception and low-level sound and musical parameters, like pitch and phonetic features. Here we investigated whether taste words elicited consistent musical representations by asking trained musicians to improvise on the basis of the four canonical taste words: sweet, sour, bitter, and salty. Our results showed that, even in free improvisation, taste words elicited very reliable and consistent musical patterns: 'bitter' improvisations are low-pitched and legato (without interruption between notes), 'salty' improvisations are staccato (notes sharply detached from each other), 'sour' improvisations are high-pitched and dissonant, and 'sweet' improvisations are consonant, slow, and soft. Interestingly, projections of the improvisations of taste words to musical space (a vector space defined by relevant musical parameters) revealed that improvisations based on different taste words were nearly orthogonal or opposite. Decoding methods could classify binary choices of improvisations (i.e., identify the improvisation word from the melody) with a performance of around 80%, well above chance. In a second experiment we investigated the mapping from perception of music to taste words. Fifty-seven non-musical experts listened to a fraction of the improvisations. We found that listeners classified with high performance the taste word which had elicited the improvisation. Our results, furthermore, show that associations of taste and music

  9. Two-voice fundamental frequency estimation

    NASA Astrophysics Data System (ADS)

    de Cheveigné, Alain

    2002-05-01

    An algorithm is presented that estimates the fundamental frequencies of two concurrent voices or instruments. The algorithm models each voice as a periodic function of time, and jointly estimates both periods by cancellation according to a previously proposed method [de Cheveigné and Kawahara, Speech Commun. 27, 175-185 (1999)]. The new algorithm improves on the old in several respects: it allows an unrestricted search range, effectively avoids harmonic and subharmonic errors, is more accurate (it uses two-dimensional parabolic interpolation), and is computationally less costly. It remains subject to unavoidable errors when periods are in certain simple ratios and the task is inherently ambiguous. The algorithm is evaluated on a small database including speech, singing voice, and instrumental sounds. It can be extended in several ways: to decide the number of voices, to handle amplitude variations, and to estimate more than two voices (at the expense of increased processing cost and decreased reliability). It makes no use of instrument models, learned or otherwise, although it could usefully be combined with such models. [Work supported by the Cognitique programme of the French Ministry of Research and Technology.]
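
    A minimal sketch of period estimation by cancellation, in the spirit of the method described above: a comb filter y[n] = x[n] - x[n - tau] cancels a periodic component when tau matches its period, and a joint search over two lags cancels two voices. The brute-force search below omits the paper's refinements (parabolic interpolation, ambiguity handling):

    ```python
    # Hedged sketch: joint two-voice period estimation by cascaded comb
    # filters; the (tau1, tau2) pair minimizing residual power is chosen.
    import numpy as np

    def residual_power(x, tau):
        return np.mean((x[tau:] - x[:-tau]) ** 2)

    def two_voice_periods(x, tau_min, tau_max):
        best = (None, None, np.inf)
        for tau1 in range(tau_min, tau_max):
            y = x[tau1:] - x[:-tau1]            # cancel the first voice
            for tau2 in range(tau_min, tau_max):
                p = residual_power(y, tau2)     # then cancel the second
                if p < best[2]:
                    best = (tau1, tau2, p)
        return best[:2]
    ```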

  10. Measurement of voice onset time in maxillectomy patients.

    PubMed

    Hattori, Mariko; Sumita, Yuka I; Taniguchi, Hisashi

    2014-01-01

    Objective speech evaluation using acoustic measurement is needed for the proper rehabilitation of maxillectomy patients. For digital evaluation of consonants, measurement of voice onset time is one option. However, voice onset time has not been measured in maxillectomy patients, as their consonant sound spectra exhibit unique characteristics that make the measurement challenging. In this study, we established criteria for measuring voice onset time in maxillectomy patients for objective speech evaluation. We examined voice onset time for /ka/ and /ta/ in 13 maxillectomy patients by calculating the number of valid measurements of voice onset time out of three trials for each syllable. Wilcoxon's signed rank test showed that voice onset time measurements were more successful for /ka/ and /ta/ when a prosthesis was used (Z = -2.232, P = 0.026 and Z = -2.401, P = 0.016, respectively) than when it was not. These results indicate that wearing a prosthesis affected voice onset time measurement in these patients. Although more research in this area is needed, measurement of voice onset time has the potential to be used to evaluate consonant production in maxillectomy patients wearing a prosthesis.
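    Voice onset time (VOT) is the interval from the stop burst release to the onset of voicing. The sketch below is a crude stand-in for the measurement criteria developed in the study: it marks voicing onset at the first post-burst frame whose normalized autocorrelation peak exceeds a threshold. The synthetic token, threshold, and pitch range are illustrative assumptions.

```python
import numpy as np

def vot_ms(x, fs, burst_idx, frame_s=0.01, thresh=0.5):
    """Scan frames after the marked burst release; declare voicing onset at the
    first frame whose normalized autocorrelation peak (searched over a plausible
    pitch-lag range) exceeds the threshold. VOT = voicing onset - burst release."""
    step = int(frame_s * fs)
    lo, hi = int(fs / 400), int(fs / 75)   # pitch-search lags for ~75-400 Hz
    for start in range(burst_idx, len(x) - 2 * hi, step):
        seg = x[start:start + 2 * hi]
        seg = seg - seg.mean()
        ac = np.correlate(seg, seg, mode="full")[len(seg) - 1:]
        if ac[0] > 0 and np.max(ac[lo:hi]) / ac[0] > thresh:
            return 1000 * (start - burst_idx) / fs
    return None

# Synthetic /ka/-like token: burst at t = 0, 60 ms of aspiration noise, then voicing
fs = 16000
rng = np.random.default_rng(0)
aspiration = 0.3 * rng.normal(size=int(0.06 * fs))
voicing = np.sin(2 * np.pi * 120 * np.arange(int(0.2 * fs)) / fs)
print(vot_ms(np.concatenate([aspiration, voicing]), fs, burst_idx=0))  # expect ~60 ms
```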

  12. [Relevance of psychosocial factors in speech rehabilitation after laryngectomy].

    PubMed

    Singer, S; Fuchs, M; Dietz, A; Klemm, E; Kienast, U; Meyer, A; Oeken, J; Täschner, R; Wulke, C; Schwarz, R

    2007-12-01

    It is often assumed that psychosocial and sociodemographic factors determine the success of voice rehabilitation after laryngectomy. The aim of this study was to analyze the association between these parameters. Based on the tumor registries of six ENT clinics, all patients who had undergone laryngectomy in the preceding years were surveyed (N = 190). Success of voice rehabilitation was assessed as speech intelligibility, measured with the postlaryngectomy telephone intelligibility test. Validated and standardized instruments were used where possible for the assessment of the psychosocial parameters. Statistical analysis was done by multiple logistic regression. Low speech intelligibility is associated with reduced conversation (OR 0.970) and social activity (OR 1.049). Patients are more likely to talk with esophageal voice when their motivation for learning the new voice was high (OR 7.835) and when they assessed their speech therapist as important for their motivation (OR 4.794). The risk of communicating merely by whispering is higher when patients live together with a partner (OR 5.293), when they talk seldom (OR 1.017), and when they are not very active in social contexts (OR 0.966). Psychosocial factors can only partly explain how voice rehabilitation after laryngectomy becomes a success. Speech intelligibility is associated with active communication behaviour, whereas the use of an esophageal voice is correlated with motivation. The attainment of tracheoesophageal puncture voice appears to be independent of psychosocial factors.

  13. Study of accent-based music speech protocol development for improving voice problems in stroke patients with mixed dysarthria.

    PubMed

    Kim, Soo Ji; Jo, Uiri

    2013-01-01

    Based on the anatomical and functional commonality between singing and speech, various types of musical elements have been employed in music therapy research for speech rehabilitation. The purpose of this study was to develop an accent-based music speech protocol to address voice problems of stroke patients with mixed dysarthria. Subjects were 6 stroke patients with mixed dysarthria who received individual music therapy sessions. Each session lasted 30 minutes, and 12 sessions including pre- and post-test were administered to each patient. To examine the protocol's efficacy, the measures of maximum phonation time (MPT), fundamental frequency (F0), average intensity (dB), jitter, shimmer, noise-to-harmonics ratio (NHR), and diadochokinesis (DDK) were compared between pre- and post-test and analyzed with a paired-sample t-test. The results showed that the measures of MPT, F0, dB, and sequential motion rates (SMR) were significantly increased after administering the protocol. Also, there were statistically significant differences in the measures of shimmer and alternating motion rates (AMR) of the syllable /kʌ/ between pre- and post-test. The results indicated that the accent-based music speech protocol may improve speech motor coordination, including respiration, phonation, articulation, resonance, and prosody, in patients with dysarthria. This suggests the possibility of utilizing the music speech protocol to maximize immediate treatment effects in the course of long-term treatment for patients with dysarthria.

  14. Speech and swallowing disorders in Parkinson disease.

    PubMed

    Sapir, Shimon; Ramig, Lorraine; Fox, Cynthia

    2008-06-01

    To review recent research and clinical studies pertaining to the nature, diagnosis, and treatment of speech and swallowing disorders in Parkinson disease. Although some studies indicate improvement in voice and speech with dopamine therapy and deep brain stimulation of the subthalamic nucleus, others show minimal or adverse effects. Repetitive transcranial magnetic stimulation of the mouth motor cortex and injection of collagen in the vocal folds have preliminary data supporting improvement in phonation in people with Parkinson disease. Treatments focusing on vocal loudness, specifically LSVT LOUD (Lee Silverman Voice Treatment), have been effective for the treatment of speech disorders in Parkinson disease. Changes in brain activity due to LSVT LOUD provide preliminary evidence for neural plasticity. Computer-based technology makes the Lee Silverman Voice Treatment available to a large number of users. A rat model for studying neuropharmacologic effects on vocalization in Parkinson disease has been developed. New diagnostic methods of speech and swallowing are also available as the result of recent studies. Speech rehabilitation with the LSVT LOUD is highly efficacious and scientifically tested. There is a need for more studies to improve understanding, diagnosis, prevention, and treatment of speech and swallowing disorders in Parkinson disease.

  15. The value of visualizing tone of voice.

    PubMed

    Pullin, Graham; Cook, Andrew

    2013-10-01

    Whilst most of us have an innate feeling for tone of voice, it is an elusive quality that even phoneticians struggle to describe with sufficient subtlety. For people who cannot speak themselves this can have particularly profound repercussions. Augmentative communication often involves text-to-speech, a technology that only supports a basic choice of prosody based on punctuation. Given how inherently difficult it is to talk about more nuanced tone of voice, there is a risk that its absence from current devices goes unremarked and unchallenged. Looking ahead optimistically to more expressive communication aids, their design will need to involve more subtle interactions with tone of voice: interactions that the people using them can understand and engage with. Interaction design can play a role in making tone of voice visible, tangible, and accessible. Two projects that have already catalysed interdisciplinary debate in this area, Six Speaking Chairs and Speech Hedge, are introduced together with responses. A broader role for design is advocated, as a means to opening up speech technology research to a wider range of disciplinary perspectives, and also to the contributions and influence of people who use it in their everyday lives.

  16. Predicting fundamental frequency from mel-frequency cepstral coefficients to enable speech reconstruction.

    PubMed

    Shao, Xu; Milner, Ben

    2005-08-01

    This work proposes a method to reconstruct an acoustic speech signal solely from a stream of mel-frequency cepstral coefficients (MFCCs) as may be encountered in a distributed speech recognition (DSR) system. Previous methods for speech reconstruction have required, in addition to the MFCC vectors, fundamental frequency and voicing components. In this work the voicing classification and fundamental frequency are predicted from the MFCC vectors themselves using two maximum a posteriori (MAP) methods. The first method enables fundamental frequency prediction by modeling the joint density of MFCCs and fundamental frequency using a single Gaussian mixture model (GMM). The second scheme uses a set of hidden Markov models (HMMs) to link together a set of state-dependent GMMs, which enables a more localized modeling of the joint density of MFCCs and fundamental frequency. Experimental results on speaker-independent male and female speech show that accurate voicing classification and fundamental frequency prediction is attained when compared to hand-corrected reference fundamental frequency measurements. The use of the predicted fundamental frequency and voicing for speech reconstruction is shown to give very similar speech quality to that obtained using the reference fundamental frequency and voicing.
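    A hedged sketch of the first (single-GMM) scheme described above: fit a Gaussian mixture to joint [MFCC, F0] vectors, then predict F0 from MFCCs as the posterior-weighted conditional mean. This is a generic joint-density GMM predictor built on scikit-learn and SciPy, not the authors' implementation; the demo data are synthetic.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

def fit_joint_gmm(mfcc, f0, n_components=4):
    """Model the joint density p(MFCC, F0) with a full-covariance GMM."""
    z = np.hstack([mfcc, f0[:, None]])
    return GaussianMixture(n_components=n_components,
                           covariance_type="full", random_state=0).fit(z)

def predict_f0(gmm, mfcc):
    """Posterior-weighted conditional means of F0 given the MFCC vector."""
    d = mfcc.shape[1]
    resp = np.zeros((len(mfcc), gmm.n_components))
    cond = np.zeros((len(mfcc), gmm.n_components))
    for k in range(gmm.n_components):
        mu_x, mu_y = gmm.means_[k, :d], gmm.means_[k, d]
        sxx = gmm.covariances_[k][:d, :d]
        sxy = gmm.covariances_[k][:d, d]
        resp[:, k] = gmm.weights_[k] * multivariate_normal(mu_x, sxx).pdf(mfcc)
        cond[:, k] = mu_y + (mfcc - mu_x) @ np.linalg.solve(sxx, sxy)
    resp /= resp.sum(axis=1, keepdims=True)
    return (resp * cond).sum(axis=1)

# Synthetic demo: "F0" loosely correlated with the first "MFCC" dimension
rng = np.random.default_rng(0)
mfcc = rng.normal(size=(500, 3))
f0 = 120 + 30 * mfcc[:, 0] + rng.normal(scale=5, size=500)
gmm = fit_joint_gmm(mfcc, f0)
print(predict_f0(gmm, mfcc[:5]))
```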

  17. Analog voicing detector responds to pitch

    NASA Technical Reports Server (NTRS)

    Abel, R. S.; Watkins, H. E.

    1967-01-01

    A modified electronic voice encoder (Vocoder) includes an independent analog mode of operation in addition to the conventional digital mode. The Vocoder is a bandwidth-compression device that permits voice transmission over channels having only a fraction of the bandwidth required for conventional telephone-quality speech transmission.

  18. Voice Based City Panic Button System

    NASA Astrophysics Data System (ADS)

    Febriansyah; Zainuddin, Zahir; Bachtiar Nappu, M.

    2018-03-01

    The development of the voice-activated panic button application aims to provide faster early notification of hazardous conditions in the community to the nearest police, using speech as the trigger; current applications still rely on on-screen touch combinations and coordination of orders from a control center, so early notification takes longer. The methods used in this research were voice recognition for detecting the user's speech and the haversine formula for finding the shortest distance between the user and the police. The application also sends automatic SMS notifications to the victim's relatives and is integrated with Google Maps (GMaps) to map the route to the victim's location. The results show that voice registration in the application succeeds 100% of the time, incident detection using speech recognition while the application is running averages 94.67%, and the automatic SMS to the victim's relatives succeeds 100% of the time.
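    The haversine formula mentioned above gives the great-circle distance between two latitude/longitude points. A minimal sketch; the coordinates below are hypothetical.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, used here to find the
    police post closest to the victim's reported coordinates."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Pick the nearest police post to a victim (hypothetical coordinates)
victim = (-5.135, 119.423)
posts = {"post_a": (-5.147, 119.432), "post_b": (-5.120, 119.400)}
nearest = min(posts, key=lambda k: haversine_km(*victim, *posts[k]))
print(nearest)
```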

  19. Automatic initial and final segmentation in cleft palate speech of Mandarin speakers.

    PubMed

    He, Ling; Liu, Yin; Yin, Heng; Zhang, Junpeng; Zhang, Jing; Zhang, Jiang

    2017-01-01

    Speech unit segmentation is an important pre-processing step in the analysis of cleft palate speech. In Mandarin, one syllable is composed of two parts: initial and final. In cleft palate speech, resonance disorders occur at the finals and the voiced initials, while articulation disorders occur at the unvoiced initials. Thus, the initials and finals are the minimum speech units that can reflect the characteristics of cleft palate speech disorders. In this work, an automatic initial/final segmentation method is proposed as a pre-processing step for cleft palate speech signal processing. The tested cleft palate speech utterances were collected from the Cleft Palate Speech Treatment Center in the Hospital of Stomatology, Sichuan University, which treats the largest number of cleft palate patients in China. The cleft palate speech data include 824 speech segments, and the control samples contain 228 speech segments. Syllables are first extracted from the speech utterances. The proposed syllable extraction method avoids a training stage and achieves good performance for both voiced and unvoiced speech. The syllables are then classified as having "quasi-unvoiced" or "quasi-voiced" initials, and respective initial/final segmentation methods are proposed for these two types of syllables. Moreover, a two-step segmentation method is proposed: the rough locations of syllable and initial/final boundaries are refined in the second segmentation step, in order to improve the robustness of segmentation accuracy. The experiments show that the initial/final segmentation accuracies for syllables with quasi-unvoiced initials are higher than for those with quasi-voiced initials. For cleft palate speech, the mean time error is 4.4 ms for syllables with quasi-unvoiced initials and 25.7 ms for syllables with quasi-voiced initials, and the correct segmentation accuracy P30 over all syllables is 91.69%. For the control samples, P30 over all syllables is 91.24%.
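    As a toy illustration of initial/final segmentation for a syllable with a quasi-unvoiced initial, the sketch below places the boundary at the first frame that is both high-energy and low in zero-crossing rate (i.e., vowel-like). This is a generic energy/ZCR heuristic, not the two-step method proposed in the paper.

```python
import numpy as np

def initial_final_boundary(x, fs, frame_s=0.01):
    """Toy boundary detector: the final (vowel-like) part begins at the first
    frame with high energy and low zero-crossing rate."""
    n = int(frame_s * fs)
    frames = [x[i:i + n] for i in range(0, len(x) - n, n)]
    energy = np.array([np.mean(f ** 2) for f in frames])
    zcr = np.array([np.mean(np.abs(np.diff(np.signbit(f).astype(int)))) for f in frames])
    for i, (e, z) in enumerate(zip(energy, zcr)):
        if e > 0.5 * energy.max() and z < 0.5 * zcr.max():
            return i * frame_s  # boundary time in seconds
    return None

# Synthetic syllable: 80 ms fricative-like noise initial, then a vowel-like final
fs = 16000
rng = np.random.default_rng(0)
initial = 0.2 * rng.normal(size=int(0.08 * fs))
final = np.sin(2 * np.pi * 220 * np.arange(int(0.2 * fs)) / fs)
print(initial_final_boundary(np.concatenate([initial, final]), fs))  # ~0.08 s
```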

  20. Applications of orofacial myofunctional techniques to speech therapy.

    PubMed

    Landis, C F

    1994-11-01

    A speech-language pathologist describes how she uses oral myofunctional therapy techniques in the treatment of speech articulation disorders, voice disorders, stuttering and apraxia of speech. Specific exercises are detailed.

  1. Voice Conversion Using Pitch Shifting Algorithm by Time Stretching with PSOLA and Re-Sampling

    NASA Astrophysics Data System (ADS)

    Mousa, Allam

    2010-01-01

    Voice changing has many applications in industrial and commercial fields. This paper emphasizes voice conversion using a pitch shifting method that depends on detecting the pitch of the signal (fundamental frequency) using Simplified Inverse Filter Tracking (SIFT), changing it to the target pitch period using time stretching with the Pitch Synchronous Overlap Add (PSOLA) algorithm, and then resampling the signal in order to restore the original play rate. The same study was performed to see the effect of voice conversion when Arabic speech signals are considered. Treatment of certain Arabic voiced vowels and conversion between male and female speech has shown some expansion or compression in the resulting speech. A comparison in terms of pitch shifting is presented here. Analysis was performed both for a single frame and for a full segmentation of speech.
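    The scheme in the abstract can be summarized as: time-stretch the signal by the desired pitch ratio, then resample by the same ratio so that duration is restored while pitch moves. The sketch below uses a crude fixed-grain overlap-add stretch as a stand-in for true PSOLA (which would place grains on SIFT-derived pitch marks); grain and hop sizes are assumptions.

```python
import numpy as np

def ola_stretch(x, factor, grain=1024, hop_out=256):
    """Crude granular overlap-add time stretch. True PSOLA would place grains
    pitch-synchronously (one per pitch period); fixed Hann grains keep this short."""
    hop_in = max(1, int(round(hop_out / factor)))
    win = np.hanning(grain)
    y = np.zeros(int(len(x) * factor) + grain)
    norm = np.zeros_like(y)
    pos_out = 0
    for pos_in in range(0, len(x) - grain, hop_in):
        y[pos_out:pos_out + grain] += win * x[pos_in:pos_in + grain]
        norm[pos_out:pos_out + grain] += win
        pos_out += hop_out
        if pos_out + grain > len(y):
            break
    return y[:pos_out] / np.maximum(norm[:pos_out], 1e-8)

def pitch_shift(x, ratio):
    """Shift pitch by `ratio` at constant duration: stretch, then resample."""
    stretched = ola_stretch(x, ratio)
    idx = np.arange(0, len(stretched) - 1, ratio)   # linear-interp resampling
    lo = idx.astype(int)
    frac = idx - lo
    return (1 - frac) * stretched[lo] + frac * stretched[lo + 1]

fs = 16000
t = np.arange(fs) / fs
up = pitch_shift(np.sin(2 * np.pi * 220 * t), 1.5)  # 220 Hz -> roughly 330 Hz
```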

  2. Processing of speech signals for physical and sensory disabilities.

    PubMed Central

    Levitt, H

    1995-01-01

    Assistive technology involving voice communication is used primarily by people who are deaf, hard of hearing, or who have speech and/or language disabilities. It is also used to a lesser extent by people with visual or motor disabilities. A very wide range of devices has been developed for people with hearing loss. These devices can be categorized not only by the modality of stimulation [i.e., auditory, visual, tactile, or direct electrical stimulation of the auditory nerve (auditory-neural)] but also in terms of the degree of speech processing that is used. At least four such categories can be distinguished: assistive devices (a) that are not designed specifically for speech, (b) that take the average characteristics of speech into account, (c) that process articulatory or phonetic characteristics of speech, and (d) that embody some degree of automatic speech recognition. Assistive devices for people with speech and/or language disabilities typically involve some form of speech synthesis or symbol generation for severe forms of language disability. Speech synthesis is also used in text-to-speech systems for sightless persons. Other applications of assistive technology involving voice communication include voice control of wheelchairs and other devices for people with mobility disabilities. PMID:7479816

  4. Fluid-Structure Interactions as Flow Propagates Tangentially Over a Flexible Plate with Application to Voiced Speech Production

    NASA Astrophysics Data System (ADS)

    Westervelt, Andrea; Erath, Byron

    2013-11-01

    Voiced speech is produced by fluid-structure interactions that drive vocal fold motion. Viscous flow features influence the pressure in the gap between the vocal folds (i.e. glottis), thereby altering vocal fold dynamics and the sound that is produced. During the closing phases of the phonatory cycle, vortices form as a result of flow separation as air passes through the divergent glottis. It is hypothesized that the reduced pressure within a vortex core will alter the pressure distribution along the vocal fold surface, thereby aiding in vocal fold closure. The objective of this study is to determine the impact of intraglottal vortices on the fluid-structure interactions of voiced speech by investigating how the dynamics of a flexible plate are influenced by a vortex ring passing tangentially over it. A flexible plate, which models the medial vocal fold surface, is placed in a water-filled tank and positioned parallel to the exit of a vortex generator. The physical parameters of plate stiffness and vortex circulation are scaled with physiological values. As vortices propagate over the plate, particle image velocimetry measurements are captured to analyze the energy exchange between the fluid and flexible plate. The investigations are performed over a range of vortex formation numbers, and lateral displacements of the plate from the centerline of the vortex trajectory. Observations show plate oscillations with displacements directly correlated with the vortex core location.

  5. Acoustic characteristics of voice after severe traumatic brain injury.

    PubMed

    McHenry, M

    2000-07-01

    To describe the acoustic characteristics of voice in individuals with motor speech disorders after traumatic brain injury (TBI). Prospective study of 100 individuals with TBI based on consecutive referrals for motor speech evaluations. Subjects were audio tape-recorded while producing sustained vowels and single word and sentence intelligibility tests. Laryngeal airway resistance was estimated, and voice quality was rated perceptually. None of the subjects evidenced vocal parameters within normal limits. The most frequently occurring abnormal parameter across subjects was amplitude perturbation, followed by voice turbulence index. Twenty-three percent of subjects evidenced deviation in all five parameters measured. The perceptual ratings of breathiness were significantly correlated with both the amplitude perturbation quotient and the noise-to-harmonics ratio. Vocal quality deviation is common in motor speech disorders after TBI and may impact intelligibility.
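    For reference, cycle-to-cycle perturbation measures like those reported above can be defined quite simply. Below is a minimal sketch of local jitter (period perturbation) and shimmer (amplitude perturbation); the cycle measurements are hypothetical.

```python
import numpy as np

def jitter_percent(periods):
    """Mean absolute cycle-to-cycle period difference, relative to the mean period."""
    periods = np.asarray(periods, dtype=float)
    return 100 * np.mean(np.abs(np.diff(periods))) / np.mean(periods)

def shimmer_percent(amplitudes):
    """Mean absolute cycle-to-cycle amplitude difference, relative to the mean
    amplitude (a simple form of the amplitude perturbation measures above)."""
    amplitudes = np.asarray(amplitudes, dtype=float)
    return 100 * np.mean(np.abs(np.diff(amplitudes))) / np.mean(amplitudes)

# Hypothetical cycle measurements extracted from a sustained vowel
periods_ms = [8.0, 8.1, 7.9, 8.2, 8.0]
peaks = [0.82, 0.78, 0.85, 0.80, 0.79]
print(jitter_percent(periods_ms), shimmer_percent(peaks))
```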

  6. Biphonation in voice signals

    NASA Astrophysics Data System (ADS)

    Herzel, Hanspeter; Reuter, Robert

    1996-06-01

    Irregularities in voiced speech are often observed as a consequence of vocal fold lesions, paralyses, and other pathological conditions. Many of these instabilities are related to the intrinsic nonlinearities in the vibrations of the vocal folds. In this paper, a specific nonlinear phenomenon is discussed: The appearance of two independent fundamental frequencies termed biphonation. Several narrow-band spectrograms are presented showing biphonation in signals from voice patients, a newborn cry, a singer, and excised larynx experiments. Finally, possible physiological mechanisms of instabilities of the voice source are discussed.
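    A narrow-band spectrogram resolves two closely spaced, independent fundamentals because its long analysis window yields fine frequency resolution. A minimal sketch on a synthetic two-fundamental signal (frequencies chosen to fall on exact FFT bins for a clean demo):

```python
import numpy as np
from scipy.signal import spectrogram

fs = 16000
t = np.arange(fs) / fs
# Synthetic biphonation: two independent fundamentals at 188 Hz and 264 Hz
x = np.sin(2 * np.pi * 188 * t) + np.sin(2 * np.pi * 264 * t)
# Long window -> narrow-band spectrogram (4 Hz bins), resolving both fundamentals
f, tt, S = spectrogram(x, fs=fs, window="hann", nperseg=4000, noverlap=3000)
peaks = f[np.argsort(S[:, 0])[-2:]]
print(sorted(peaks))  # expect peaks at the two fundamentals
```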

  7. Fluid Dynamics of Human Phonation and Speech

    NASA Astrophysics Data System (ADS)

    Mittal, Rajat; Erath, Byron D.; Plesniak, Michael W.

    2013-01-01

    This article presents a review of the fluid dynamics, flow-structure interactions, and acoustics associated with human phonation and speech. Our voice is produced through the process of phonation in the larynx, and an improved understanding of the underlying physics of this process is essential to advancing the treatment of voice disorders. Insights into the physics of phonation and speech can also contribute to improved vocal training and the development of new speech compression and synthesis schemes. This article introduces the key biomechanical features of the laryngeal physiology, reviews the basic principles of voice production, and summarizes the progress made over the past half-century in understanding the flow physics of phonation and speech. Laryngeal pathologies, which significantly enhance the complexity of phonatory dynamics, are discussed. After a thorough examination of the state of the art in computational modeling and experimental investigations of phonatory biomechanics, we present a synopsis of the pacing issues in this arena and an outlook for research in this fascinating subject.

  8. Effects of Within-Talker Variability on Speech Intelligibility in Mandarin-Speaking Adult and Pediatric Cochlear Implant Patients

    PubMed Central

    Su, Qiaotong; Galvin, John J.; Zhang, Guoping; Li, Yongxin

    2016-01-01

    Cochlear implant (CI) speech performance is typically evaluated using well-enunciated speech produced at a normal rate by a single talker. CI users often have greater difficulty with variations in speech production encountered in everyday listening. Within a single talker, speaking rate, amplitude, duration, and voice pitch information may be quite variable, depending on the production context. The coarse spectral resolution afforded by the CI limits perception of voice pitch, which is an important cue for speech prosody and for tonal languages such as Mandarin Chinese. In this study, sentence recognition from the Mandarin speech perception database was measured in adult and pediatric Mandarin-speaking CI listeners for a variety of speaking styles: voiced speech produced at slow, normal, and fast speaking rates; whispered speech; voiced emotional speech; and voiced shouted speech. Recognition of Mandarin Hearing in Noise Test sentences was also measured. Results showed that performance was significantly poorer with whispered speech relative to the other speaking styles and that performance was significantly better with slow speech than with fast or emotional speech. Results also showed that adult and pediatric performance was significantly poorer with Mandarin Hearing in Noise Test than with Mandarin speech perception sentences at the normal rate. The results suggest that adult and pediatric Mandarin-speaking CI patients are highly susceptible to whispered speech, due to the lack of lexically important voice pitch cues and perhaps other qualities associated with whispered speech. The results also suggest that test materials may contribute to differences in performance observed between adult and pediatric CI users. PMID:27363714

  9. Voice Modulations in German Ironic Speech

    ERIC Educational Resources Information Center

    Scharrer, Lisa; Christmann, Ursula; Knoll, Monja

    2011-01-01

    Previous research has shown that in different languages ironic speech is acoustically modulated compared to literal speech, and these modulations are assumed to aid the listener in the comprehension process by acting as cues that mark utterances as ironic. The present study was conducted to identify paraverbal features of German "ironic…

  10. Vibrato in Singing Voice: The Link between Source-Filter and Sinusoidal Models

    NASA Astrophysics Data System (ADS)

    Arroabarren, Ixone; Carlosena, Alfonso

    2004-12-01

    The application of inverse filtering techniques for high-quality singing voice analysis/synthesis is discussed. In the context of source-filter models, inverse filtering provides a noninvasive method to extract the voice source, and thus to study voice quality. Although this approach is widely used in speech synthesis, this is not the case in singing voice. Several studies have proved that inverse filtering techniques fail in the case of singing voice, the reasons being unclear. In order to shed light on this problem, we will consider here an additional feature of singing voice, not present in speech: the vibrato. Vibrato has been traditionally studied by sinusoidal modeling. As an alternative, we will introduce here a novel noninteractive source filter model that incorporates the mechanisms of vibrato generation. This model will also allow the comparison of the results produced by inverse filtering techniques and by sinusoidal modeling, as they apply to singing voice and not to speech. In this way, the limitations of these conventional techniques, described in previous literature, will be explained. Both synthetic signals and singer recordings are used to validate and compare the techniques presented in the paper.
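    For orientation, sinusoidal modeling typically describes vibrato as a slow sinusoidal modulation of the fundamental. A generic textbook form (an assumption here, not the paper's full noninteractive source-filter model) is:

```latex
% Instantaneous fundamental frequency under vibrato:
f_0(t) = F_0 \left[ 1 + A_v \sin\!\left( 2\pi f_v t + \phi \right) \right]
```

    where $F_0$ is the mean fundamental frequency, $f_v$ the vibrato rate (roughly 5-7 Hz in trained singers), $A_v$ the vibrato extent, and $\phi$ an arbitrary phase.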

  11. Systematic studies of modified vocalization: effects of speech rate and instatement style during metronome stimulation.

    PubMed

    Davidow, Jason H; Bothe, Anne K; Richardson, Jessica D; Andreatta, Richard D

    2010-12-01

    This study introduces a series of systematic investigations intended to clarify the parameters of the fluency-inducing conditions (FICs) in stuttering. Participants included 11 adults, aged 20-63 years, with typical speech-production skills. A repeated measures design was used to examine the relationships between several speech production variables (vowel duration, voice onset time, fundamental frequency, intraoral pressure, pressure rise time, transglottal airflow, and phonated intervals) and speech rate and instatement style during metronome-entrained rhythmic speech. Measures of duration (vowel duration, voice onset time, and pressure rise time) differed across different metronome conditions. When speech rates were matched between the control condition and metronome condition, voice onset time was the only variable that changed. Results confirm that speech rate and instatement style can influence speech production variables during the production of fluency-inducing conditions. Future studies of normally fluent speech and of stuttered speech must control both features and should further explore the importance of voice onset time, which may be influenced by rate during metronome stimulation in a way that the other variables are not.

  12. Voice Disorders in School Children: Clinical Management.

    ERIC Educational Resources Information Center

    Garbee, Frederick E., Ed.

    Five papers presented at two inservice institutes for school speech and language pathologists delineated identification, remediation, and management of voice disorders in school children. Keynote remarks emphasized the intimate relationship between children's voices and their affective behavior and psychological needs, and thus, the importance of…

  13. Recognizing emotional speech in Persian: a validated database of Persian emotional speech (Persian ESD).

    PubMed

    Keshtiari, Niloofar; Kuhlmann, Michael; Eslami, Moharram; Klann-Delius, Gisela

    2015-03-01

    Research on emotional speech often requires valid stimuli for assessing perceived emotion through prosody and lexical content. To date, no comprehensive emotional speech database for Persian is officially available. The present article reports the process of designing, compiling, and evaluating a comprehensive emotional speech database for colloquial Persian. The database contains a set of 90 validated novel Persian sentences classified in five basic emotional categories (anger, disgust, fear, happiness, and sadness), as well as a neutral category. These sentences were validated in two experiments by a group of 1,126 native Persian speakers. The sentences were articulated by two native Persian speakers (one male, one female) in three conditions: (1) congruent (emotional lexical content articulated in a congruent emotional voice), (2) incongruent (neutral sentences articulated in an emotional voice), and (3) baseline (all emotional and neutral sentences articulated in neutral voice). The speech materials comprise about 470 sentences. The validity of the database was evaluated by a group of 34 native speakers in a perception test. Utterances recognized better than five times chance performance (71.4 %) were regarded as valid portrayals of the target emotions. Acoustic analysis of the valid emotional utterances revealed differences in pitch, intensity, and duration, attributes that may help listeners to correctly classify the intended emotion. The database is designed to be used as a reliable material source (for both text and speech) in future cross-cultural or cross-linguistic studies of emotional speech, and it is available for academic research purposes free of charge. To access the database, please contact the first author.

  14. Constructing Adequate Non-Speech Analogues: What Is Special about Speech Anyway?

    ERIC Educational Resources Information Center

    Rosen, Stuart; Iverson, Paul

    2007-01-01

    Vouloumanos and Werker (2007) claim that human neonates have a (possibly innate) bias to listen to speech based on a preference for natural speech utterances over sine-wave analogues. We argue that this bias more likely arises from the strikingly different saliency of voice melody in the two kinds of sounds, a bias that has already been shown to…

  15. Aspects of the speaking voice of elderly women with choral singing experience.

    PubMed

    Aquino, Fernanda Salvatico de; Silva, Marta Assumpção Andrada E; Teles, Lídia Cristina da Silva; Ferreira, Léslie Piccolotto

    2016-01-01

    Despite the several studies of singing and the aging voice found in the literature, there is still a need for investigations seeking to understand the effects of this practice on the speaking voice of the elderly. The aim was to compare the characteristics of the speaking voice of elderly women with choral singing experience with those of elderly women without this experience. Participants were 75 elderly women: 50 with experience in choral singing (singers group, SG) and 25 without such experience (nonsingers group, NSG). A questionnaire was applied to characterize the participants and collect data on lifestyle and voice. Speech samples (sustained vowels, repetition of sentences, and running speech excerpts) were collected in a quiet room with participants in a sitting position. The voices were analyzed by three expert speech-language pathologists according to the Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V) protocol. Data were submitted to descriptive and statistical analysis. The voices of the elderly nonsingers (NSG) showed a significant increase in scores related to the overall degree of deviance and the presence of roughness and strain. Compared with the NSG, the speaking voices of the SG showed a better overall degree of deviance, owing to lower roughness and strain.

  16. A Review of Training Opportunities for Singing Voice Rehabilitation Specialists.

    PubMed

    Gerhard, Julia

    2016-05-01

    Training opportunities for singing voice rehabilitation specialists are growing and changing. This is happening despite a lack of agreed-on guidelines or an accredited certification acknowledged by the governing bodies in the fields of speech-language pathology and vocal pedagogy, the American Speech-Language Hearing Association and the National Association of Teachers of Singing, respectively. The roles of the speech-language pathologist, the singing teacher, and the person who bridges this gap, the singing voice rehabilitation specialist, are now becoming better defined and more common among the voice care community. To that end, this article aims to review the current opportunities for training in the field of singing voice rehabilitation. A review of available university training programs, private training programs and mentorships, clinical fellowships, professional organizations, conferences, vocal training across genres, and self-study opportunities was conducted. All institutional listings are with permission from program leaders. Although many avenues are available for training of singing voice rehabilitation specialists, there is no accredited comprehensive training program at this point. This review gathers information on current training opportunities from across various modalities. The listings are not intended to be comprehensive but rather representative of possibilities for interested practitioners. Copyright © 2016 The Voice Foundation. Published by Elsevier Inc. All rights reserved.

  17. Paving the Way for Speech: Voice-Training-Induced Plasticity in Chronic Aphasia and Apraxia of Speech—Three Single Cases

    PubMed Central

    Jungblut, Monika; Huber, Walter; Mais, Christiane

    2014-01-01

    Difficulties with temporal coordination or sequencing of speech movements are frequently reported in aphasia patients with concomitant apraxia of speech (AOS). Our major objective was to investigate the effects of specific rhythmic-melodic voice training on brain activation in those patients. Three patients with severe chronic nonfluent aphasia and AOS were included in this study. Before and after therapy, patients underwent the same fMRI procedure as 30 healthy control subjects in our prestudy, which investigated the neural substrates of sung vowel changes in untrained rhythm sequences. A main finding was that post- minus pretreatment imaging data yielded significant perilesional activations in all patients, for example in the left superior temporal gyrus, whereas the reverse subtraction revealed either no significant activation or right-hemisphere activation. Likewise, pre- and posttreatment assessments of patients' vocal rhythm production, language, and speech motor performance yielded significant improvements for all patients. Our results suggest that changes in brain activation due to the applied training might indicate specific processes of reorganization, for example improved temporal sequencing of sublexical speech components. In this context, a training that focuses on rhythmic singing, with complexity levels that make graded demands on motor and cognitive capabilities, seems to support paving the way for speech. PMID:24977055

  18. Remote Capture of Human Voice Acoustical Data by Telephone: A Methods Study

    ERIC Educational Resources Information Center

    Cannizzaro, Michael S.; Reilly, Nicole; Mundt, James C.; Snyder, Peter J.

    2005-01-01

    In this pilot study we sought to determine the reliability and validity of collecting speech and voice acoustical data via telephone transmission for possible future use in large clinical trials. Simultaneous recordings of each participant's speech and voice were made at the point of participation, the local recording (LR), and over a telephone…

  19. Tracheostomy cannulas and voice prosthesis

    PubMed Central

    Kramp, Burkhard; Dommerich, Steffen

    2011-01-01

    Cannulas and voice prostheses are mechanical aids for patients who have had to undergo tracheotomy or laryngectomy for various reasons. For a better understanding of the function of these artificial devices, the indications and particularities of the preceding surgical intervention are first described in the context of this review. Despite the established procedure of percutaneous dilatation tracheotomy, e.g. in intensive care units, the creation of epithelialized tracheostomas has its own place, especially when airway obstruction is persistent (e.g. caused by trauma, inflammation, or tumors) and longer artificial ventilation or special care of the patient is required. In order to keep the airways open after tracheotomy, tracheostomy cannulas of different materials and with different functions are available. For each patient the most appropriate type of cannula must be found. Voice prostheses are meanwhile the device of choice for rapid and efficient voice rehabilitation after laryngectomy. Individual sizes and materials allow adaptation of the voice prosthesis to the individual anatomical situation of the patient. The combined application of voice prostheses with an HME (heat and moisture exchanger) allows good vocal as well as pulmonary rehabilitation. A precondition for an efficient voice prosthesis is the observation of certain surgical principles during laryngectomy. The lifetime of the prosthesis depends mainly on material properties and biofilms, mostly consisting of fungi and bacteria. The quality of voice with a valve prosthesis is clearly superior to esophageal or electrolaryngeal voice. Whenever possible, tracheostoma valves for hands-free speech should be applied. Physicians taking care of patients with voice prostheses after laryngectomy should know exactly what to do in case the device fails or gets lost. PMID:22073098

  1. DLMS Voice Data Entry.

    DTIC Science & Technology

    1980-06-01

    [Report front matter, garbled in extraction. Recoverable content: a list of illustrations including a block diagram of the DLMS Voice Recognition System (Fig. 1) and a flowchart of default operation; the system comprises a speech preprocessor (a TTI model 8040) and a minicomputer, together with a Data General 6026 magnetic tape unit, a display, an equipment cabinet, and a flexible-disk unit.]

  2. PRODUCTION OF SOUND BY UNSTEADY THROTTLING OF FLOW INTO A RESONANT CAVITY, WITH APPLICATION TO VOICED SPEECH

    PubMed Central

    Howe, M. S.; McGowan, R. S.

    2011-01-01

    An analysis is made of the sound generated by the time-dependent throttling of a nominally steady stream of air through a small orifice into a flow-through resonant cavity. This is exemplified by the production of voiced speech, where air from the lungs enters the vocal tract through the glottis at a time variable volume flow rate Q(t) controlled by oscillations of the glottis cross-section. Voicing theory has hitherto determined Q from a heuristic, reduced complexity ‘Fant’ differential equation (G. Fant, Acoustic Theory of Speech Production, 1960). A new self-consistent, integro-differential form of this equation is derived in this paper using the theory of aerodynamic sound, with full account taken of the back-reaction of the resonant tract on the glottal flux Q. The theory involves an aeroacoustic Green’s function (G) for flow-surface interactions in a time-dependent glottis, so making the problem non-self-adjoint. In complex problems of this type it is not usually possible to obtain G in an explicit analytic form. The principal objective of the paper is to show how the Fant equation can still be derived in such cases from a consideration of the equation of aerodynamic sound and from the adjoint of the equation governing G in the neighbourhood of the ‘throttle’. The theory is illustrated by application to the canonical problem of throttled flow into a Helmholtz resonator. PMID:21666824
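    For orientation, a commonly cited reduced-complexity ("Fant") voicing equation balances the subglottal driving pressure against the Bernoulli kinetic term and the inertance of the glottal air plug. This is a generic textbook form given here as an assumption, not necessarily the exact integro-differential equation derived in this paper:

```latex
p_s(t) = \frac{\rho \, Q^2(t)}{2 A^2(t)}
       + \rho \, \frac{d}{dt}\!\left[ \frac{\ell \, Q(t)}{A(t)} \right]
```

    where $Q(t)$ is the glottal volume flux, $A(t)$ the time-varying glottal area, $\ell$ an effective glottal channel length, $\rho$ the air density, and $p_s$ the subglottal pressure.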

  3. Systematic studies of modified vocalization: the effect of speech rate on speech production measures during metronome-paced speech in persons who stutter.

    PubMed

    Davidow, Jason H

    2014-01-01

    Metronome-paced speech results in the elimination, or substantial reduction, of stuttering moments. The cause of fluency during this fluency-inducing condition is unknown. Several investigations have reported changes in speech pattern characteristics from a control condition to a metronome-paced speech condition, but failure to control speech rate between conditions limits our ability to determine if the changes were necessary for fluency. This study examined the effect of speech rate on several speech production variables during one-syllable-per-beat metronomic speech in order to determine changes that may be important for fluency during this fluency-inducing condition. Thirteen persons who stutter (PWS), aged 18-62 years, completed a series of speaking tasks. Several speech production variables were compared between conditions produced at different metronome beat rates, and between a control condition and a metronome-paced speech condition produced at a rate equal to the control condition. Vowel duration, voice onset time, pressure rise time and phonated intervals were significantly impacted by metronome beat rate. Voice onset time and the percentage of short (30-100 ms) phonated intervals significantly decreased from the control condition to the equivalent rate metronome-paced speech condition. A reduction in the percentage of short phonated intervals may be important for fluency during syllable-based metronome-paced speech for PWS. Future studies should continue examining the necessity of this reduction. In addition, speech rate must be controlled in future fluency-inducing condition studies, including neuroimaging investigations, in order for this research to make a substantial contribution to finding the fluency-inducing mechanism of fluency-inducing conditions. © 2013 Royal College of Speech and Language Therapists.
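    Phonated intervals, mentioned above, are simply the durations of consecutive voiced stretches in a frame-level voicing track. A minimal sketch for extracting them and computing the percentage of short (30-100 ms) intervals; the voicing track and 10 ms frame step below are assumptions.

```python
import numpy as np

def phonated_intervals(voiced_flags, frame_ms=10):
    """Durations (ms) of consecutive voiced runs in a frame-level voicing track."""
    runs, n = [], 0
    for v in voiced_flags:
        if v:
            n += 1
        elif n:
            runs.append(n * frame_ms)
            n = 0
    if n:
        runs.append(n * frame_ms)
    return runs

def percent_short(runs, lo=30, hi=100):
    """Share of phonated intervals falling in the short (30-100 ms) band."""
    runs = np.asarray(runs)
    return 100 * np.mean((runs >= lo) & (runs <= hi)) if len(runs) else 0.0

track = [0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0]
runs = phonated_intervals(track)          # [30, 110, 20] ms
print(runs, percent_short(runs))          # one of three runs is "short"
```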

  4. A Cognitive Neuroscience View of Voice-Processing Abnormalities in Schizophrenia: A Window into Auditory Verbal Hallucinations?

    PubMed

    Conde, Tatiana; Gonçalves, Oscar F; Pinheiro, Ana P

    2016-01-01

    Auditory verbal hallucinations (AVH) are a core symptom of schizophrenia. Like "real" voices, AVH carry a rich amount of linguistic and paralinguistic cues that convey not only speech, but also affect and identity, information. Disturbed processing of voice identity, affective, and speech information has been reported in patients with schizophrenia. More recent evidence has suggested a link between voice-processing abnormalities and specific clinical symptoms of schizophrenia, especially AVH. It is still not well understood, however, to what extent these dimensions are impaired and how abnormalities in these processes might contribute to AVH. In this review, we consider behavioral, neuroimaging, and electrophysiological data to investigate the speech, identity, and affective dimensions of voice processing in schizophrenia, and we discuss how abnormalities in these processes might help to elucidate the mechanisms underlying specific phenomenological features of AVH. Schizophrenia patients exhibit behavioral and neural disturbances in the three dimensions of voice processing. Evidence suggesting a role of dysfunctional voice processing in AVH seems to be stronger for the identity and speech dimensions than for the affective domain.

  5. Describing Speech Usage in Daily Activities in Typical Adults.

    PubMed

    Anderson, Laine; Baylor, Carolyn R; Eadie, Tanya L; Yorkston, Kathryn M

    2016-01-01

    "Speech usage" refers to what people want or need to do with their speech to meet communication demands in life roles. The purpose of this study was to contribute to validation of the Levels of Speech Usage scale by providing descriptive data from a sample of adults without communication disorders, comparing this scale to a published Occupational Voice Demands scale and examining predictors of speech usage levels. This is a survey design. Adults aged ≥25 years without reported communication disorders were recruited nationally to complete an online questionnaire. The questionnaire included the Levels of Speech Usage scale, questions about relevant occupational and nonoccupational activities (eg, socializing, hobbies, childcare, and so forth), and demographic information. Participants were also categorized according to Koufman and Isaacson occupational voice demands scale. A total of 276 participants completed the questionnaires. People who worked for pay tended to report higher levels of speech usage than those who do not work for pay. Regression analyses showed employment to be the major contributor to speech usage; however, considerable variance left unaccounted for suggests that determinants of speech usage and the relationship between speech usage, employment, and other life activities are not yet fully defined. The Levels of Speech Usage may be a viable instrument to systematically rate speech usage because it captures both occupational and nonoccupational speech demands. These data from a sample of typical adults may provide a reference to help in interpreting the impact of communication disorders on speech usage patterns. Copyright © 2016 The Voice Foundation. Published by Elsevier Inc. All rights reserved.

  6. Synthesized speech rate and pitch effects on intelligibility of warning messages for pilots

    NASA Technical Reports Server (NTRS)

    Simpson, C. A.; Marchionda-Frost, K.

    1984-01-01

    In civilian and military operations, a future threat-warning system with a voice display could warn pilots of other traffic, obstacles in the flight path, and/or terrain during low-altitude helicopter flights. The present study was conducted to learn whether speech rate and voice pitch of phoneme-synthesized speech affects pilot accuracy and response time to typical threat-warning messages. Helicopter pilots engaged in an attention-demanding flying task and listened for voice threat warnings presented in a background of simulated helicopter cockpit noise. Performance was measured by flying-task performance, threat-warning intelligibility, and response time. Pilot ratings were elicited for the different voice pitches and speech rates. Significant effects were obtained only for response time and for pilot ratings, both as a function of speech rate. For the few cases when pilots forgot to respond to a voice message, they remembered 90 percent of the messages accurately when queried for their response 8 to 10 sec later.

  7. Joint Dictionary Learning-Based Non-Negative Matrix Factorization for Voice Conversion to Improve Speech Intelligibility After Oral Surgery.

    PubMed

    Fu, Szu-Wei; Li, Pei-Chun; Lai, Ying-Hui; Yang, Cheng-Chien; Hsieh, Li-Chun; Tsao, Yu

    2017-11-01

    Objective: This paper focuses on machine learning based voice conversion (VC) techniques for improving the speech intelligibility of surgical patients who have had parts of their articulators removed. Because of the removal of parts of the articulator, a patient's speech may be distorted and difficult to understand. To overcome this problem, VC methods can be applied to convert the distorted speech such that it is clear and more intelligible. To design an effective VC method, two key points must be considered: 1) the amount of training data may be limited (because speaking for a long time is usually difficult for postoperative patients); 2) rapid conversion is desirable (for better communication). Methods: We propose a novel joint dictionary learning based non-negative matrix factorization (JD-NMF) algorithm. Compared to conventional VC techniques, JD-NMF can perform VC efficiently and effectively with only a small amount of training data. Results: The experimental results demonstrate that the proposed JD-NMF method not only achieves notably higher short-time objective intelligibility (STOI) scores (a standardized objective intelligibility evaluation metric) than those obtained using the original unconverted speech but is also significantly more efficient and effective than a conventional exemplar-based NMF VC method. Conclusion: The proposed JD-NMF method may outperform the state-of-the-art exemplar-based NMF VC method in terms of STOI scores under the desired scenario. Significance: We confirmed the advantages of the proposed joint training criterion for the NMF-based VC. Moreover, we verified that the proposed JD-NMF can effectively improve the speech intelligibility scores of oral surgery patients.
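    A hedged sketch of the exemplar-based NMF conversion step that JD-NMF builds on: with paired source/target dictionaries whose columns are aligned, the source spectrogram is decomposed against the source dictionary and the resulting activations are reapplied to the target dictionary. The dictionaries and spectrogram below are random placeholders, and this is the generic baseline scheme, not the paper's joint dictionary learning.

```python
import numpy as np

def nmf_activations(V, W, n_iter=200, eps=1e-9):
    """Multiplicative-update solve for H >= 0 with the dictionary W fixed,
    minimizing ||V - W H||_F (the decomposition step of exemplar-based VC)."""
    H = np.random.rand(W.shape[1], V.shape[1])
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    return H

rng = np.random.default_rng(0)
W_src = np.abs(rng.normal(size=(64, 20)))   # paired source/target spectral
W_tgt = np.abs(rng.normal(size=(64, 20)))   # dictionaries (columns aligned)
V_src = np.abs(rng.normal(size=(64, 100)))  # magnitude spectrogram of distorted speech

H = nmf_activations(V_src, W_src)
V_converted = W_tgt @ H   # impose target spectra, keep the source activations
```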

  8. Intensive Voice Treatment (LSVT[R]LOUD) for Parkinson's Disease Following Deep Brain Stimulation of the Subthalamic Nucleus

    ERIC Educational Resources Information Center

    Spielman, Jennifer; Mahler, Leslie; Halpern, Angela; Gilley, Phllip; Klepitskaya, Olga; Ramig, Lorraine

    2011-01-01

    Purpose: Intensive voice therapy (LSVT[R]LOUD) can effectively manage voice and speech symptoms associated with idiopathic Parkinson disease (PD). This small-group study evaluated voice and speech in individuals with and without deep brain stimulation of the subthalamic nucleus (STN-DBS) before and after LSVT LOUD, to determine whether outcomes…

  9. Acoustic Measures of Voice and Physiologic Measures of Autonomic Arousal during Speech as a Function of Cognitive Load.

    PubMed

    MacPherson, Megan K; Abur, Defne; Stepp, Cara E

    2017-07-01

    This study aimed to determine the relationship among cognitive load condition and measures of autonomic arousal and voice production in healthy adults. A prospective study design was conducted. Sixteen healthy young adults (eight men, eight women) produced a sentence containing an embedded Stroop task in each of two cognitive load conditions: congruent and incongruent. In both conditions, participants said the font color of the color words instead of the word text. In the incongruent condition, font color differed from the word text, creating an increase in cognitive load relative to the congruent condition in which font color and word text matched. Three physiologic measures of autonomic arousal (pulse volume amplitude, pulse period, and skin conductance response amplitude) and four acoustic measures of voice (sound pressure level, fundamental frequency, cepstral peak prominence, and low-to-high spectral energy ratio) were analyzed for eight sentence productions in each cognitive load condition per participant. A logistic regression model was constructed to predict the cognitive load condition (congruent or incongruent) using subject as a categorical predictor and the three autonomic measures and four acoustic measures as continuous predictors. It revealed that skin conductance response amplitude, cepstral peak prominence, and low-to-high spectral energy ratio were significantly associated with cognitive load condition. During speech produced under increased cognitive load, healthy young adults show changes in physiologic markers of heightened autonomic arousal and acoustic measures of voice quality. Future work is necessary to examine these measures in older adults and individuals with voice disorders. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
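    The analysis above is a standard logistic regression of condition on the physiologic and acoustic measures. A minimal scikit-learn sketch on synthetic data (the per-subject categorical predictor used in the study is omitted for brevity):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical per-sentence measures: 3 autonomic + 4 acoustic predictors,
# predicting cognitive load condition (0 = congruent, 1 = incongruent)
rng = np.random.default_rng(1)
X = rng.normal(size=(256, 7))
y = (X[:, 2] + 0.5 * X[:, 5] + rng.normal(scale=0.8, size=256) > 0).astype(int)

model = LogisticRegression().fit(X, y)
print(model.coef_)        # association of each measure with condition
print(model.score(X, y))  # in-sample classification accuracy
```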

  11. The Hypothesis of Apraxia of Speech in Children with Autism Spectrum Disorder

    PubMed Central

    Shriberg, Lawrence D.; Paul, Rhea; Black, Lois M.; van Santen, Jan P.

    2010-01-01

    In a sample of 46 children aged 4 to 7 years with Autism Spectrum Disorder (ASD) and intelligible speech, there was no statistical support for the hypothesis of concomitant Childhood Apraxia of Speech (CAS). Perceptual and acoustic measures of participants’ speech, prosody, and voice were compared with data from 40 typically-developing children, 13 preschool children with Speech Delay, and 15 participants aged 5 to 49 years with CAS in neurogenetic disorders. Speech Delay and Speech Errors, respectively, were modestly and substantially more prevalent in participants with ASD than reported population estimates. Double dissociations in speech, prosody, and voice impairments in ASD were interpreted as consistent with a speech attunement framework, rather than with the motor speech impairments that define CAS. Key Words: apraxia, dyspraxia, motor speech disorder, speech sound disorder PMID:20972615

  12. Pathological speech signal analysis and classification using empirical mode decomposition.

    PubMed

    Kaleem, Muhammad; Ghoraani, Behnaz; Guergachi, Aziz; Krishnan, Sridhar

    2013-07-01

    Automated classification of normal and pathological speech signals can provide an objective and accurate mechanism for pathological speech diagnosis, and is an active area of research. A large part of this research is based on analysis of acoustic measures extracted from sustained vowels. However, sustained vowels do not reflect real-world attributes of voice as effectively as continuous speech, which can take into account important attributes of speech such as rapid voice onset and termination, changes in voice frequency and amplitude, and sudden discontinuities in speech. This paper presents a methodology based on empirical mode decomposition (EMD) for classification of continuous normal and pathological speech signals obtained from a well-known database. EMD is used to decompose randomly chosen portions of speech signals into intrinsic mode functions, which are then analyzed to extract meaningful temporal and spectral features, including true instantaneous features which can capture discriminative information in signals hidden at local time-scales. A total of six features are extracted, and a linear classifier is used with the feature vector to classify continuous speech portions obtained from a database consisting of 51 normal and 161 pathological speakers. A classification accuracy of 95.7% is obtained, thus demonstrating the effectiveness of the methodology.
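
    A rough sketch of such a pipeline follows, decomposing a signal into intrinsic mode functions and deriving energy and instantaneous-frequency features. It assumes the PyEMD (EMD-signal) package, and the feature choice only approximates, rather than reproduces, the six features named in the abstract:

      import numpy as np
      from PyEMD import EMD               # pip install EMD-signal
      from scipy.signal import hilbert

      def emd_features(signal, fs, n_imfs=3):
          """Log energy and mean instantaneous frequency of the first few IMFs."""
          imfs = EMD()(signal)[:n_imfs]
          feats = []
          for imf in imfs:
              analytic = hilbert(imf)
              phase = np.unwrap(np.angle(analytic))
              inst_freq = np.diff(phase) * fs / (2 * np.pi)   # Hz, per sample
              feats += [np.log(np.sum(imf ** 2) + 1e-12), float(np.mean(inst_freq))]
          return np.array(feats)          # 2 features x 3 IMFs = 6 features

      # Toy usage on a synthetic signal; real inputs would be randomly chosen
      # continuous-speech portions, fed afterwards to a linear classifier.
      fs = 16000
      t = np.arange(fs) / fs
      x = np.sin(2 * np.pi * 120 * t) + 0.3 * np.sin(2 * np.pi * 900 * t)
      print(emd_features(x, fs))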

  13. Swinging at a cocktail party: voice familiarity aids speech perception in the presence of a competing voice.

    PubMed

    Johnsrude, Ingrid S; Mackey, Allison; Hakyemez, Hélène; Alexander, Elizabeth; Trang, Heather P; Carlyon, Robert P

    2013-10-01

    People often have to listen to someone speak in the presence of competing voices. Much is known about the acoustic cues used to overcome this challenge, but almost nothing is known about the utility of cues derived from experience with particular voices--cues that may be particularly important for older people and others with impaired hearing. Here, we use a version of the coordinate-response-measure procedure to show that people can exploit knowledge of a highly familiar voice (their spouse's) not only to track it better in the presence of an interfering stranger's voice, but also, crucially, to ignore it so as to comprehend a stranger's voice more effectively. Although performance declines with increasing age when the target voice is novel, there is no decline when the target voice belongs to the listener's spouse. This finding indicates that older listeners can exploit their familiarity with a speaker's voice to mitigate the effects of sensory and cognitive decline.

  14. Mechanics of human voice production and control

    PubMed Central

    Zhang, Zhaoyan

    2016-01-01

    As the primary means of communication, voice plays an important role in daily life. Voice also conveys personal information such as social status, personal traits, and the emotional state of the speaker. Mechanically, voice production involves complex fluid-structure interaction within the glottis and its control by laryngeal muscle activation. An important goal of voice research is to establish a causal theory linking voice physiology and biomechanics to how speakers use and control voice to communicate meaning and personal information. Establishing such a causal theory has important implications for clinical voice management, voice training, and many speech technology applications. This paper provides a review of voice physiology and biomechanics, the physics of vocal fold vibration and sound production, and laryngeal muscular control of the fundamental frequency of voice, vocal intensity, and voice quality. Current efforts to develop mechanical and computational models of voice production are also critically reviewed. Finally, issues and future challenges in developing a causal theory of voice production and perception are discussed. PMID:27794319

  15. Mechanics of human voice production and control.

    PubMed

    Zhang, Zhaoyan

    2016-10-01

    As the primary means of communication, voice plays an important role in daily life. Voice also conveys personal information such as social status, personal traits, and the emotional state of the speaker. Mechanically, voice production involves complex fluid-structure interaction within the glottis and its control by laryngeal muscle activation. An important goal of voice research is to establish a causal theory linking voice physiology and biomechanics to how speakers use and control voice to communicate meaning and personal information. Establishing such a causal theory has important implications for clinical voice management, voice training, and many speech technology applications. This paper provides a review of voice physiology and biomechanics, the physics of vocal fold vibration and sound production, and laryngeal muscular control of the fundamental frequency of voice, vocal intensity, and voice quality. Current efforts to develop mechanical and computational models of voice production are also critically reviewed. Finally, issues and future challenges in developing a causal theory of voice production and perception are discussed.

  16. [Mechanism of neoglottic adjustment for voice variation in tracheoesophageal speech].

    PubMed

    Fujimoto, T; Kinishi, M; Mohri, M; Amatsu, M

    1994-06-01

    Over the past 17 years, we have been performing tracheoesophageal (TE) fistulization for voice restoration following total laryngectomy. The purpose of this technique is to divert the exhaled air through the TE fistula into the hypopharynx where the inferior constrictor muscle forms the retropharyngeal prominence on which the neoglottis is located. It is generally accepted that both pulmonary power and laryngeal adjustment control voice frequency and intensity change in laryngeal phonation. Regularity at various pitches and voice intensities was seen in TE phonation, despite laryngeal adjustment being lost. Regular voice production with various pitches and intensities requires a regulatory mechanism for both pulmonary power and the neoglottis. This study was designed to clarify the mechanism of neoglottic adjustment in TE phonation. Ten speakers with TE fistula were subjected to aerodynamic and electrophysiological investigations. Tracheal pressure, fundamental frequency, intensity, and airflow rate were measured for easy phonation, a high-pitched voice, and a loud voice. Resistance and efficiency of the neoglottis were calculated from the data obtained. Electromyograms of the inferior constrictor muscle and tracheal pressure were simultaneously recorded when the pitch or intensity of the voice increased. Six of the ten subjects examined were able to produce a high-pitched voice. Tracheal pressure increased in all six, the airflow rate in four, and neoglottal resistance in five, as compared with the data obtained during easy phonation. Nine of the ten subjects examined were able to produce a loud voice. In all nine, both tracheal pressure and the airflow rate increased as compared with the values measured during easy phonation. Neoglottal resistance had no definite pattern in relation to voice intensity changes. Electrophysiological study demonstrated that the activity of the inferior constrictor muscle increased as tracheal pressure increased so as to raise the…
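
    The aerodynamic quantities named above reduce to simple ratios. A minimal sketch follows; the units and the efficiency definition are common conventions assumed here, not values or formulas taken from the paper:

      def neoglottal_resistance(tracheal_pressure_cmh2o, airflow_l_per_s):
          """Resistance as driving pressure over mean airflow, in cmH2O/(L/s)."""
          return tracheal_pressure_cmh2o / airflow_l_per_s

      def vocal_efficiency(acoustic_power_w, tracheal_pressure_cmh2o, airflow_l_per_s):
          """One common definition: radiated acoustic power over aerodynamic power.
          1 cmH2O = 98.0665 Pa and 1 L/s = 1e-3 m^3/s, so the product is in watts."""
          aerodynamic_power_w = (tracheal_pressure_cmh2o * 98.0665) * (airflow_l_per_s * 1e-3)
          return acoustic_power_w / aerodynamic_power_w

      # Illustrative values only, not data from the study.
      print(neoglottal_resistance(30.0, 0.15))    # ~200 cmH2O/(L/s)
      print(vocal_efficiency(1e-4, 30.0, 0.15))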

  17. Perception of synthetic speech produced automatically by rule: Intelligibility of eight text-to-speech systems.

    PubMed

    Greene, Beth G; Logan, John S; Pisoni, David B

    1986-03-01

    We present the results of studies designed to measure the segmental intelligibility of eight text-to-speech systems and a natural speech control, using the Modified Rhyme Test (MRT). Results indicated that the voices tested could be grouped into four categories: natural speech, high-quality synthetic speech, moderate-quality synthetic speech, and low-quality synthetic speech. The overall performance of the best synthesis system, DECtalk-Paul, was equivalent to natural speech only in terms of performance on initial consonants. The findings are discussed in terms of recent work investigating the perception of synthetic speech under more severe conditions. Suggestions for future research on improving the quality of synthetic speech are also considered.

  18. Perception of synthetic speech produced automatically by rule: Intelligibility of eight text-to-speech systems

    PubMed Central

    GREENE, BETH G.; LOGAN, JOHN S.; PISONI, DAVID B.

    2012-01-01

    We present the results of studies designed to measure the segmental intelligibility of eight text-to-speech systems and a natural speech control, using the Modified Rhyme Test (MRT). Results indicated that the voices tested could be grouped into four categories: natural speech, high-quality synthetic speech, moderate-quality synthetic speech, and low-quality synthetic speech. The overall performance of the best synthesis system, DECtalk-Paul, was equivalent to natural speech only in terms of performance on initial consonants. The findings are discussed in terms of recent work investigating the perception of synthetic speech under more severe conditions. Suggestions for future research on improving the quality of synthetic speech are also considered. PMID:23225916

  19. Hearing history influences voice gender perceptual performance in cochlear implant users.

    PubMed

    Kovačić, Damir; Balaban, Evan

    2010-12-01

    The study was carried out to assess the role that five hearing history variables (chronological age, age at onset of deafness, age of first cochlear implant [CI] activation, duration of CI use, and duration of known deafness) play in the ability of CI users to identify speaker gender. Forty-one juvenile CI users participated in two voice gender identification tasks. In a fixed, single-interval task, subjects listened to a single speech item from one of 20 adult male or 20 adult female speakers and had to identify speaker gender. In an adaptive speech-based voice gender discrimination task with the fundamental frequency difference between the voices as the adaptive parameter, subjects listened to a pair of speech items presented in sequential order, one of which was always spoken by an adult female and the other by an adult male. Subjects had to identify the speech item spoken by the female voice. Correlation and regression analyses between perceptual scores in the two tasks and the hearing history variables were performed. Subjects fell into three performance groups: (1) those who could distinguish voice gender in both tasks, (2) those who could distinguish voice gender in the adaptive but not the fixed task, and (3) those who could not distinguish voice gender in either task. Gender identification performance for single voices in the fixed task was significantly and negatively related to the duration of deafness before cochlear implantation (shorter deafness yielded better performance), whereas performance in the adaptive task was weakly but significantly related to age at first activation of the CI device, with earlier activations yielding better scores. The existence of a group of subjects able to perform adaptive discrimination but unable to identify the gender of singly presented voices demonstrates the potential dissociability of the skills required for these two tasks, suggesting that duration of deafness and age of cochlear implantation could have…

  20. Guidelines for Selecting Microphones for Human Voice Production Research

    ERIC Educational Resources Information Center

    Svec, Jan G.; Granqvist, Svante

    2010-01-01

    Purpose: This tutorial addresses fundamental characteristics of microphones (frequency response, frequency range, dynamic range, and directionality), which are important for accurate measurements of voice and speech. Method: Technical and voice literature was reviewed and analyzed. The following recommendations on desirable microphone…

  1. Speech processing using maximum likelihood continuity mapping

    DOEpatents

    Hogden, John E.

    2000-01-01

    Speech processing is obtained that, given a probabilistic mapping between static speech sounds and pseudo-articulator positions, allows sequences of speech sounds to be mapped to smooth sequences of pseudo-articulator positions. In addition, a method for learning a probabilistic mapping between static speech sounds and pseudo-articulator position is described. The method for learning the mapping between static speech sounds and pseudo-articulator position uses a set of training data composed only of speech sounds. The said speech processing can be applied to various speech analysis tasks, including speech recognition, speaker recognition, speech coding, speech synthesis, and voice mimicry.

  2. Speech processing using maximum likelihood continuity mapping

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hogden, J.E.

    Speech processing is obtained that, given a probabilistic mapping between static speech sounds and pseudo-articulator positions, allows sequences of speech sounds to be mapped to smooth sequences of pseudo-articulator positions. In addition, a method for learning a probabilistic mapping between static speech sounds and pseudo-articulator position is described. The method for learning the mapping between static speech sounds and pseudo-articulator position uses a set of training data composed only of speech sounds. The said speech processing can be applied to various speech analysis tasks, including speech recognition, speaker recognition, speech coding, speech synthesis, and voice mimicry.

  3. Dramatic Effects of Speech Task on Motor and Linguistic Planning in Severely Dysfluent Parkinsonian Speech

    ERIC Educational Resources Information Center

    Van Lancker Sidtis, Diana; Cameron, Krista; Sidtis, John J.

    2012-01-01

    In motor speech disorders, dysarthric features impacting intelligibility, articulation, fluency and voice emerge more saliently in conversation than in repetition, reading or singing. A role of the basal ganglia in these task discrepancies has been identified. Further, more recent studies of naturalistic speech in basal ganglia dysfunction have…

  4. [The application of cybernetic modeling methods for the forensic medical personality identification based on the voice and sounding speech characteristics].

    PubMed

    Kaganov, A Sh; Kir'yanov, P A

    2015-01-01

    The objective of the present publication was to discuss the possibility of applying cybernetic modeling methods to overcome the apparent discrepancy between two kinds of speech records, viz. the initial ones (e.g., those obtained in the course of special investigation activities) and the voice prints obtained from the persons subjected to criminalistic examination. The paper is based on literature sources and on the materials of original criminalistic examinations performed by the authors.

  5. Temporal voice areas exist in autism spectrum disorder but are dysfunctional for voice identity recognition

    PubMed Central

    Borowiak, Kamila; von Kriegstein, Katharina

    2016-01-01

    The ability to recognise the identity of others is a key requirement for successful communication. Brain regions that respond selectively to voices exist in humans from early infancy on. Currently, it is unclear whether dysfunction of these voice-sensitive regions can explain voice identity recognition impairments. Here, we used two independent functional magnetic resonance imaging studies to investigate voice processing in a population that has been reported to have no voice-sensitive regions: autism spectrum disorder (ASD). Our results refute the earlier report that individuals with ASD have no responses in voice-sensitive regions: Passive listening to vocal, compared to non-vocal, sounds elicited typical responses in voice-sensitive regions in the high-functioning ASD group and controls. In contrast, the ASD group had a dysfunction in voice-sensitive regions during voice identity but not speech recognition in the right posterior superior temporal sulcus/gyrus (STS/STG)—a region implicated in processing complex spectrotemporal voice features and unfamiliar voices. The right anterior STS/STG correlated with voice identity recognition performance in controls but not in the ASD group. The findings suggest that right STS/STG dysfunction is critical for explaining voice recognition impairments in high-functioning ASD and show that ASD is not characterised by a general lack of voice-sensitive responses. PMID:27369067

  6. Alternative Speech Communication System for Persons with Severe Speech Disorders

    NASA Astrophysics Data System (ADS)

    Selouani, Sid-Ahmed; Sidi Yakoub, Mohammed; O'Shaughnessy, Douglas

    2009-12-01

    Assistive speech-enabled systems are proposed to help both French- and English-speaking persons with various speech disorders. The proposed assistive systems use automatic speech recognition (ASR) and speech synthesis in order to enhance the quality of communication. These systems aim at improving the intelligibility of pathologic speech, making it as natural as possible and close to the original voice of the speaker. The resynthesized utterances use new basic units, a new concatenating algorithm and a grafting technique to correct the poorly pronounced phonemes. The ASR responses are uttered by the new speech synthesis system in order to convey an intelligible message to listeners. Experiments involving four American speakers with severe dysarthria and two Acadian French speakers with sound substitution disorders (SSDs) are carried out to demonstrate the efficiency of the proposed methods. An improvement of the Perceptual Evaluation of Speech Quality (PESQ) value of 5% and of more than 20% is achieved by the speech synthesis systems that deal with SSD and dysarthria, respectively.
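
    PESQ scores like those reported above can be computed with an off-the-shelf implementation. A minimal sketch using the open-source pesq package follows; the package choice and the file names are assumptions for illustration, not the tooling used in the study:

      from scipy.io import wavfile
      from pesq import pesq            # pip install pesq (ITU-T P.862 implementation)

      fs, reference = wavfile.read("natural_reference.wav")     # placeholder paths
      _, resynthesized = wavfile.read("resynthesized.wav")

      # Mode must match the sample rate: 'nb' for 8 kHz input, 'wb' for 16 kHz.
      score = pesq(fs, reference, resynthesized, "wb")
      print(f"PESQ: {score:.2f}")      # roughly -0.5 (bad) to 4.5 (excellent)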

  7. Female voice communications in high levels of aircraft cockpit noises--Part I: spectra, levels, and microphones.

    PubMed

    Nixon, C W; Morris, L J; McCavitt, A R; McKinley, R L; Anderson, T R; McDaniel, M P; Yeager, D G

    1998-07-01

    Female produced speech, although more intelligible than male speech in some noise spectra, may be more vulnerable to degradation by high levels of some military aircraft cockpit noises. The acoustic features of female speech are higher in frequency, lower in power, and appear more susceptible than male speech to masking by some of these military noises. Current military aircraft voice communication systems were optimized for the male voice and may not adequately accommodate the female voice in these high level noises. This applied study investigated the intelligibility of female and male speech produced in the noise spectra of four military aircraft cockpits at levels ranging from 95 dB to 115 dB. The experimental subjects used standard flight helmets and headsets, noise-canceling microphones, and military aircraft voice communications systems during the measurements. The intelligibility of female speech was lower than that of male speech for all experimental conditions; however, differences were small and insignificant except at the highest levels of the cockpit noises. Intelligibility for both genders varied with aircraft noise spectrum and level. Speech intelligibility of both genders was acceptable during normal cruise noises of all four aircraft, but improvements are required in the higher levels of noise created during aircraft maximum operating conditions. The intelligibility of female speech was unacceptable at the highest measured noise level of 115 dB and may constitute a problem for other military aviators. The intelligibility degradation due to the noise can be neutralized by use of an available, improved noise-canceling microphone, by the application of current active noise reduction technology to the personal communication equipment, and by the development of a voice communications system to accommodate the speech produced by both female and male aviators.

  8. Evidence-Based Clinical Voice Assessment: A Systematic Review

    ERIC Educational Resources Information Center

    Roy, Nelson; Barkmeier-Kraemer, Julie; Eadie, Tanya; Sivasankar, M. Preeti; Mehta, Daryush; Paul, Diane; Hillman, Robert

    2013-01-01

    Purpose: To determine what research evidence exists to support the use of voice measures in the clinical assessment of patients with voice disorders. Method: The American Speech-Language-Hearing Association (ASHA) National Center for Evidence-Based Practice in Communication Disorders staff searched 29 databases for peer-reviewed English-language…

  9. Assessing Chronic Stress, Coping Skills, and Mood Disorders through Speech Analysis: A Self-Assessment 'Voice App' for Laptops, Tablets, and Smartphones.

    PubMed

    Braun, Silke; Annovazzi, Chiara; Botella, Cristina; Bridler, René; Camussi, Elisabetta; Delfino, Juan P; Mohr, Christine; Moragrega, Ines; Papagno, Costanza; Pisoni, Alberto; Soler, Carla; Seifritz, Erich; Stassen, Hans H

    2016-01-01

    Computerized speech analysis (CSA) is a powerful method that allows one to assess stress-induced mood disturbances and affective disorders through repeated measurements of speaking behavior and voice sound characteristics. Over the past decades CSA has been successfully used in the clinical context to monitor the transition from 'affectively disturbed' to 'normal' among psychiatric patients under treatment. This project, by contrast, aimed to extend the CSA method in such a way that the transition from 'normal' to 'affected' can be detected among subjects of the general population through 10-20 self-assessments. Central to the project was a normative speech study of 5 major languages (English, French, German, Italian, and Spanish). Each language comprised 120 subjects stratified according to gender, age, and education with repeated assessments at 14-day intervals (total n = 697). In a first step, we developed a multivariate model to assess affective state and stress-induced bodily reactions through speaking behavior and voice sound characteristics. Secondly, we determined language-, gender-, and age-specific thresholds that draw a line between 'natural fluctuations' and 'significant changes'. Thirdly, we implemented the model along with the underlying methods and normative data in a self-assessment 'voice app' for laptops, tablets, and smartphones. Finally, a longitudinal self-assessment study of 36 subjects was carried out over 14 days to test the performance of the CSA method in home environments. The data showed that speaking behavior and voice sound characteristics can be quantified in a reproducible and language-independent way. Gender and age explained 15-35% of the observed variance, whereas the educational level had a relatively small effect in the range of 1-3%. The self-assessment 'voice app' was realized in modular form so that additional languages can simply be 'plugged in' once the respective normative data become available. Results of the longitudinal…

  10. Standardization of pitch-range settings in voice acoustic analysis.

    PubMed

    Vogel, Adam P; Maruff, Paul; Snyder, Peter J; Mundt, James C

    2009-05-01

    Voice acoustic analysis is typically a labor-intensive, time-consuming process that requires the application of idiosyncratic parameters tailored to individual aspects of the speech signal. Such processes limit the efficiency and utility of voice analysis in clinical practice as well as in applied research and development. In the present study, we analyzed 1,120 voice files, using standard techniques (case-by-case hand analysis), taking roughly 10 work weeks of personnel time to complete. The results were compared with the analytic output of several automated analysis scripts that made use of preset pitch-range parameters. After pitch windows were selected to appropriately account for sex differences, the automated analysis scripts reduced processing time of the 1,120 speech samples to less than 2.5 h and produced results comparable to those obtained with hand analysis. However, caution should be exercised when applying the suggested preset values to pathological voice populations.
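
    In the spirit of the preset pitch windows described above, here is a minimal Python sketch using the praat-parselmouth package; the specific floor/ceiling values and the file name are illustrative assumptions, not the presets used in the study:

      import numpy as np
      import parselmouth              # pip install praat-parselmouth

      # Illustrative sex-specific pitch windows (Hz); not the study's exact presets.
      PITCH_WINDOWS = {"male": (70.0, 250.0), "female": (100.0, 350.0)}

      def mean_f0(wav_path, sex):
          floor, ceiling = PITCH_WINDOWS[sex]
          snd = parselmouth.Sound(wav_path)
          pitch = snd.to_pitch(pitch_floor=floor, pitch_ceiling=ceiling)
          f0 = pitch.selected_array["frequency"]
          f0 = f0[f0 > 0]             # Praat marks unvoiced frames with 0 Hz
          return float(np.mean(f0))

      print(mean_f0("sample.wav", "female"))   # placeholder file name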

  11. Brainstem Correlates of Speech-in-Noise Perception in Children

    PubMed Central

    Anderson, Samira; Skoe, Erika; Chandrasekaran, Bharath; Zecker, Steven; Kraus, Nina

    2010-01-01

    Children often have difficulty understanding speech in challenging listening environments. In the absence of peripheral hearing loss, these speech perception difficulties may arise from dysfunction at more central levels in the auditory system, including subcortical structures. We examined brainstem encoding of pitch in a speech syllable in 38 school-age children. In children with poor speech-in-noise perception, we find impaired encoding of the fundamental frequency and the second harmonic, two important cues for pitch perception. Pitch, an important factor in speaker identification, aids the listener in tracking a specific voice from a background of voices. These results suggest that the robustness of subcortical neural encoding of pitch features in time-varying signals is an important factor in determining success with speech perception in noise. PMID:20708671

  12. Living with Hearing Loss

    MedlinePlus

    Nora Woodruff and her family, including dad Bob, have ... hearing, balance, smell, taste, voice, speech, and language. Nora Woodruff, daughter of ABC newsman Bob Woodruff and ...

  13. Voice similarity in identical twins.

    PubMed

    Van Gysel, W D; Vercammen, J; Debruyne, F

    2001-01-01

    People asked to visually distinguish the two individuals of a monozygotic twin (MT) pair mostly run into trouble. Does this problem also exist when listening to twin voices? Twenty female and 10 male MT voice pairs were each randomly assembled with one "strange" voice to form voice trios. The listeners (10 female students in Speech and Language Pathology) were asked to label the twins (voices 1-2, 1-3 or 2-3) in two conditions: two standard sentences read aloud, and a 2.5-second midsection of a sustained /a/. The proportion of correctly labelled twins was 82% and 63% for female voices and 74% and 52% for male voices, for the sentences and the sustained /a/ respectively, both being significantly greater than chance (33%). The acoustic analysis revealed a high intra-twin correlation for the speaking fundamental frequency (SFF) of the sentences and the fundamental frequency (F0) of the sustained /a/. So the voice pitch could have been a useful characteristic in the perceptual identification of the twins. We conclude that there is a greater perceptual resemblance between the voices of identical twins than between voices without genetic relationship. The identification, however, is not perfect. The voice pitch possibly contributes to the correct twin identifications.

  14. Voice activity and participation profile: assessing the impact of voice disorders on daily activities.

    PubMed

    Ma, E P; Yiu, E M

    2001-06-01

    Traditional clinical voice evaluation focuses primarily on the severity of voice impairment, with little emphasis on the impact of voice disorders on the individual's quality of life. This study reports the development of a 28-item assessment tool that evaluates the perception of voice problem, activity limitation, and participation restriction using the International Classification of Impairments, Disabilities and Handicaps-2 Beta-1 concept (World Health Organization, 1997). The questionnaire was administered to 40 subjects with dysphonia and 40 control subjects with normal voices. Results showed that the dysphonic group reported significantly more severe voice problems, limitation in daily voice activities, and restricted participation in these activities than the control group. The study also showed that the perception of a voice problem by the dysphonic subjects correlated positively with the perception of limitation in voice activities and restricted participation. However, the self-perceived voice problem had little correlation with the degree of voice-quality impairment measured acoustically and perceptually by speech pathologists. The data also showed that the aggregate scores of activity limitation and participation restriction were positively correlated, and the extent of activity limitation and participation restriction was similar in all except the job area. These findings highlight the importance of identifying and quantifying the impact of dysphonia on the individual's quality of life in the clinical management of voice disorders.

  15. Feasibility of event-related potential (ERP) biomarker use to study effects of mother's voice exposure on speech sound differentiation of preterm infants.

    PubMed

    Chorna, Olena D; Hamm, Ellyn L; Shrivastava, Hemang; Maitre, Nathalie L

    2018-01-01

    Atypical maturation of auditory neural processing contributes to preterm-born infants' language delays. Event-related potential (ERP) measurement of speech-sound differentiation might fill a gap in treatment-response biomarkers to auditory interventions. We evaluated whether these markers could measure treatment effects in a quasi-randomized prospective study. Hospitalized preterm infants in passive or active, suck-contingent mother's voice exposure groups were not different at baseline. Post-intervention, the active group had greater increases in /du/-/gu/ differentiation in left frontal and temporal regions. Infants with brain injury had lower baseline /ba/-/ga/ and /du/-/gu/ differentiation than those without. ERP provides valid discriminative, responsive, and predictive biomarkers of infant speech-sound differentiation.

  16. Five-year speech and language outcomes in children with cleft lip-palate.

    PubMed

    Prathanee, Benjamas; Pumnum, Tawitree; Seepuaham, Cholada; Jaiyong, Pechcharat

    2016-10-01

    To investigate 5-year speech and language outcomes in children with cleft lip/palate (CLP). Thirty-eight children aged 4 years to 7 years 8 months were recruited for this study. Speech abilities including articulation, resonance, voice, and intelligibility were assessed based on Thai Universal Parameters of Speech Outcomes. Language ability was assessed by the Language Screening Test. The findings revealed rates of speech and language delay, abnormal understandability, resonance abnormality, voice disturbance, and articulation defects of 8.33 (1.75, 22.47), 50.00 (32.92, 67.08), 36.11 (20.82, 53.78), 30.56 (16.35, 48.11), and 94.44 (81.34, 99.32) percent, respectively. Articulation errors were the most common speech and language defects in children with clefts, followed by abnormal understandability, resonance abnormality, and voice disturbance. These results should be of critical concern. Protocol reviewing and early intervention programs are needed for improved speech outcomes. Copyright © 2016 European Association for Cranio-Maxillo-Facial Surgery. Published by Elsevier Ltd. All rights reserved.

  17. Speech-recognition interfaces for music information retrieval

    NASA Astrophysics Data System (ADS)

    Goto, Masataka

    2005-09-01

    This paper describes two hands-free music information retrieval (MIR) systems that enable a user to retrieve and play back a musical piece by saying its title or the artist's name. Although various interfaces for MIR have been proposed, speech-recognition interfaces suitable for retrieving musical pieces have not been studied. Our MIR-based jukebox systems employ two different speech-recognition interfaces for MIR, speech completion and speech spotter, which exploit intentionally controlled nonverbal speech information in original ways. The first is a music retrieval system with the speech-completion interface that is suitable for music stores and car-driving situations. When a user only remembers part of the name of a musical piece or an artist and utters only a remembered fragment, the system helps the user recall and enter the name by completing the fragment. The second is a background-music playback system with the speech-spotter interface that can enrich human-human conversation. When a user is talking to another person, the system allows the user to enter voice commands for music playback control by spotting a special voice-command utterance in face-to-face or telephone conversations. Experimental results from use of these systems have demonstrated the effectiveness of the speech-completion and speech-spotter interfaces. (Video clips: http://staff.aist.go.jp/m.goto/MIR/speech-if.html)

  18. Corollary discharge provides the sensory content of inner speech.

    PubMed

    Scott, Mark

    2013-09-01

    Inner speech is one of the most common, but least investigated, mental activities humans perform. It is an internal copy of one's external voice and so is similar to a well-established component of motor control: corollary discharge. Corollary discharge is a prediction of the sound of one's voice generated by the motor system. This prediction is normally used to filter self-caused sounds from perception, which segregates them from externally caused sounds and prevents the sensory confusion that would otherwise result. The similarity between inner speech and corollary discharge motivates the theory, tested here, that corollary discharge provides the sensory content of inner speech. The results reported here show that inner speech attenuates the impact of external sounds. This attenuation was measured using a context effect (an influence of contextual speech sounds on the perception of subsequent speech sounds), which weakens in the presence of speech imagery that matches the context sound. Results from a control experiment demonstrated this weakening in external speech as well. Such sensory attenuation is a hallmark of corollary discharge.

  19. McGurk Effect in Gender Identification: Vision Trumps Audition in Voice Judgments.

    PubMed

    Peynircioğlu, Zehra F; Brent, William; Tatz, Joshua R; Wyatt, Jordan

    2017-01-01

    Demonstrations of non-speech McGurk effects are rare, mostly limited to emotion identification, and sometimes not considered true analogues. We presented videos of males and females singing a single syllable on the same pitch and asked participants to indicate the true range of the voice: soprano, alto, tenor, or bass. For one group of participants, the gender shown on the video matched the gender of the voice heard, and for the other group they were mismatched. Soprano or alto responses were interpreted as "female voice" decisions and tenor or bass responses as "male voice" decisions. Identification of the voice gender was 100% correct in the preceding audio-only condition. However, whereas performance was also 100% correct in the matched video/audio condition, it was only 31% correct in the mismatched video/audio condition. Thus, the visual gender information overrode the voice gender identification, showing a robust non-speech McGurk effect.

  20. Common cues to emotion in the dynamic facial expressions of speech and song.

    PubMed

    Livingstone, Steven R; Thompson, William F; Wanderley, Marcelo M; Palmer, Caroline

    2015-01-01

    Speech and song are universal forms of vocalization that may share aspects of emotional expression. Research has focused on parallels in acoustic features, overlooking facial cues to emotion. In three experiments, we compared moving facial expressions in speech and song. In Experiment 1, vocalists spoke and sang statements each with five emotions. Vocalists exhibited emotion-dependent movements of the eyebrows and lip corners that transcended speech-song differences. Vocalists' jaw movements were coupled to their acoustic intensity, exhibiting differences across emotion and speech-song. Vocalists' emotional movements extended beyond vocal sound to include large sustained expressions, suggesting a communicative function. In Experiment 2, viewers judged silent videos of vocalists' facial expressions prior to, during, and following vocalization. Emotional intentions were identified accurately for movements during and after vocalization, suggesting that these movements support the acoustic message. Experiment 3 compared emotional identification in voice-only, face-only, and face-and-voice recordings. Emotion judgements for voice-only singing were poorly identified, yet were accurate for all other conditions, confirming that facial expressions conveyed emotion more accurately than the voice in song, yet were equivalent in speech. Collectively, these findings highlight broad commonalities in the facial cues to emotion in speech and song, yet highlight differences in perception and acoustic-motor production.

  1. Automatic initial and final segmentation in cleft palate speech of Mandarin speakers

    PubMed Central

    Liu, Yin; Yin, Heng; Zhang, Junpeng; Zhang, Jing; Zhang, Jiang

    2017-01-01

    Speech unit segmentation is an important pre-processing step in the analysis of cleft palate speech. In Mandarin, each syllable is composed of two parts: an initial and a final. In cleft palate speech, resonance disorders occur at the finals and the voiced initials, while articulation disorders occur at the unvoiced initials. Thus, initials and finals are the minimum speech units that can reflect the characteristics of cleft palate speech disorders. In this work, an automatic initial/final segmentation method is proposed as a pre-processing step for cleft palate speech signal processing. The tested cleft palate speech utterances were collected from the Cleft Palate Speech Treatment Center in the Hospital of Stomatology, Sichuan University, which treats the largest number of cleft palate patients in China. The cleft palate speech data include 824 speech segments, and the control samples contain 228 speech segments. First, syllables are extracted from the speech utterances. The proposed syllable extraction method avoids a training stage and performs well for both voiced and unvoiced speech. The syllables are then classified into those with “quasi-unvoiced” initials and those with “quasi-voiced” initials, and separate initial/final segmentation methods are proposed for the two types. Moreover, a two-step segmentation method is proposed, in which the rough locations of syllable and initial/final boundaries are refined in the second step to improve the robustness of the segmentation accuracy. The experiments show that initial/final segmentation accuracies are higher for syllables with quasi-unvoiced initials than for those with quasi-voiced initials. For the cleft palate speech, the mean time error is 4.4 ms for syllables with quasi-unvoiced initials and 25.7 ms for syllables with quasi-voiced initials, and the correct segmentation accuracy P30 for all the syllables is 91.69%. For the control samples, P30 for all the syllables is 91…

  2. Relationship between perceived politeness and spectral characteristics of voice

    NASA Astrophysics Data System (ADS)

    Ito, Mika

    2005-04-01

    This study investigates the role of voice quality in perceiving politeness under conditions of varying relative social status among Japanese male speakers. The work focuses on four important methodological issues: experimental control of sociolinguistic aspects, eliciting natural spontaneous speech, obtaining recording quality suitable for voice quality analysis, and assessment of glottal characteristics through the use of non-invasive direct measurements of the speech spectrum. To obtain natural, unscripted utterances, the speech data were collected with a Map Task. This methodology allowed us to study the effect of manipulating relative social status among participants in the same community. We then computed the relative amplitudes of harmonics and formant peaks in spectra obtained from the Map Task recordings. Finally, an experiment was conducted to observe the alignment between acoustic measures and the perceived politeness of the voice samples. The results suggest that listeners' perceptions of politeness are determined by spectral characteristics of speakers, in particular, spectral tilts obtained by computing the difference in amplitude between the first harmonic and the third formant.
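
    The spectral-tilt measure mentioned above (amplitude of the first harmonic minus the amplitude at the third formant, often written H1-A3) can be sketched as follows. F0 and F3 are assumed to be known from a separate analysis, and the search bandwidth is an arbitrary choice for the example:

      import numpy as np

      def h1_minus_a3(frame, fs, f0, f3, bw=50.0):
          """Spectral tilt H1-A3 in dB: level of the first harmonic minus the
          level of the strongest component near the third formant."""
          windowed = frame * np.hanning(len(frame))
          spectrum = np.abs(np.fft.rfft(windowed))
          freqs = np.fft.rfftfreq(len(frame), 1.0 / fs)

          def peak_db(center):
              band = (freqs >= center - bw) & (freqs <= center + bw)
              return 20 * np.log10(np.max(spectrum[band]) + 1e-12)

          return peak_db(f0) - peak_db(f3)

      # Toy check: a 120 Hz harmonic 20 dB above a component near a 2500 Hz "F3".
      fs = 16000
      t = np.arange(4096) / fs
      frame = np.sin(2 * np.pi * 120 * t) + 0.1 * np.sin(2 * np.pi * 2500 * t)
      print(h1_minus_a3(frame, fs, f0=120.0, f3=2500.0))   # ~20 dB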

  3. Perceptual, auditory and acoustic vocal analysis of speech and singing in choir conductors.

    PubMed

    Rehder, Maria Inês Beltrati Cornacchioni; Behlau, Mara

    2008-01-01

    Background: The voice of choir conductors. Aim: To evaluate the vocal quality of choir conductors based on the production of a sustained vowel during singing and when speaking, in order to observe auditory and acoustic differences. Method: Participants of this study were 100 choir conductors, with an equal distribution between genders. Participants were asked to produce the sustained vowel "é" using a singing and a speaking voice. Speech samples were analyzed based on auditory-perceptive and acoustic parameters. The auditory-perceptive analysis was carried out by two speech-language pathologists, specialists in this field of knowledge. The acoustic analysis was carried out with the support of the computer software Doctor Speech (Tiger Electronics, SRD, USA, version 4.0), using the Real Analysis module. Results: The auditory-perceptive analysis of the vocal quality indicated that most conductors have adapted voices, presenting more alterations in their speaking voice. The acoustic analysis indicated different values between genders and between the different production modalities. The fundamental frequency was higher in the singing voice, as were the values for the first formant; the second formant presented lower values in the singing voice, with statistically significant results only for women. Conclusion: The voice of choir conductors is adapted, presenting fewer deviations in the singing voice when compared to the speaking voice.

  4. A Resource Manual for Speech and Hearing Programs in Oklahoma.

    ERIC Educational Resources Information Center

    Oklahoma State Dept. of Education, Oklahoma City.

    Administrative aspects of the Oklahoma speech and hearing program are described, including state requirements, school administrator role, and organizational and operational procedures. Information on speech and language development and remediation covers language, articulation, stuttering, voice disorders, cleft palate, speech improvement,…

  5. Voice interactive electronic warning systems (VIEWS) - An applied approach to voice technology in the helicopter cockpit

    NASA Technical Reports Server (NTRS)

    Voorhees, J. W.; Bucher, N. M.

    1983-01-01

    The cockpit has been one of the most rapidly changing areas of new aircraft design over the past thirty years. In connection with these developments, a pilot can now be considered a decision maker and system manager as well as a vehicle controller. There is, however, a trend toward information overload in the cockpit, and information processing problems begin to occur for the rotorcraft pilot. One approach to overcoming these difficulties is the use of voice technology to improve the information transfer rate in the cockpit, with respect to both input and output. Attention is given to the background of speech technology, the application of speech technology within the cockpit, voice interactive electronic warning system (VIEWS) simulation, and methodology. Information subsystems are considered, along with a dynamic simulation study and data collection.

  6. Speech waveform perturbation analysis: a perceptual-acoustical comparison of seven measures.

    PubMed

    Askenfelt, A G; Hammarberg, B

    1986-03-01

    The performance of seven acoustic measures of cycle-to-cycle variations (perturbations) in the speech waveform was compared. All measures were calculated automatically and applied to running speech. Three of the measures refer to the frequency of occurrence and severity of waveform perturbations in specially selected parts of the speech, identified by means of the rate of change in the fundamental frequency. Three other measures refer to statistical properties of the distribution of the relative frequency differences between adjacent pitch periods. One perturbation measure refers to the percentage of consecutive pitch period differences with alternating signs. The acoustic measures were tested on tape-recorded speech samples from 41 voice patients, before and after successful therapy. Scattergrams of acoustic waveform perturbation data versus an average of perceived deviant voice qualities, as rated by voice clinicians, are presented. The perturbation measures were compared with regard to the acoustic-perceptual correlation and their ability to discriminate between normal and pathological voice status. The standard deviation of the distribution of the relative frequency differences was suggested as the most useful acoustic measure of waveform perturbations for clinical applications.
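
    Two of the measures described above translate directly into code: the standard deviation of the relative frequency differences between adjacent pitch periods, and the percentage of consecutive period differences with alternating signs. A minimal numpy sketch, assuming pitch periods have already been extracted:

      import numpy as np

      def perturbation_measures(periods_ms):
          """periods_ms: durations of successive pitch periods in milliseconds."""
          f0 = 1000.0 / np.asarray(periods_ms)       # per-period frequency, Hz
          rel_diff = np.diff(f0) / f0[:-1]           # relative frequency differences
          sd_rel_diff = float(np.std(rel_diff))

          diffs = np.diff(periods_ms)
          signs = np.sign(diffs)
          alternating = signs[1:] * signs[:-1] < 0   # consecutive diffs of opposite sign
          pct_alternating = 100.0 * float(np.mean(alternating))
          return sd_rel_diff, pct_alternating

      # Toy example: slightly jittered 10 ms (100 Hz) pitch periods.
      rng = np.random.default_rng(1)
      periods = 10.0 + rng.normal(0.0, 0.05, 200)
      print(perturbation_measures(periods))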

  7. Design of an efficient music-speech discriminator.

    PubMed

    Tardón, Lorenzo J; Sammartino, Simone; Barbancho, Isabel

    2010-01-01

    In this paper, the problem of designing a simple and efficient music-speech discriminator for large audio data sets in which advanced music-playing techniques are taught and voice and music are intrinsically interleaved is addressed. In the process, a number of features used in speech-music discrimination are defined and evaluated over the available data set. Specifically, the data set contains pieces of classical music played with different and unspecified instruments (or even lyrics) and the voice of a teacher (a top music performer), or even the overlapped voices of the translator and other persons. After an initial test of the performance of the features implemented, a selection process is started, which takes into account the type of classifier selected beforehand, to achieve good discrimination performance and computational efficiency, as shown in the experiments. The discrimination application has been defined and tested on a large data set supplied by Fundación Albéniz, containing a large variety of classical music pieces played on different instruments, together with comments and speeches by famous performers.
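
    A toy version of such a discriminator can be assembled from generic frame-level features and a linear classifier. The features below are common stand-ins, not the exact set evaluated in the paper, and the file names are placeholders:

      import numpy as np
      import librosa                                   # pip install librosa
      from sklearn.linear_model import LogisticRegression

      def clip_features(y, sr):
          """A few generic descriptors often used in speech/music discrimination."""
          return np.array([
              librosa.feature.zero_crossing_rate(y).mean(),
              librosa.feature.spectral_centroid(y=y, sr=sr).mean(),
              librosa.feature.spectral_flatness(y=y).mean(),
              librosa.feature.rms(y=y).std(),          # energy modulation
          ])

      # Hypothetical labeled clips: (path, label) with 0 = speech, 1 = music.
      dataset = [("clip_speech_01.wav", 0), ("clip_music_01.wav", 1)]
      X, t = [], []
      for path, label in dataset:
          y, sr = librosa.load(path, sr=None)
          X.append(clip_features(y, sr))
          t.append(label)

      clf = LogisticRegression().fit(np.array(X), np.array(t))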

  8. Voice Acoustical Measurement of the Severity of Major Depression

    ERIC Educational Resources Information Center

    Cannizzaro, Michael; Harel, Brian; Reilly, Nicole; Chappell, Phillip; Snyder, Peter J.

    2004-01-01

    A number of empirical studies have documented the relationship between quantifiable and objective acoustical measures of voice and speech, and clinical subjective ratings of severity of Major Depression. To further explore this relationship, speech samples were extracted from videotape recordings of structured interviews made during the…

  9. Voice Technologies in Libraries: A Look into the Future.

    ERIC Educational Resources Information Center

    Lange, Holley R., Ed.; And Others

    1991-01-01

    Discussion of synthesized speech and voice recognition focuses on a forum that addressed the potential for speech technologies in libraries. Topics discussed by three contributors include possible library applications in technical processing, book receipt, circulation control, and database access; use by disabled and illiterate users; and problems…

  10. Electrocorticographic representations of segmental features in continuous speech

    PubMed Central

    Lotte, Fabien; Brumberg, Jonathan S.; Brunner, Peter; Gunduz, Aysegul; Ritaccio, Anthony L.; Guan, Cuntai; Schalk, Gerwin

    2015-01-01

    Acoustic speech output results from coordinated articulation of dozens of muscles, bones and cartilages of the vocal mechanism. While we commonly take the fluency and speed of our speech productions for granted, the neural mechanisms facilitating the requisite muscular control are not completely understood. Previous neuroimaging and electrophysiology studies of speech sensorimotor control have typically concentrated on speech sounds (i.e., phonemes, syllables and words) in isolation; sentence-length investigations have largely been used to inform coincident linguistic processing. In this study, we examined the neural representations of segmental features (place and manner of articulation, and voicing status) in the context of fluent, continuous speech production. We used recordings from the cortical surface [electrocorticography (ECoG)] to simultaneously evaluate the spatial topography and temporal dynamics of the neural correlates of speech articulation that may mediate the generation of hypothesized gestural or articulatory scores. We found that the representation of place of articulation involved broad networks of brain regions during all phases of speech production: preparation, execution and monitoring. In contrast, manner of articulation and voicing status were dominated by auditory cortical responses after speech had been initiated. These results provide new insight into the articulatory and auditory processes underlying speech production in terms of their motor requirements and acoustic correlates. PMID:25759647

  11. [Nature of speech disorders in Parkinson disease].

    PubMed

    Pawlukowska, W; Honczarenko, K; Gołąb-Janowska, M

    2013-01-01

    The aim of the study was to discuss the physiology and pathology of speech and to review the literature on speech disorders in Parkinson disease. Additionally, the most effective methods to diagnose speech disorders in Parkinson disease were stressed. Afterward, articulatory, respiratory, acoustic and pragmatic factors contributing to the exacerbation of the speech disorders were discussed. Furthermore, the study dealt with the most important types of speech treatment techniques available (pharmacological and behavioral), and the significance of Lee Silverman Voice Treatment was highlighted.

  12. Speech rate reduction and "nasality" in normal speakers.

    PubMed

    Brancewicz, T M; Reich, A R

    1989-12-01

    This study explored the effects of reduced speech rate on nasal/voice accelerometric measures and nasality ratings. Nasal/voice accelerometric measures were obtained from normal adults for various speech stimuli and speaking rates. Stimuli included three sentences (one obstruent-loaded, one semivowel-loaded, and one containing a single nasal) and /pv/ syllable trains. Speakers read the stimuli at their normal rate, at half their normal rate, and as slowly as possible. In addition, a computer program paced each speaker at rates of 1, 2, and 3 syllables per second. The nasal/voice accelerometric values revealed significant stimulus effects but no rate effects. The nasality ratings of experienced listeners, evaluated as a function of stimulus and speaking rate, were compared to the accelerometric measures. The nasality scale values demonstrated small, but statistically significant, stimulus and rate effects. However, the nasality percepts were poorly correlated with the nasal/voice accelerometric measures.
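
    The nasal/voice accelerometric measure is essentially an amplitude ratio between a nasal accelerometer channel and a voice channel. A minimal sketch under that assumption; the dB framing is a common convention, not necessarily the paper's exact computation:

      import numpy as np

      def nasal_voice_ratio_db(nasal, voice, eps=1e-12):
          """20*log10 of the RMS of the nasal accelerometer signal over the
          RMS of the simultaneously recorded voice signal."""
          rms = lambda x: np.sqrt(np.mean(np.square(x)))
          return 20.0 * np.log10((rms(nasal) + eps) / (rms(voice) + eps))

      # Toy example: nasal channel at one-tenth the voice amplitude -> about -20 dB.
      rng = np.random.default_rng(2)
      voice = rng.normal(0.0, 1.0, 16000)
      nasal = 0.1 * rng.normal(0.0, 1.0, 16000)
      print(nasal_voice_ratio_db(nasal, voice))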

  13. Separation of Singing Voice from Music Accompaniment for Monaural Recordings

    DTIC Science & Technology

    Li, Yipeng

    2005-09-01

    Separating singing voice from music accompaniment is very useful in many applications, such as lyrics recognition and alignment, singer identification, and music information retrieval. Although speech separation has been extensively studied for decades, singing voice separation has been little…

  14. How do teachers with self-reported voice problems differ from their peers with self-reported voice health?

    PubMed

    Lyberg Åhlander, Viveka; Rydell, Roland; Löfqvist, Anders

    2012-07-01

    This randomized case-control study compares teachers with self-reported voice problems to age-, gender-, and school-matched colleagues with self-reported voice health. The self-assessed voice function is related to factors known to influence the voice (laryngeal findings, voice quality, personality, and psychosocial and coping aspects) in a search for causative factors of voice problems in teachers. Subjects and controls, recruited from a teacher group in an earlier questionnaire study, underwent examinations of the larynx by high-speed imaging and kymograms; voice recordings; voice range profile; audiometry; self-assessment of voice handicap and voice function; teaching and environmental aspects; personality; coping; burnout; and work-related issues. The laryngeal and voice recordings were assessed by experienced phoniatricians and speech pathologists. The subjects with self-assessed voice problems differed from their peers with self-assessed voice health by significantly longer recovery time from voice problems, and they scored higher on all subscales of the Voice Handicap Index-Throat. The results show that the cause of voice dysfunction in this group of teachers with self-reported voice problems is not found in the vocal apparatus or within the individual. The individual's perception of a voice problem seems to be based on a combination of the number of symptoms, how often the symptoms occur, and the recovery time. The results also underline the importance of using self-assessed reports of voice dysfunction. Copyright © 2012 The Voice Foundation. Published by Mosby, Inc. All rights reserved.

  15. Speech Motor Development during Acquisition of the Voicing Contrast

    ERIC Educational Resources Information Center

    Grigos, Maria I.; Saxman, John H.; Gordon, Andrew M.

    2005-01-01

    Lip and jaw movements were studied longitudinally in 19-month-old children as they acquired the voicing contrast for /p/ and /b/. A movement tracking system obtained lip and jaw kinematics as participants produced the target utterances /papa/ and /baba/. Laryngeal adjustments were also tracked through acoustically recorded voice onset time (VOT)…

  16. NWR (National Weather Service) voice synthesis project, phase 1

    NASA Astrophysics Data System (ADS)

    Sampson, G. W.

    1986-01-01

    The purpose of the NOAA Weather Radio (NWR) Voice Synthesis Project is to provide a demonstration of current voice synthesis technology. Phase 1 of this project, presented here, provides complete automation of an hourly surface aviation observation for broadcast over NWR. After examining the products currently available on the market, it was decided that synthetic voice technology does not offer the high-quality speech required for broadcast over the NWR. The system presented therefore uses phrase-concatenation technology, yielding a very high quality, versatile voice synthesis system.
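
    Phrase concatenation of this kind is straightforward to sketch: prerecorded phrases are spliced end to end. A minimal Python example using the standard-library wave module follows, with placeholder file names; real systems also smooth the joins, which this sketch omits:

      import wave

      def concatenate_phrases(phrase_paths, out_path):
          """Splice same-format WAV phrases end to end into one message."""
          params, frames = None, []
          for path in phrase_paths:
              with wave.open(path, "rb") as w:
                  if params is None:
                      params = w.getparams()
                  frames.append(w.readframes(w.getnframes()))
          with wave.open(out_path, "wb") as out:
              out.setparams(params)
              for chunk in frames:
                  out.writeframes(chunk)

      # Hypothetical fragments of an hourly observation broadcast.
      concatenate_phrases(
          ["wind.wav", "two.wav", "five.wav", "zero.wav", "at.wav",
           "one.wav", "five.wav", "knots.wav"],
          "observation.wav",
      )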

  17. Postlingual adult performance in noise with HiRes 120 and ClearVoice Low, Medium, and High.

    PubMed

    Holden, Laura K; Brenner, Christine; Reeder, Ruth M; Firszt, Jill B

    2013-11-01

    The study's objectives were to evaluate speech recognition in multiple listening conditions using several noise types with HiRes 120 and ClearVoice (Low, Medium, High) and to determine which ClearVoice program was most beneficial for everyday use. Fifteen postlingual adults attended four sessions; speech recognition was assessed at sessions 1 and 3 with HiRes 120 and at sessions 2 and 4 with all ClearVoice programs. Test measures included sentences presented in restaurant noise (R-SPACE), in speech-spectrum noise, in four- and eight-talker babble, and connected discourse presented in 12-talker babble. Participants completed a questionnaire comparing ClearVoice programs. Significant group differences in performance between HiRes 120 and ClearVoice were present only in the R-SPACE; performance was better with ClearVoice High than HiRes 120. Among ClearVoice programs, no significant group differences were present for any measure. Individual results revealed most participants performed better in the R-SPACE with ClearVoice than HiRes 120. For other measures, significant individual differences between HiRes 120 and ClearVoice were not prevalent. Individual results among ClearVoice programs differed and overall preferences varied. Questionnaire data indicated increased understanding with High and Medium in certain environments. R-SPACE and questionnaire results indicated an advantage for ClearVoice High and Medium. Individual test and preference data showed mixed results between ClearVoice programs making global recommendations difficult; however, results suggest providing ClearVoice High and Medium and HiRes 120 as processor options for adults willing to change settings. For adults unwilling or unable to change settings, ClearVoice Medium is a practical choice for daily listening.

  18. Behavioral treatments for speech in Parkinson's disease: meta-analyses and review of the literature.

    PubMed

    Atkinson-Clement, Cyril; Sadat, Jasmin; Pinto, Serge

    2015-01-01

    Parkinson's disease (PD) results from neurodegenerative processes leading to alteration of motor functions. Most motor symptoms respond well to pharmacological and neurosurgical treatments, except some axial symptoms such as speech impairment, known as dysarthria. However, speech therapy is rarely offered to PD patients. This review aims to evaluate previous research on the effects of behavioral speech therapies in patients with PD. We also performed two meta-analyses focusing on speech loudness and voice pitch. We show that intensive therapies in PD are the most effective for hypophonia and can lead to some improvement of voice pitch. Although speech therapy is effective in managing PD dysarthria, behavioral speech rehabilitation in PD still needs further validation.

  19. Taste transductions in taste receptor cells: basic tastes and moreover.

    PubMed

    Iwata, Shusuke; Yoshida, Ryusuke; Ninomiya, Yuzo

    2014-01-01

    In the oral cavity, taste receptor cells are dedicated to detecting chemical compounds in foodstuffs and transmitting their signals to gustatory nerve fibers. To date, five taste qualities (sweet, umami, bitter, salty, and sour) are generally accepted as basic tastes. Each of these may have a specific role in the detection of nutritious and poisonous substances: sweet for carbohydrate sources of calories, umami for protein and amino acid content, bitter for harmful compounds, salty for minerals, and sour for the ripeness of fruits and for spoiled foods. Recent studies have revealed molecular mechanisms for the reception and transduction of these five basic tastes. Sweet, umami, and bitter tastes are mediated by G-protein coupled receptors (GPCRs) and second-messenger signaling cascades. Salty and sour tastes are mediated by channel-type receptors. In addition to the five basic tastes, taste receptor cells may have the ability to detect fat taste, which is elicited by fatty acids, and calcium taste, which is elicited by calcium. Compounds eliciting either fat taste or calcium taste may be detected by specific GPCRs expressed in taste receptor cells. This review will focus on the transduction mechanisms and cellular characteristics responsible for each of the basic tastes, fat taste, and calcium taste.

  20. STS-41 Voice Command System Flight Experiment Report

    NASA Technical Reports Server (NTRS)

    Salazar, George A.

    1981-01-01

    This report presents the results of the Voice Command System (VCS) flight experiment on the five-day STS-41 mission. Two mission specialists, Bill Shepherd and Bruce Melnick, used the speaker-dependent system to evaluate the operational effectiveness of using voice to control a spacecraft system. In addition, data were gathered to analyze the effects of microgravity on speech recognition performance.

  1. Acoustic voice analysis of prelingually deaf adults before and after cochlear implantation.

    PubMed

    Evans, Maegan K; Deliyski, Dimitar D

    2007-11-01

    It is widely accepted that many severe to profoundly deaf adults have benefited from cochlear implants (CIs). However, limited research has investigated changes in the voice and speech of prelingually deaf adults who receive CIs, a population well known for presenting with a variety of voice and speech abnormalities. The purpose of this study was to use acoustic analysis to explore changes in voice and speech for three prelingually deaf males pre- and postimplantation over 6 months. The following measurements, some measured in varying contexts, were obtained: fundamental frequency (F0), jitter, shimmer, noise-to-harmonic ratio, voice turbulence index, soft phonation index, amplitude- and F0-variation, F0-range, speech rate, nasalance, and vowel production. Characteristics of vowel production were measured by determining the first formant (F1) and second formant (F2) of vowels in various contexts, the magnitude of F2-variation, and the rate of F2-variation. Perceptual measurements of pitch, pitch variability, loudness variability, speech rate, and intonation were obtained for comparison. Results are reported using descriptive statistics. The results showed patterns of change for some of the parameters, though there was considerable variation across subjects. All participants demonstrated a decrease in F0 in at least one context and a change in nasalance toward the norm as compared with their normal-hearing control. The two participants who were oral-language communicators were judged to produce vowels with an average of 97.2% accuracy, and the sign-language user demonstrated low percent accuracy for vowel production.
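
    For readers unfamiliar with the perturbation measures listed above, jitter and shimmer have simple local definitions: the mean absolute difference between consecutive glottal periods (or cycle peak amplitudes), normalized by the mean. A minimal sketch, assuming the period and amplitude sequences come from an upstream pitch tracker:

      import numpy as np

      def local_jitter(periods):
          """Local jitter (%): mean absolute difference between consecutive
          glottal periods, divided by the mean period."""
          periods = np.asarray(periods, dtype=float)
          return 100.0 * np.mean(np.abs(np.diff(periods))) / np.mean(periods)

      def local_shimmer(amplitudes):
          """Local shimmer (%): same computation over cycle peak amplitudes."""
          amps = np.asarray(amplitudes, dtype=float)
          return 100.0 * np.mean(np.abs(np.diff(amps))) / np.mean(amps)

      # periods (s) assumed to come from a pitch tracker; toy values shown
      periods = [0.0081, 0.0083, 0.0080, 0.0082, 0.0081]
      print(local_jitter(periods))   # ~2.5% for this toy sequence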

  2. Arguments against the Aggressive Pursuit of Voice Therapy for Children.

    ERIC Educational Resources Information Center

    Sander, Eric K.

    1989-01-01

    A less aggressive treatment strategy is proposed in the area of children's voice disorders. Speech clinicians are urged not to be overzealous in imposing their own voice standards. The potential threat that vocal pathologies pose to children's larynges is felt to be largely overrated. (Author/JDD)

  3. Military and Government Applications of Human-Machine Communication by Voice

    NASA Astrophysics Data System (ADS)

    Weinstein, Clifford J.

    1995-10-01

    This paper describes a range of opportunities for military and government applications of human-machine communication by voice, based on visits and contacts with numerous user organizations in the United States. The applications include some that appear to be feasible by careful integration of current state-of-the-art technology and others that will require a varying mix of advances in speech technology and in integration of the technology into applications environments. Applications that are described include (1) speech recognition and synthesis for mobile command and control; (2) speech processing for a portable multifunction soldier's computer; (3) speech- and language-based technology for naval combat team tactical training; (4) speech technology for command and control on a carrier flight deck; (5) control of auxiliary systems, and alert and warning generation, in fighter aircraft and helicopters; and (6) voice check-in, report entry, and communication for law enforcement agents or special forces. A phased approach for transfer of the technology into applications is advocated, where integration of applications systems is pursued in parallel with advanced research to meet future needs.

  4. Unilateral Vocal Fold Paralysis: A Systematic Review of Speech-Language Pathology Management.

    PubMed

    Walton, Chloe; Conway, Erin; Blackshaw, Helen; Carding, Paul

    2017-07-01

    Dysphonia due to unilateral vocal fold paralysis (UVFP) can be characterized by hoarseness and weakness, resulting in a significant impact on patients' activity and participation. Voice therapy provided by a speech-language pathologist is designed to maximize vocal function and improve quality of life. The purpose of this paper is to systematically review the literature on the effectiveness of speech-language pathology intervention for the management of UVFP in adults. This is a systematic review. Electronic databases were searched using a range of key terms including dysphonia, vocal fold paralysis, and speech-language pathology. Eligible articles were extracted and reviewed by the authors for risk of bias, methodology, treatment efficacy, and clinical outcomes. Of the 3311 articles identified, 12 met the inclusion criteria: seven case series and five comparative studies. All 12 studies subjectively reported positive effects following the implementation of voice therapy for UVFP; however, the heterogeneity of participant characteristics, voice therapy, and voice outcomes resulted in a low level of evidence. There is presently a lack of methodological rigor and clinical efficacy in the speech-language pathology management of dysphonia arising from UVFP in adults. This reduced efficacy can be attributed to the following: (1) no standardized speech-language pathology intervention; (2) no consistency of assessment battery; (3) the variable etiology and clinical presentation of UVFP; and (4) inconsistent timing, frequency, and intensity of treatment. Further research is required to develop the evidence for the management of UVFP incorporating controlled treatment protocols and more rigorous clinical methodology. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.

  5. Speech Perception and Production by Sequential Bilingual Children: A Longitudinal Study of Voice Onset Time Acquisition

    PubMed Central

    McCarthy, Kathleen M; Mahon, Merle; Rosen, Stuart; Evans, Bronwen G

    2014-01-01

    The majority of bilingual speech research has focused on simultaneous bilinguals. Yet, in immigrant communities, children are often initially exposed to their family language (L1), before becoming gradually immersed in the host country's language (L2). This is typically referred to as sequential bilingualism. Using a longitudinal design, this study explored the perception and production of the English voicing contrast in 55 children (40 Sylheti-English sequential bilinguals and 15 English monolinguals). Children were tested twice: when they were in nursery (52-month-olds) and 1 year later. Sequential bilinguals' perception and production of English plosives were initially driven by their experience with their L1, but after starting school, changed to match that of their monolingual peers. PMID:25123987

  6. Automatic intelligibility classification of sentence-level pathological speech

    PubMed Central

    Kim, Jangwon; Kumar, Naveen; Tsiartas, Andreas; Li, Ming; Narayanan, Shrikanth S.

    2014-01-01

    Pathological speech usually refers to the condition of speech distortion resulting from atypicalities in voice and/or in the articulatory mechanisms owing to disease, illness or other physical or biological insult to the production system. Although automatic evaluation of speech intelligibility and quality could come in handy in these scenarios to assist experts in diagnosis and treatment design, the many sources and types of variability often make it a very challenging computational processing problem. In this work we propose novel sentence-level features to capture abnormal variation in the prosodic, voice quality and pronunciation aspects in pathological speech. In addition, we propose a post-classification posterior smoothing scheme which refines the posterior of a test sample based on the posteriors of other test samples. Finally, we perform feature-level fusions and subsystem decision fusion for arriving at a final intelligibility decision. The performances are tested on two pathological speech datasets, the NKI CCRT Speech Corpus (advanced head and neck cancer) and the TORGO database (cerebral palsy or amyotrophic lateral sclerosis), by evaluating classification accuracy without overlapping subjects’ data among training and test partitions. Results show that the feature sets of each of the voice quality subsystem, prosodic subsystem, and pronunciation subsystem, offer significant discriminating power for binary intelligibility classification. We observe that the proposed posterior smoothing in the acoustic space can further reduce classification errors. The smoothed posterior score fusion of subsystems shows the best classification performance (73.5% for unweighted, and 72.8% for weighted, average recalls of the binary classes). PMID:25414544
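
    The proposed posterior smoothing refines each test sample's posterior using the posteriors of other test samples. A minimal sketch of the idea, using a plain k-nearest-neighbor average in acoustic feature space (the paper's exact weighting scheme may differ):

      import numpy as np

      def smooth_posteriors(features, posteriors, k=5):
          """Refine each test sample's posterior with the posteriors of its
          k nearest neighbors in the acoustic feature space (simple average).
          features: (n, d) array; posteriors: (n,) array of P(intelligible)."""
          features = np.asarray(features, dtype=float)
          posteriors = np.asarray(posteriors, dtype=float)
          smoothed = np.empty_like(posteriors)
          for i, x in enumerate(features):
              dists = np.linalg.norm(features - x, axis=1)
              nearest = np.argsort(dists)[: k + 1]  # includes the sample itself
              smoothed[i] = posteriors[nearest].mean()
          return smoothed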

  7. Speech-Message Extraction from Interference Introduced by External Distributed Sources

    NASA Astrophysics Data System (ADS)

    Kanakov, V. A.; Mironov, N. A.

    2017-08-01

    This study addresses the extraction of a speech signal originating from a given spatial point and the calculation of the intelligibility of the extracted voice message. The problem is solved by reducing the influence of interfering speech sources on the extracted signal. The method is based on applying time delays, which depend on the spatial coordinates, to the recording channels. Audio recordings of the voices of eight different people were used as test material. It is shown that increasing the number of microphones improves the intelligibility of the extracted speech message.
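
    The coordinate-dependent time delays described here amount to delay-and-sum beamforming: each channel is shifted to compensate the propagation delay from the chosen spatial point, and the channels are then averaged. A minimal sketch, assuming known microphone positions, free-field propagation, and a speed of sound of 343 m/s:

      import numpy as np

      SPEED_OF_SOUND = 343.0  # m/s, free-field assumption

      def delay_and_sum(channels, mic_positions, source_point, fs):
          """Steer the array toward a spatial point by compensating, per
          channel, the propagation delay from that point, then averaging.
          channels: (n_mics, n_samples); mic_positions: (n_mics, 3) in meters."""
          delays = np.linalg.norm(mic_positions - source_point, axis=1) / SPEED_OF_SOUND
          delays -= delays.min()           # keep all shifts non-negative
          shifts = np.round(delays * fs).astype(int)
          n = channels.shape[1]
          out = np.zeros(n)
          for ch, s in zip(channels, shifts):
              out[: n - s] += ch[s:]       # advance channels that receive later
          return out / len(channels)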

  8. VOT and the perception of voicing

    NASA Astrophysics Data System (ADS)

    Remez, Robert E.

    2004-05-01

    In explaining the ability to distinguish phonemes, linguists have described the dimension of voicing. Acoustic analyses have identified many correlates of the voicing contrast in initial, medial, and final consonants within syllables, and these in turn have motivated studies of the perceptual resolution of voicing. The framing conceptualization articulated by Lisker and Abramson 40 years ago in physiological, phonetic, and perceptual studies has been widely influential, and research on voicing now adopts their perspective without reservation. Their original survey included languages with two voicing categories (Dutch, Puerto Rican Spanish, Hungarian, Tamil, Cantonese, English), three voicing categories (Eastern Armenian, Thai, Korean), and four voicing categories (Hindi, Marathi). Perceptual studies inspired by this work have also ranged widely, including tests with different languages and with listeners of several species. The profound value of the analyses of Lisker and Abramson is evident in the empirical traction provided by the concept of VOT in research on every important perceptual question about speech and language in our era. Some of these classic perceptual investigations will be reviewed. [Research supported by NIH (DC00308).]

  9. Vocal Age Disguise: The Role of Fundamental Frequency and Speech Rate and Its Perceived Effects.

    PubMed

    Skoog Waller, Sara; Eriksson, Mårten

    2016-01-01

    The relationship between vocal characteristics and perceived age is of interest in various contexts, as is the possibility to affect age perception through vocal manipulation. A few examples of such situations are when age is staged by actors, when ear witnesses make age assessments based on vocal cues only or when offenders (e.g., online groomers) disguise their voice to appear younger or older. This paper investigates how speakers spontaneously manipulate two age related vocal characteristics (f0 and speech rate) in an attempt to sound younger versus older than their true age, and if the manipulations correspond to actual age related changes in f0 and speech rate (Study 1). Further aims of the paper are to determine how successful vocal age disguise is by asking listeners to estimate the age of generated speech samples (Study 2) and to examine whether or not listeners use f0 and speech rate as cues to perceived age. In Study 1, participants from three age groups (20-25, 40-45, and 60-65 years) agreed to read a short text under three voice conditions. There were 12 speakers in each age group (six women and six men). They used their natural voice in one condition, attempted to sound 20 years younger in another and 20 years older in a third condition. In Study 2, 60 participants (listeners) listened to speech samples from the three voice conditions in Study 1 and estimated the speakers' age. Each listener was exposed to all three voice conditions. The results from Study 1 indicated that the speakers increased fundamental frequency (f0) and speech rate when attempting to sound younger and decreased f0 and speech rate when attempting to sound older. Study 2 showed that the voice manipulations had an effect in the sought-after direction, although the achieved mean effect was only 3 years, which is far less than the intended effect of 20 years. Moreover, listeners used speech rate, but not f0, as a cue to speaker age. It was concluded that age disguise by voice

  10. Vocal Age Disguise: The Role of Fundamental Frequency and Speech Rate and Its Perceived Effects

    PubMed Central

    Skoog Waller, Sara; Eriksson, Mårten

    2016-01-01

    The relationship between vocal characteristics and perceived age is of interest in various contexts, as is the possibility to affect age perception through vocal manipulation. A few examples of such situations are when age is staged by actors, when ear witnesses make age assessments based on vocal cues only or when offenders (e.g., online groomers) disguise their voice to appear younger or older. This paper investigates how speakers spontaneously manipulate two age related vocal characteristics (f0 and speech rate) in an attempt to sound younger versus older than their true age, and if the manipulations correspond to actual age related changes in f0 and speech rate (Study 1). Further aims of the paper are to determine how successful vocal age disguise is by asking listeners to estimate the age of generated speech samples (Study 2) and to examine whether or not listeners use f0 and speech rate as cues to perceived age. In Study 1, participants from three age groups (20–25, 40–45, and 60–65 years) agreed to read a short text under three voice conditions. There were 12 speakers in each age group (six women and six men). They used their natural voice in one condition, attempted to sound 20 years younger in another and 20 years older in a third condition. In Study 2, 60 participants (listeners) listened to speech samples from the three voice conditions in Study 1 and estimated the speakers’ age. Each listener was exposed to all three voice conditions. The results from Study 1 indicated that the speakers increased fundamental frequency (f0) and speech rate when attempting to sound younger and decreased f0 and speech rate when attempting to sound older. Study 2 showed that the voice manipulations had an effect in the sought-after direction, although the achieved mean effect was only 3 years, which is far less than the intended effect of 20 years. Moreover, listeners used speech rate, but not f0, as a cue to speaker age. It was concluded that age disguise by voice can

  11. Voice stress analysis

    NASA Technical Reports Server (NTRS)

    Brenner, Malcolm; Shipp, Thomas

    1988-01-01

    In a study of the validity of eight candidate voice measures (fundamental frequency, amplitude, speech rate, frequency jitter, amplitude shimmer, Psychological Stress Evaluator scores, energy distribution, and a measure derived from the above) for determining psychological stress, 17 males aged 21 to 35 were subjected to a tracking task on a microcomputer CRT while parameters of vocal production as well as heart rate were measured. Findings confirm those of earlier studies that increases in fundamental frequency, amplitude, and speech rate are found in speakers under extreme levels of stress. In addition, the same changes appear to occur in a regular fashion at a more subtle level of stress that may be characteristic, for example, of routine flying situations. None of the individual speech measures performed as robustly as did heart rate.

  12. Speech-based Class Attendance

    NASA Astrophysics Data System (ADS)

    Faizel Amri, Umar; Nur Wahidah Nik Hashim, Nik; Hazrin Hany Mohamad Hanif, Noor

    2017-11-01

    In the department of engineering, students are required to fulfil at least 80 percent of class attendance. The conventional method requires each student to sign his or her initials on an attendance sheet. However, this method is prone to cheating, with one student signing for an absent classmate. We develop our hypothesis according to a verse in the Holy Qur’an (95:4), “We have created men in the best of mould”. Based on this verse, we believe each psychological characteristic of a human being is unique, and thus each person's speech characteristics should be unique. In this paper we present the development of a speech biometric-based attendance system. The system requires the user's voice to be enrolled as training data, which is saved in the system to register the user. Subsequent recordings of the user's voice serve as test data to be verified against the stored training data. The system uses PSD (Power Spectral Density) and Transition Parameter as the methods for feature extraction of the voices. Euclidean and Mahalanobis distances are used to verify the user's voice. For this research, ten subjects (five female, five male) were chosen to test the performance of the system. The system performance, in terms of recognition rate, was found to be 60% correct identification of individuals.
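
    A minimal sketch of the verification step, assuming Welch-method PSD feature vectors per utterance and several enrollment recordings per student; the Transition Parameter feature is omitted and all names are illustrative, not the authors' implementation:

      import numpy as np
      from scipy.signal import welch

      def psd_features(audio, fs, n_bands=32):
          """Welch PSD, log-compressed and averaged into coarse bands."""
          _, pxx = welch(audio, fs=fs, nperseg=1024)
          bands = np.array_split(np.log(pxx + 1e-12), n_bands)
          return np.array([b.mean() for b in bands])

      def verify(test_vec, enrolled_vecs, threshold):
          """Accept if the Mahalanobis distance to the student's enrolled
          feature distribution falls below the threshold."""
          mu = enrolled_vecs.mean(axis=0)
          # small ridge keeps the covariance invertible with few enrollments
          cov = np.cov(enrolled_vecs, rowvar=False) + 1e-6 * np.eye(len(mu))
          d = test_vec - mu
          dist = np.sqrt(d @ np.linalg.solve(cov, d))
          return dist < threshold, dist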

  13. Measures of voiced frication for automatic classification

    NASA Astrophysics Data System (ADS)

    Jackson, Philip J. B.; Jesus, Luis M. T.; Shadle, Christine H.; Pincas, Jonathan

    2004-05-01

    As an approach to understanding the characteristics of the acoustic sources in voiced fricatives, it seems apt to draw on knowledge of vowels and voiceless fricatives, which have been relatively well studied. However, the presence of both phonation and frication in these mixed-source sounds offers the possibility of mutual interaction effects, with variations across place of articulation. This paper examines the acoustic and articulatory consequences of these interactions and explores automatic techniques for finding parametric and statistical descriptions of these phenomena. A reliable and consistent set of such acoustic cues could be used for phonetic classification or speech recognition. Following work on devoicing of European Portuguese voiced fricatives [Jesus and Shadle, in Mamede et al. (eds.) (Springer-Verlag, Berlin, 2003), pp. 1-8] and the modulating effect of voicing on frication [Jackson and Shadle, J. Acoust. Soc. Am. 108, 1421-1434 (2000)], the present study focuses on three types of information: (i) sequences and durations of acoustic events in VC transitions, (ii) temporal, spectral and modulation measures from the periodic and aperiodic components of the acoustic signal, and (iii) voicing activity derived from simultaneous EGG data. Analysis of interactions observed in British/American English and European Portuguese speech corpora will be compared, and the principal findings discussed.
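
    One practical stand-in for separating the periodic and aperiodic components in (ii) is harmonic/percussive median filtering of the spectrogram; the sketch below uses librosa's HPSS, which is not the authors' decomposition but illustrates the kind of measure involved (the file name is hypothetical):

      import librosa
      import numpy as np

      # Stand-in periodic/aperiodic decomposition via harmonic-percussive
      # source separation (not the decomposition used in the paper).
      y, sr = librosa.load("voiced_fricative.wav", sr=None)  # hypothetical file
      harmonic, aperiodic_like = librosa.effects.hpss(y)

      # Simple per-frame measure of how much energy the periodic part carries
      frame = 1024
      h_energy = np.array([np.sum(harmonic[i:i + frame] ** 2)
                           for i in range(0, len(y) - frame, frame)])
      a_energy = np.array([np.sum(aperiodic_like[i:i + frame] ** 2)
                           for i in range(0, len(y) - frame, frame)])
      periodicity_ratio = h_energy / (h_energy + a_energy + 1e-12)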

  14. Evaluation of synthesized voice approach callouts /SYNCALL/

    NASA Technical Reports Server (NTRS)

    Simpson, C. A.

    1981-01-01

    The two basic approaches to the generation of 'synthesized' speech are the replay of recorded human speech and the construction of speech entirely from algorithms applied to constants describing speech sounds. Given the availability of synthesized speech displays for man-machine systems, research is needed on suggested applications for speech and on design principles for speech displays. The present investigation developed new performance measures for such a study. A number of air carrier approach and landing accidents during low or impaired visibility have been associated with the absence of approach callouts. The purpose of the study was to compare a pilot-not-flying (PNF) approach callout system with a system composed of PNF callouts augmented by an automatic synthesized voice callout system (SYNCALL). Pilots were found to favor the use of a SYNCALL system containing certain modifications.

  15. Exploring the anatomical encoding of voice with a mathematical model of the vocal system.

    PubMed

    Assaneo, M Florencia; Sitt, Jacobo; Varoquaux, Gael; Sigman, Mariano; Cohen, Laurent; Trevisan, Marcos A

    2016-11-01

    The faculty of language depends on the interplay between the production and perception of speech sounds. A relevant open question is whether the dimensions that organize voice perception in the brain are acoustical or depend on properties of the vocal system that produced the voice. One of the main empirical difficulties in answering this question is generating sounds that vary along a continuum according to the anatomical properties of the vocal apparatus that produced them. Here we use a mathematical model that offers the unique possibility of synthesizing vocal sounds by controlling a small set of anatomically based parameters. In a first stage, the quality of the synthetic voice was evaluated. Using specific time traces for subglottal pressure and tension of the vocal folds, the synthetic voices generated perceptual responses that are indistinguishable from those of real speech. The synthesizer was then used to investigate how the auditory cortex responds to the perception of voice depending on the anatomy of the vocal apparatus. Our fMRI results show that sounds are perceived as human vocalizations when produced by a vocal system that follows a simple relationship between the size of the vocal folds and the vocal tract. We found that these anatomical parameters encode perceptual vocal identity (male, female, child) and show that the brain areas that respond to human speech also encode vocal identity. On the basis of these results, we propose that this low-dimensional model of the vocal system is capable of generating realistic voices and represents a novel tool to explore voice perception with precise control of the anatomical variables that generate speech. Furthermore, the model provides an explanation of how auditory cortices encode voices in terms of the anatomical parameters of the vocal system. Copyright © 2016 Elsevier Inc. All rights reserved.

  16. Speech Spectrum's Correlation with Speakers' Eysenck Personality Traits

    PubMed Central

    Hu, Chao; Wang, Qiandong; Short, Lindsey A.; Fu, Genyue

    2012-01-01

    The current study explored the correlation between speakers' Eysenck personality traits and speech spectrum parameters. Forty-six subjects completed the Eysenck Personality Questionnaire. They were instructed to verbally answer the questions shown on a computer screen, and their responses were recorded by the computer. Spectrum parameters of /sh/ and /i/ were analyzed with the Praat voice software. Formant frequencies of the consonant /sh/ in lying responses were significantly lower than those in truthful responses, whereas no difference existed in the vowel /i/ spectrum. The second formant bandwidth of the consonant /sh/ spectrum was significantly correlated with the personality traits of Psychoticism, Extraversion, and Neuroticism, and the correlation differed between truthful and lying responses, whereas the first formant frequency of the vowel /i/ spectrum was negatively correlated with Neuroticism in both response types. The results suggest that personality characteristics may be conveyed through the human voice, although the extent to which these effects are due to physiological differences in the organs associated with speech or to a general Pygmalion effect is yet unknown. PMID:22439014

  17. Mistaking minds and machines: How speech affects dehumanization and anthropomorphism.

    PubMed

    Schroeder, Juliana; Epley, Nicholas

    2016-11-01

    Treating a human mind like a machine is an essential component of dehumanization, whereas attributing a humanlike mind to a machine is an essential component of anthropomorphism. Here we tested how a cue closely connected to a person's actual mental experience, a humanlike voice, affects the likelihood of mistaking a person for a machine, or a machine for a person. We predicted that paralinguistic cues in speech are particularly likely to convey the presence of a humanlike mind, such that removing voice from communication (leaving only text) would increase the likelihood of mistaking the text's creator for a machine. Conversely, adding voice to a computer-generated script (resulting in speech) would increase the likelihood of mistaking the text's creator for a human. Four experiments confirmed these hypotheses, demonstrating that people are more likely to infer a human (vs. computer) creator when they hear a voice expressing thoughts than when they read the same thoughts in text. Adding human visual cues to text (i.e., seeing a person perform a script in a subtitled video clip) did not increase the likelihood of inferring a human creator compared with only reading text, suggesting that defining features of personhood may be conveyed more clearly in speech (Experiments 1 and 2). Removing the naturalistic paralinguistic cues that convey humanlike capacity for thinking and feeling, such as varied pace and intonation, eliminates the humanizing effect of speech (Experiment 4). We discuss implications for dehumanizing others through text-based media, and for anthropomorphizing machines through speech-based media. (PsycINFO Database Record (c) 2016 APA, all rights reserved).

  18. Speech-Like Rhythm in a Voiced and Voiceless Orangutan Call

    PubMed Central

    Lameira, Adriano R.; Hardus, Madeleine E.; Bartlett, Adrian M.; Shumaker, Robert W.; Wich, Serge A.; Menken, Steph B. J.

    2015-01-01

    The evolutionary origins of speech remain obscure. Recently, it was proposed that speech derived from monkey facial signals which exhibit a speech-like rhythm of ∼5 open-close lip cycles per second. In monkeys, these signals may also be vocalized, offering a plausible evolutionary stepping stone towards speech. Three essential predictions remain, however, to be tested to assess this hypothesis' validity: (i) great apes, our closest relatives, should likewise produce 5Hz-rhythm signals; (ii) speech-like rhythm should involve calls articulatorily similar to consonants and vowels, given that speech rhythm is the direct product of stringing together these two basic elements; and (iii) speech-like rhythm should be experience-based. Via cinematic analyses we demonstrate that an ex-entertainment orangutan produces two calls at a speech-like rhythm, coined “clicks” and “faux-speech.” Like voiceless consonants, clicks required no vocal fold action, but did involve independent manoeuvring over lips and tongue. In parallel to vowels, faux-speech showed harmonic and formant modulations, implying vocal fold and supralaryngeal action. This rhythm was several times faster than orangutan chewing rates, as observed in monkeys and humans. Critically, this rhythm was seven-fold faster, and contextually distinct, than any other known rhythmic calls described to date in the largest database of the orangutan repertoire ever assembled. The first two predictions advanced by this study are validated and, based on parsimony and exclusion of potential alternative explanations, initial support is given to the third prediction. Irrespective of the putative origins of these calls and underlying mechanisms, our findings demonstrate irrevocably that great apes are not respiratorily, articulatorily, or neurologically constrained for the production of consonant- and vowel-like calls at speech rhythm. Orangutan clicks and faux-speech confirm the importance of rhythmic speech antecedents

  19. Influence of Smartphones and Software on Acoustic Voice Measures

    PubMed Central

    GRILLO, ELIZABETH U.; BROSIOUS, JENNA N.; SORRELL, STACI L.; ANAND, SUPRAJA

    2016-01-01

    This study assessed the within-subject variability of voice measures captured using different recording devices (i.e., smartphones and head mounted microphone) and software programs (i.e., Analysis of Dysphonia in Speech and Voice (ADSV), Multi-dimensional Voice Program (MDVP), and Praat). Correlations between the software programs that calculated the voice measures were also analyzed. Results demonstrated no significant within-subject variability across devices and software and that some of the measures were highly correlated across software programs. The study suggests that certain smartphones may be appropriate to record daily voice measures representing the effects of vocal loading within individuals. In addition, even though different algorithms are used to compute voice measures across software programs, some of the programs and measures share a similar relationship. PMID:28775797
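
    The cross-software comparison reported here reduces to rank correlations between paired measurements of the same voice measure. A minimal sketch with synthetic, illustrative values (not the study's data):

      import numpy as np
      from scipy.stats import spearmanr

      # Hypothetical paired measurements of one voice measure, computed by
      # two different software programs over the same six recordings
      measure_program_a = np.array([4.1, 3.8, 5.0, 4.6, 3.9, 4.4])
      measure_program_b = np.array([4.3, 3.7, 5.2, 4.5, 4.0, 4.6])

      rho, p = spearmanr(measure_program_a, measure_program_b)
      print(f"Spearman rho = {rho:.2f}, p = {p:.3f}")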

  20. Design of digital voice storage and playback system

    NASA Astrophysics Data System (ADS)

    Tang, Chao

    2018-03-01

    Based on the STC89C52 chip, this paper presents a minimal single-chip microcomputer system that implements the logic control of a digital voice storage and playback system. Compared with traditional tape-based voice recording systems, this system offers small size and low power consumption, and it addresses the limitations of traditional recording systems in electronic and information processing.

  1. Emotional self-other voice processing in schizophrenia and its relationship with hallucinations: ERP evidence.

    PubMed

    Pinheiro, Ana P; Rezaii, Neguine; Rauber, Andréia; Nestor, Paul G; Spencer, Kevin M; Niznikiewicz, Margaret

    2017-09-01

    Abnormalities in self-other voice processing have been observed in schizophrenia, and may underlie the experience of hallucinations. More recent studies demonstrated that these impairments are enhanced for speech stimuli with negative content. Nonetheless, few studies probed the temporal dynamics of self versus nonself speech processing in schizophrenia and, particularly, the impact of semantic valence on self-other voice discrimination. In the current study, we examined these questions, and additionally probed whether impairments in these processes are associated with the experience of hallucinations. Fifteen schizophrenia patients and 16 healthy controls listened to 420 prerecorded adjectives differing in voice identity (self-generated [SGS] versus nonself speech [NSS]) and semantic valence (neutral, positive, and negative), while EEG data were recorded. The N1, P2, and late positive potential (LPP) ERP components were analyzed. ERP results revealed group differences in the interaction between voice identity and valence in the P2 and LPP components. Specifically, LPP amplitude was reduced in patients compared with healthy subjects for SGS and NSS with negative content. Further, auditory hallucinations severity was significantly predicted by LPP amplitude: the higher the SAPS "voices conversing" score, the larger the difference in LPP amplitude between negative and positive NSS. The absence of group differences in the N1 suggests that self-other voice processing abnormalities in schizophrenia are not primarily driven by disrupted sensory processing of voice acoustic information. The association between LPP amplitude and hallucination severity suggests that auditory hallucinations are associated with enhanced sustained attention to negative cues conveyed by a nonself voice. © 2017 Society for Psychophysiological Research.

  2. How do voice restoration methods affect the psychological status of patients after total laryngectomy?

    PubMed

    Saltürk, Z; Arslanoğlu, A; Özdemir, E; Yıldırım, G; Aydoğdu, İ; Kumral, T L; Berkiten, G; Atar, Y; Uyar, Y

    2016-03-01

    This study investigated the relationship between psychological well-being and different voice rehabilitation methods in total laryngectomy patients. The study enrolled 96 patients who underwent total laryngectomy. The patients were divided into three groups according to the voice rehabilitation method used: esophageal speech (24 patients); a tracheoesophageal fistula and Provox 2 voice prosthesis (57 patients); or an electrolarynx (15 patients). The participants were asked to complete the Turkish version of the Voice Handicap Index-10 (VHI-10) to assess voice problems. They were also asked to complete the Turkish versions of the Perceived Stress Scale (PSS) and the Hospital Anxiety and Depression Scale (HADS). The test scores of the three groups were compared statistically. Patients who used esophageal speech had a mean VHI-10 score of 10.25 ± 3.22, versus 19.42 ± 5.56 and 17.60 ± 1.92 for the tracheoesophageal fistula and Provox 2 and electrolarynx groups, respectively, reflecting better perception of their voice. They also had a PSS score of 11.38 ± 3.92, indicating that they felt less stressed than the tracheoesophageal fistula and Provox 2 and electrolarynx groups, which scored 18.84 ± 5.50 and 16.20 ± 3.49, respectively. The HADS scores of the groups did not differ, indicating that the patients' anxiety and depression status did not vary. Patients who used esophageal speech perceived less stress and were less handicapped by their voice.

  3. Comparing live to recorded speech in training the perception of spectrally shifted noise-vocoded speech.

    PubMed

    Faulkner, Andrew; Rosen, Stuart; Green, Tim

    2012-10-01

    Two experimental groups were trained for 2 h with live or recorded speech that was noise-vocoded and spectrally shifted and was from the same text and talker. These two groups showed equivalent improvements in performance for vocoded and shifted sentences, and the group trained with recorded speech showed consistently greater improvements than untrained controls. Another group trained with unshifted noise-vocoded speech improved no more than untrained controls. Computer-based training thus appears at least as effective as labor-intensive live-voice training for improving the perception of spectrally shifted noise-vocoded speech, and by implication, for training of users of cochlear implants.

  4. A Joint Time-Frequency and Matrix Decomposition Feature Extraction Methodology for Pathological Voice Classification

    NASA Astrophysics Data System (ADS)

    Ghoraani, Behnaz; Krishnan, Sridhar

    2009-12-01

    The number of people affected by speech problems is increasing as the modern world places increasing demands on the human voice via mobile telephones, voice recognition software, and interpersonal verbal communication. In this paper, we propose a novel methodology for automatic pattern classification of pathological voices. The main contribution of this paper is the extraction of meaningful and unique features using an adaptive time-frequency distribution (TFD) and nonnegative matrix factorization (NMF). We construct the adaptive TFD as an effective signal analysis domain to dynamically track nonstationarity in the speech and utilize NMF as a matrix decomposition (MD) technique to quantify the constructed TFD. The proposed method extracts meaningful and unique features from the joint TFD of the speech and automatically identifies and measures the abnormality of the signal. Depending on the abnormality measure of each signal, we classify the signal as normal or pathological. The proposed method is applied to the Massachusetts Eye and Ear Infirmary (MEEI) voice disorders database, which consists of 161 pathological and 51 normal speakers, and an overall classification accuracy of 98.6% was achieved.
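
    The pipeline of quantifying a time-frequency distribution with NMF can be sketched with an ordinary magnitude spectrogram standing in for the adaptive TFD; the paper's adaptive distribution and abnormality measure are not reproduced here, and the file name is hypothetical:

      import numpy as np
      import librosa
      from sklearn.decomposition import NMF

      y, sr = librosa.load("voice_sample.wav", sr=None)   # hypothetical file
      tfd = np.abs(librosa.stft(y, n_fft=1024))           # stand-in for adaptive TFD

      # Decompose the time-frequency matrix into spectral bases W and activations H
      model = NMF(n_components=8, init="nndsvd", max_iter=400)
      W = model.fit_transform(tfd)   # (freq_bins, components)
      H = model.components_          # (components, frames)

      # Feature vector per recording: summary statistics of bases and activations,
      # which a downstream classifier would map to normal vs. pathological
      features = np.concatenate([W.mean(axis=0), H.mean(axis=1)])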

  5. Measurement errors in voice-key naming latency for Hiragana.

    PubMed

    Yamada, Jun; Tamaoka, Katsuo

    2003-12-01

    This study makes explicit the limitations and possibilities of voice-key naming latency research on single hiragana symbols (a Japanese syllabic script) by examining three sets of voice-key naming data against Sakuma, Fushimi, and Tatsumi's 1997 speech-analyzer voice-waveform data. Analysis showed that voice-key measurement errors can be substantial in standard procedures, as they may conceal the true effects of significant variables involved in hiragana-naming behavior. While one can avoid voice-key measurement errors to some extent by applying Sakuma et al.'s deltas and by excluding initial phonemes that induce measurement errors, such errors may be ignored when test items are words and other higher-level linguistic materials.
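
    The measurement error at issue is the gap between a hardware voice key's threshold crossing and the true acoustic onset visible in the waveform. A simple amplitude-threshold emulation makes the effect concrete; the threshold values are illustrative, not calibrated to any device:

      import numpy as np

      def threshold_onset(audio, fs, threshold):
          """Return the first time (ms) the absolute amplitude crosses the
          threshold, emulating a hardware voice key."""
          idx = np.flatnonzero(np.abs(audio) >= threshold)
          return None if idx.size == 0 else 1000.0 * idx[0] / fs

      # A weak initial phoneme (e.g., a soft fricative) crosses a high
      # voice-key threshold later than the onset visible in the waveform,
      # inflating latency for some items more than others:
      # onset_voice_key = threshold_onset(y, fs, threshold=0.10)
      # onset_waveform  = threshold_onset(y, fs, threshold=0.01)
      # delta_ms = onset_voice_key - onset_waveform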

  6. Muscle Weakness and Speech in Oculopharyngeal Muscular Dystrophy

    ERIC Educational Resources Information Center

    Neel, Amy T.; Palmer, Phyllis M.; Sprouls, Gwyneth; Morrison, Leslie

    2015-01-01

    Purpose: We documented speech and voice characteristics associated with oculopharyngeal muscular dystrophy (OPMD). Although it is a rare disease, OPMD offers the opportunity to study the impact of myopathic weakness on speech production in the absence of neurologic deficits in a relatively homogeneous group of speakers. Methods: Twelve individuals…

  7. When infants talk, infants listen: pre-babbling infants prefer listening to speech with infant vocal properties.

    PubMed

    Masapollo, Matthew; Polka, Linda; Ménard, Lucie

    2016-03-01

    To learn to produce speech, infants must effectively monitor and assess their own speech output. Yet very little is known about how infants perceive speech produced by an infant, which has higher voice pitch and formant frequencies compared to adult or child speech. Here, we tested whether pre-babbling infants (at 4-6 months) prefer listening to vowel sounds with infant vocal properties over vowel sounds with adult vocal properties. A listening preference favoring infant vowels may derive from their higher voice pitch, which has been shown to attract infant attention in infant-directed speech (IDS). In addition, infants' nascent articulatory abilities may induce a bias favoring infant speech given that 4- to 6-month-olds are beginning to produce vowel sounds. We created infant and adult /i/ ('ee') vowels using a production-based synthesizer that simulates the act of speaking in talkers at different ages and then tested infants across four experiments using a sequential preferential listening task. The findings provide the first evidence that infants preferentially attend to vowel sounds with infant voice pitch and/or formants over vowel sounds with no infant-like vocal properties, supporting the view that infants' production abilities influence how they process infant speech. The findings with respect to voice pitch also reveal parallels between IDS and infant speech, raising new questions about the role of this speech register in infant development. Research exploring the underpinnings and impact of this perceptual bias can expand our understanding of infant language development. © 2015 John Wiley & Sons Ltd.

  8. Group climate in the voice therapy of patients with Parkinson's Disease.

    PubMed

    Diaféria, Giovana; Madazio, Glaucya; Pacheco, Claudia; Takaki, Patricia Barbarini; Behlau, Mara

    2017-09-04

    To verify the impact of group dynamics and coaching strategies on PD patients' voice, speech, and communication, as well as on group climate. Sixteen individuals with mild to moderate dysarthria due to PD were divided into two groups: the CG (8 patients), submitted to traditional therapy with 12 regular therapy sessions plus 4 additional support sessions; and the EG (8 patients), submitted to traditional therapy with 12 regular therapy sessions plus 4 sessions with group dynamics and coaching strategies. The Living with Dysarthria questionnaire (LwD), self-evaluations of voice, speech, and communication, and the auditory-perceptual analysis of vocal quality were assessed at three moments: pre-traditional therapy (pre), post-traditional therapy (post 1), and post support sessions/coaching strategies (post 2); at the post 1 and post 2 moments, the Group Climate Questionnaire (GCQ) was also applied. CG and EG showed improvement in the LwD from pre to post 1 and post 2. Voice self-evaluation was better for the EG (when pre was compared with post 2 and when post 1 was compared with post 2), ranging from regular to very good; both groups improved in the communication self-evaluation. The auditory-perceptual evaluation of vocal quality was better for the EG at post 1. No difference was found for the GCQ; however, the EG presented lower avoidance scores at post 2. All patients improved in the voice, speech, and communication self-evaluations; the EG showed lower avoidance scores, creating a more collaborative environment, propitious for speech therapy.

  9. Network Speech Systems Technology Program

    NASA Astrophysics Data System (ADS)

    Weinstein, C. J.

    1980-09-01

    This report documents work performed during FY 1980 on the DCA-sponsored Network Speech Systems Technology Program. The areas of work reported are: (1) communication systems studies in Demand-Assignment Multiple Access (DAMA), voice/data integration, and adaptive routing, in support of the evolving Defense Communications System (DCS) and Defense Switched Network (DSN); (2) a satellite/terrestrial integration design study including the functional design of voice and data interfaces to interconnect terrestrial and satellite network subsystems; and (3) voice-conferencing efforts dealing with support of the Secure Voice and Graphics Conferencing (SVGC) Test and Evaluation Program. Progress in definition and planning of experiments for the Experimental Integrated Switched Network (EISN) is detailed separately in an FY 80 Experiment Plan Supplement.

  10. Voice Controlled Wheelchair

    NASA Technical Reports Server (NTRS)

    1977-01-01

    Michael Condon, a quadriplegic from Pasadena, California, demonstrates the NASA-developed voice-controlled wheelchair and its manipulator, which can pick up packages, open doors, turn a TV knob, and perform a variety of other functions. A possible boon to paralyzed and other severely handicapped persons, the chair-manipulator system responds to 35 one-word voice commands, such as "go," "stop," "up," "down," "right," "left," "forward," "backward." The heart of the system is a voice-command analyzer which utilizes a minicomputer. Commands are taught to the computer by the patient's repeating them a number of times; thereafter the analyzer recognizes commands only in the patient's particular speech pattern. The computer translates commands into electrical signals which activate appropriate motors and cause the desired motion of chair or manipulator. Based on teleoperator and robot technology for space-related programs, the voice-controlled system was developed by Jet Propulsion Laboratory under the joint sponsorship of NASA and the Veterans Administration. The wheelchair-manipulator has been tested at Rancho Los Amigos Hospital, Downey, California, and is being evaluated at the VA Prosthetics Center in New York City.

  11. External Validation of the Acoustic Voice Quality Index Version 03.01 With Extended Representativity.

    PubMed

    Barsties, Ben; Maryn, Youri

    2016-07-01

    The Acoustic Voice Quality Index (AVQI) is an objective method to quantify the severity of overall voice quality in concatenated continuous speech and sustained phonation segments. Recently, the AVQI was successfully modified to be more representative and ecologically valid, balancing its internal consistency through an equal proportion of the two speech types. The present investigation aims to explore its external validation in a large data set. An expert panel of 12 speech-language therapists rated the voice quality of 1058 concatenated voice samples varying from normophonia to severe dysphonia. Spearman rank-order correlation coefficients (r) were used to measure concurrent validity. The AVQI's diagnostic accuracy was evaluated with several estimates of its receiver operating characteristics (ROC). On the basis of reliability criteria, the ratings of 8 of the 12 experts were retained. A strong correlation was identified between AVQI and the auditory-perceptual ratings (r = 0.815, P = .000), indicating that 66.4% of the variation in auditory-perceptual ratings was explained by the AVQI. Additionally, the ROC results again showed the best diagnostic outcome at a threshold of AVQI = 2.43. This study highlights the external validation and diagnostic precision of AVQI version 03.01 as a robust and ecologically valid measurement to objectify voice quality. © The Author(s) 2016.
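
    The reported cutoff comes from standard ROC analysis: compute the curve over expert labels and AVQI scores, then select the threshold maximizing sensitivity plus specificity (Youden's index). A minimal sketch with synthetic scores, not the study's data:

      import numpy as np
      from sklearn.metrics import roc_curve, roc_auc_score

      rng = np.random.default_rng(0)
      # Synthetic data: AVQI tends to be higher for dysphonic samples
      labels = np.r_[np.zeros(100), np.ones(100)]   # 0 = normophonic, 1 = dysphonic
      scores = np.r_[rng.normal(1.8, 0.6, 100), rng.normal(3.4, 0.9, 100)]

      fpr, tpr, thresholds = roc_curve(labels, scores)
      best = np.argmax(tpr - fpr)                    # Youden's J statistic
      print("AUC:", roc_auc_score(labels, scores))
      print("Best threshold:", thresholds[best])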

  12. Using visible speech to train perception and production of speech for individuals with hearing loss.

    PubMed

    Massaro, Dominic W; Light, Joanna

    2004-04-01

    The main goal of this study was to implement a computer-animated talking head, Baldi, as a language tutor for speech perception and production for individuals with hearing loss. Baldi can speak slowly; illustrate articulation by making the skin transparent to reveal the tongue, teeth, and palate; and show supplementary articulatory features, such as vibration of the neck to show voicing and turbulent airflow to show frication. Seven students with hearing loss between the ages of 8 and 13 were trained for 6 hours across 21 weeks on 8 categories of segments (4 voiced vs. voiceless distinctions, 3 consonant cluster distinctions, and 1 fricative vs. affricate distinction). Training included practice at the segment and the word level. Perception and production improved for each of the 7 children. Speech production also generalized to new words not included in the training lessons. Finally, speech production deteriorated somewhat after 6 weeks without training, indicating that the training method rather than some other experience was responsible for the improvement that was found.

  13. Ultrasound applicability in Speech Language Pathology and Audiology.

    PubMed

    Barberena, Luciana da Silva; Brasil, Brunah de Castro; Melo, Roberta Michelon; Mezzomo, Carolina Lisbôa; Mota, Helena Bolli; Keske-Soares, Márcia

    2014-01-01

    To present recent studies that used ultrasound in the fields of Speech Language Pathology and Audiology, evidencing possibilities for the applicability of this technique in different subareas. A bibliographic search was carried out in the PubMed database, using the keywords "ultrasonic," "speech," "phonetics," "Speech, Language and Hearing Sciences," "voice," "deglutition," and "myofunctional therapy," comprising some areas of Speech Language Pathology and Audiology Sciences. The keywords "ultrasound," "ultrasonography," "swallow," "orofacial myofunctional therapy," and "orofacial myology" were also used in the search. Studies in humans from the past 5 years were selected. In the preselection, duplicated studies, articles not fully available, and those that did not present a direct relation between ultrasound and Speech Language Pathology and Audiology Sciences were discarded. The data were analyzed descriptively and classified into subareas of Speech Language Pathology and Audiology Sciences. The following items were considered: purposes, participants, procedures, and results. We selected 12 articles for the ultrasound versus speech/phonetics subarea, 5 for ultrasound versus voice, 1 for ultrasound versus muscles of mastication, and 10 for ultrasound versus swallowing. No studies relating "ultrasound" and "Speech Language Pathology and Audiology Sciences" as a whole were found for the past 5 years. Different studies on the use of ultrasound in Speech Language Pathology and Audiology Sciences were found. Each of them, according to its purpose, confirms new possibilities for the use of this instrument in the several subareas, aiming at a more accurate diagnosis and new evaluative and therapeutic possibilities.

  14. Is there an effect of dysphonic teachers' voices on children's processing of spoken language?

    PubMed

    Rogerson, Jemma; Dodd, Barbara

    2005-03-01

    There is a vast body of literature on the causes, prevalence, implications, and issues of vocal dysfunction in teachers. However, the educational effect of teacher vocal impairment is largely unknown. The purpose of this study was to investigate the effect of impaired voice quality on children's processing of spoken language. One hundred and seven children (age range, 9.2 to 10.6 years; mean, 9.8 years; SD, 3.76 months) listened to three video passages: one read in a control voice, one in a mildly dysphonic voice, and one in a severely dysphonic voice. After each video passage, children were asked to answer six questions with multiple-choice answers. The results indicated that children's perception of speech across the three voice qualities differed, regardless of gender, IQ, and school attended. Performance in the control voice passages was better than performance in the mild and severe dysphonic voice passages. No difference was found between performance in the mild and severe dysphonic voice passages, highlighting that any form of vocal impairment is detrimental to children's speech processing and is therefore likely to have a negative educational effect. These findings, in light of the high rate of vocal dysfunction in teachers, further support the implementation of specific voice care education for those in the teaching profession.

  15. Mouth and Voice: A Relationship between Visual and Auditory Preference in the Human Superior Temporal Sulcus

    PubMed Central

    2017-01-01

    Cortex in and around the human posterior superior temporal sulcus (pSTS) is known to be critical for speech perception. The pSTS responds to both the visual modality (especially biological motion) and the auditory modality (especially human voices). Using fMRI in single subjects with no spatial smoothing, we show that visual and auditory selectivity are linked. Regions of the pSTS were identified that preferred visually presented moving mouths (presented in isolation or as part of a whole face) or moving eyes. Mouth-preferring regions responded strongly to voices and showed a significant preference for vocal compared with nonvocal sounds. In contrast, eye-preferring regions did not respond to either vocal or nonvocal sounds. The converse was also true: regions of the pSTS that showed a significant response to speech or preferred vocal to nonvocal sounds responded more strongly to visually presented mouths than eyes. These findings can be explained by environmental statistics. In natural environments, humans see visual mouth movements at the same time as they hear voices, while there is no auditory accompaniment to visual eye movements. The strength of a voxel's preference for visual mouth movements was strongly correlated with the magnitude of its auditory speech response and its preference for vocal sounds, suggesting that visual and auditory speech features are coded together in small populations of neurons within the pSTS. SIGNIFICANCE STATEMENT Humans interacting face to face make use of auditory cues from the talker's voice and visual cues from the talker's mouth to understand speech. The human posterior superior temporal sulcus (pSTS), a brain region known to be important for speech perception, is complex, with some regions responding to specific visual stimuli and others to specific auditory stimuli. Using BOLD fMRI, we show that the natural statistics of human speech, in which voices co-occur with mouth movements, are reflected in the neural architecture of

  16. Mouth and Voice: A Relationship between Visual and Auditory Preference in the Human Superior Temporal Sulcus.

    PubMed

    Zhu, Lin L; Beauchamp, Michael S

    2017-03-08

    Cortex in and around the human posterior superior temporal sulcus (pSTS) is known to be critical for speech perception. The pSTS responds to both the visual modality (especially biological motion) and the auditory modality (especially human voices). Using fMRI in single subjects with no spatial smoothing, we show that visual and auditory selectivity are linked. Regions of the pSTS were identified that preferred visually presented moving mouths (presented in isolation or as part of a whole face) or moving eyes. Mouth-preferring regions responded strongly to voices and showed a significant preference for vocal compared with nonvocal sounds. In contrast, eye-preferring regions did not respond to either vocal or nonvocal sounds. The converse was also true: regions of the pSTS that showed a significant response to speech or preferred vocal to nonvocal sounds responded more strongly to visually presented mouths than eyes. These findings can be explained by environmental statistics. In natural environments, humans see visual mouth movements at the same time as they hear voices, while there is no auditory accompaniment to visual eye movements. The strength of a voxel's preference for visual mouth movements was strongly correlated with the magnitude of its auditory speech response and its preference for vocal sounds, suggesting that visual and auditory speech features are coded together in small populations of neurons within the pSTS. SIGNIFICANCE STATEMENT Humans interacting face to face make use of auditory cues from the talker's voice and visual cues from the talker's mouth to understand speech. The human posterior superior temporal sulcus (pSTS), a brain region known to be important for speech perception, is complex, with some regions responding to specific visual stimuli and others to specific auditory stimuli. Using BOLD fMRI, we show that the natural statistics of human speech, in which voices co-occur with mouth movements, are reflected in the neural architecture of

  17. Military and government applications of human-machine communication by voice.

    PubMed Central

    Weinstein, C J

    1995-01-01

    This paper describes a range of opportunities for military and government applications of human-machine communication by voice, based on visits and contacts with numerous user organizations in the United States. The applications include some that appear to be feasible by careful integration of current state-of-the-art technology and others that will require a varying mix of advances in speech technology and in integration of the technology into applications environments. Applications that are described include (1) speech recognition and synthesis for mobile command and control; (2) speech processing for a portable multifunction soldier's computer; (3) speech- and language-based technology for naval combat team tactical training; (4) speech technology for command and control on a carrier flight deck; (5) control of auxiliary systems, and alert and warning generation, in fighter aircraft and helicopters; and (6) voice check-in, report entry, and communication for law enforcement agents or special forces. A phased approach for transfer of the technology into applications is advocated, where integration of applications systems is pursued in parallel with advanced research to meet future needs. PMID:7479718

  18. Effects of voice-sparing cricotracheal resection on phonation in women.

    PubMed

    Tanner, Kristine; Dromey, Christopher; Berardi, Mark L; Mattei, Lisa M; Pierce, Jenny L; Wisco, Jonathan J; Hunter, Eric J; Smith, Marshall E

    2017-09-01

    Individuals with idiopathic subglottic stenosis (SGS) are at risk for voice disorders prior to and following surgical management. This study examined the nature and severity of voice disorders in patients with SGS before and after a revised cricotracheal resection (CTR) procedure designed to minimize adverse effects on voice function. Eleven women with idiopathic SGS provided presurgical and postsurgical audio recordings. Voice Handicap Index (VHI) scores were also collected. Cepstral, signal-to-noise, periodicity, and fundamental frequency (F0) analyses were undertaken for connected speech and sustained vowel samples. Listeners made auditory-perceptual ratings of overall quality and monotonicity. Paired samples statistical analyses revealed that mean F0 decreased from 215 Hz (standard deviation [SD] = 40 Hz) to 201 Hz (SD = 65 Hz) following surgery. In general, VHI scores decreased after surgery. Voice disorder severity based on the Cepstral Spectral Index of Dysphonia (KayPentax, Montvale, NJ) for sustained vowels decreased (improved) from 41 (SD = 41) to 25 (SD = 21) points; no change was observed for connected speech. Semitone SD (2.2 semitones) did not change from pre- to posttreatment. Auditory-perceptual ratings demonstrated similar results. These preliminary results indicate that this revised CTR procedure is promising in minimizing adverse voice effects while offering a longer-term surgical outcome for SGS. Further research is needed to determine causal factors for pretreatment voice disorders, as well as to optimize treatments in this population. Level of Evidence: 4. Laryngoscope, 127:2085-2092, 2017. © 2016 The American Laryngological, Rhinological and Otological Society, Inc.
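
    For readers who want to reproduce this kind of measurement, the following Python sketch estimates mean F0 and semitone SD from a recording. It is a minimal illustration, not the study's protocol: the file name is a placeholder, and librosa's YIN tracker stands in for whatever pitch analysis the authors used.

        import numpy as np
        import librosa

        # Placeholder file; the study used presurgical/postsurgical recordings.
        y, sr = librosa.load("voice_sample.wav", sr=None)

        # Frame-wise F0 within a plausible range for adult female voices.
        f0 = librosa.yin(y, fmin=75, fmax=500, sr=sr)
        f0 = f0[(f0 > 75) & (f0 < 500)]  # drop frames pinned to the search bounds

        mean_f0 = f0.mean()
        # Semitone SD: variability relative to the speaker's median F0,
        # so the measure is independent of absolute pitch.
        st_sd = (12 * np.log2(f0 / np.median(f0))).std()
        print(f"mean F0: {mean_f0:.0f} Hz, semitone SD: {st_sd:.1f} ST")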

  19. Voice Therapy Techniques Adapted to Treatment of Habit Cough: A Pilot Study.

    ERIC Educational Resources Information Center

    Blager, Florence B.; And Others

    1988-01-01

    Individuals with long-standing habit cough having no organic basis can be successfully treated with a combination of psychotherapy and speech therapy. Techniques for speech therapy are adapted from those used with hyperfunctional voice disorders to fit this debilitating laryngeal disorder. (Author)

  20. Interactions between voice clinics and singing teachers: a report on the British Voice Association questionnaire to voice clinics in the UK.

    PubMed

    Davies, J; Anderson, S; Huchison, L; Stewart, G

    2007-01-01

    Singers with vocal problems are among patients who present at multidisciplinary voice clinics led by Ear Nose and Throat consultants and laryngologists or speech and language therapists. However, the development and care of the singing voice are also important responsibilities of singing teachers. We report here on the current extent and nature of interactions between voice clinics and singing teachers, based on data from a recent survey undertaken on behalf of the British Voice Association. A questionnaire was sent to all 103 voice clinics at National Health Service (NHS) hospitals in the UK. Responses were received and analysed from 42 currently active clinics. Eight (19%) clinics reported having a singing teacher as an active member of the team. They were all satisfied with the singing teacher's knowledge and expertise, which had been acquired by several different means. Of 32 clinics without a singing teacher regularly associated with the team, funding and difficulty of finding an appropriate singing voice expert (81% and 50%, respectively) were among the main reasons for their absence. There was an expressed requirement for more interaction between voice clinics and singing teachers, and 86% replied that they would find it useful to have a list of singing teachers in their area. On the matter of gaining expertise and training, 74% of the clinics replying would enable singing teachers to observe clinic sessions for experience and 21% were willing to assist in training them for clinic-associated work.

  1. How Our Own Speech Rate Influences Our Perception of Others

    ERIC Educational Resources Information Center

    Bosker, Hans Rutger

    2017-01-01

    In conversation, our own speech and that of others follow each other in rapid succession. Effects of the surrounding context on speech perception are well documented but, despite the ubiquity of the sound of our own voice, it is unknown whether our own speech also influences our perception of other talkers. This study investigated context effects…

  2. Changes of some functional speech disorders after surgical correction of skeletal anterior open bite.

    PubMed

    Knez Ambrožič, Mojca; Hočevar Boltežar, Irena; Ihan Hren, Nataša

    2015-09-01

    Skeletal anterior open bite (AOB) or apertognathism is characterized by the absence of contact of the anterior teeth and affects articulation parameters, chewing, biting and voice quality. The treatment of AOB consists of orthognathic surgical procedures. The aim of this study was to evaluate the effects of treatment on voice quality, articulation and nasality in speech with respect to skeletal changes. The study was prospective; 15 patients with AOB were evaluated before and after surgery. Lateral cephalometric x-ray parameters (facial angle, interincisal distance, Wits appraisal) were measured to determine skeletal changes. Before surgery, nine patients still had articulation disorders despite speech therapy during childhood. The voice quality parameters were determined by acoustic analysis of the vowel sound /a/ (fundamental frequency [F0], jitter, shimmer). Spectral analysis of vowels /a/, /e/, /i/, /o/, /u/ was carried out by determining the mean frequency of the first (F1) and second (F2) formants. Nasality in speech was expressed as the ratio between the nasal and the oral sound energies during speech samples. After surgery, normalizations of facial skeletal parameters were observed in all patients, but no statistically significant changes in articulation and voice quality parameters occurred despite subjective observations of easier articulation. No deterioration toward velopharyngeal insufficiency was observed in any of the patients. In conclusion, the surgical treatment of skeletal AOB does not lead to deterioration in voice, resonance and articulation qualities. Despite surgical correction of the unfavourable skeletal situation of the speech apparatus, the pre-existing articulation disorder cannot improve without professional intervention.

  3. Hearing Lips and Seeing Voices: How Cortical Areas Supporting Speech Production Mediate Audiovisual Speech Perception

    PubMed Central

    Skipper, Jeremy I.; van Wassenhove, Virginie; Nusbaum, Howard C.; Small, Steven L.

    2009-01-01

    Observing a speaker’s mouth profoundly influences speech perception. For example, listeners perceive an “illusory” “ta” when the video of a face producing /ka/ is dubbed onto an audio /pa/. Here, we show how cortical areas supporting speech production mediate this illusory percept and audiovisual (AV) speech perception more generally. Specifically, cortical activity during AV speech perception occurs in many of the same areas that are active during speech production. We find that different perceptions of the same syllable and the perception of different syllables are associated with different distributions of activity in frontal motor areas involved in speech production. Activity patterns in these frontal motor areas resulting from the illusory “ta” percept are more similar to the activity patterns evoked by AV/ta/ than they are to patterns evoked by AV/pa/ or AV/ka/. In contrast to the activity in frontal motor areas, stimulus-evoked activity for the illusory “ta” in auditory and somatosensory areas and visual areas initially resembles activity evoked by AV/pa/ and AV/ka/, respectively. Ultimately, though, activity in these regions comes to resemble activity evoked by AV/ta/. Together, these results suggest that AV speech elicits in the listener a motor plan for the production of the phoneme that the speaker might have been attempting to produce, and that feedback in the form of efference copy from the motor system ultimately influences the phonetic interpretation. PMID:17218482

  4. Research and development of a versatile portable speech prosthesis

    NASA Technical Reports Server (NTRS)

    1981-01-01

    The Versatile Portable Speech Prosthesis (VPSP), a synthetic speech output communication aid for non-speaking people, is described. It was intended initially for severely physically limited people with cerebral palsy who are in electric wheelchairs. Hence, it was designed to be placed on a wheelchair and powered from a wheelchair battery. It can easily be separated from the wheelchair. The VPSP is versatile because it is designed to accept any means of single switch, multiple switch, or keyboard control which physically limited people have the ability to use. It is portable because it is mounted on and can go with the electric wheelchair. It is a speech prosthesis, obviously, because it speaks with a synthetic voice for people unable to speak with their own voices. Both hardware and software are described.

  5. Voice - How humans communicate?

    PubMed

    Tiwari, Manjul; Tiwari, Maneesha

    2012-01-01

    Voices are important things for humans. They are the medium through which we do a lot of communicating with the outside world: our ideas, of course, and also our emotions and our personality. The voice is the very emblem of the speaker, indelibly woven into the fabric of speech. In this sense, each of our utterances of spoken language carries not only its own message but is also, through accent, tone of voice and habitual voice quality, an audible declaration of our membership of particular social and regional groups, of our individual physical and psychological identity, and of our momentary mood. Voices are also one of the media through which we (successfully, most of the time) recognize other humans who are important to us: members of our family, media personalities, our friends, and enemies. Although evidence from DNA analysis is potentially vastly more eloquent in its power than evidence from voices, DNA cannot talk. It cannot be recorded planning, carrying out or confessing to a crime. It cannot be so apparently directly incriminating. As will quickly become evident, voices are extremely complex things, and some of the inherent limitations of the forensic-phonetic method are in part a consequence of the interaction between their complexity and the real world in which they are used. It is one of the aims of this article to explain how this comes about. This subject still has unsolved questions, and there is no direct way to present all the information necessary to understand how voices can be related, or not, to their owners.

  6. Transcribing nonsense words: The effect of numbers of voices and repetitions.

    PubMed

    Knight, Rachael-Anne

    2010-06-01

    Transcription skills are crucially important to all phoneticians, and particularly for speech and language therapists who may use transcriptions to make decisions about diagnosis and intervention. Whilst interest in factors affecting transcription accuracy is increasing, there are still a number of issues yet to be investigated. The present paper considers how the number of voices and the number of repetitions affect the transcription of nonsense words. Thirty-two students in their second year of study for a BSc in Speech and Language Therapy participated in an experiment. They heard two nonsense words presented 10 times in either one or two voices. Results show that the number of voices did not affect accuracy, but that accuracy increased between six and ten repetitions. The reasons behind these findings, implications for teaching and learning, and directions for further research are discussed.

  7. Common cues to emotion in the dynamic facial expressions of speech and song

    PubMed Central

    Livingstone, Steven R.; Thompson, William F.; Wanderley, Marcelo M.; Palmer, Caroline

    2015-01-01

    Speech and song are universal forms of vocalization that may share aspects of emotional expression. Research has focused on parallels in acoustic features, overlooking facial cues to emotion. In three experiments, we compared moving facial expressions in speech and song. In Experiment 1, vocalists spoke and sang statements each with five emotions. Vocalists exhibited emotion-dependent movements of the eyebrows and lip corners that transcended speech–song differences. Vocalists’ jaw movements were coupled to their acoustic intensity, exhibiting differences across emotion and speech–song. Vocalists’ emotional movements extended beyond vocal sound to include large sustained expressions, suggesting a communicative function. In Experiment 2, viewers judged silent videos of vocalists’ facial expressions prior to, during, and following vocalization. Emotional intentions were identified accurately for movements during and after vocalization, suggesting that these movements support the acoustic message. Experiment 3 compared emotional identification in voice-only, face-only, and face-and-voice recordings. Emotions in voice-only singing were poorly identified, yet were identified accurately in all other conditions, confirming that facial expressions conveyed emotion more accurately than the voice in song, but equivalently in speech. Collectively, these findings highlight broad commonalities in the facial cues to emotion in speech and song, yet highlight differences in perception and acoustic-motor production. PMID:25424388

  8. Objective and subjective assessment of tracheoesophageal prosthesis voice outcome.

    PubMed

    D'Alatri, Lucia; Bussu, Francesco; Scarano, Emanuele; Paludetti, Gaetano; Marchese, Maria Raffaella

    2012-09-01

    To investigate the relationships between objective measures and the results of subjective assessment of voice quality and speech intelligibility in patients who underwent total laryngectomy and tracheoesophageal (TE) puncture. Retrospective. Twenty patients implanted with voice prosthesis were studied. After surgery, the entire sample performed speech rehabilitation. The assessment protocol included maximum phonation time (MPT), number of syllables per deep breath, acoustic analysis of the sustained vowel /a/ and of a bisyllabic word, perceptual evaluation (pleasantness and intelligibility%), and self-assessment. The correlation between pleasantness and intelligibility% was statistically significant. Both the latter were significantly correlated with the acoustic signal type, the number of formant peaks, and the F2-F1 difference. The intelligibility% and number of formant peaks were significantly correlated with the MPT and number of syllables per deep breath. Moreover, significant correlations were found between the number of formant peaks and both intelligibility% and pleasantness. The higher the number of syllables per deep breath and the longer the MPT, the higher the number of formant peaks and the intelligibility%. The study failed to show significant correlation between patient's self-assessment of voice quality and both pleasantness and communication effectiveness. The multidimensional assessment seems to be a reliable tool to evaluate the TE functional outcome. Particularly, the results showed that both pleasantness and intelligibility of TE speech are correlated to the availability of expired air and the function of the vocal tract. Copyright © 2012 The Voice Foundation. Published by Mosby, Inc. All rights reserved.
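
    As a rough illustration of how formant peaks and the F2-F1 difference can be extracted from a vowel sample, the sketch below uses LPC root-solving. The file name, LPC order, and thresholds are assumptions for demonstration, not the study's analysis method.

        import numpy as np
        import librosa

        y, sr = librosa.load("vowel_a.wav", sr=16000)   # placeholder recording
        a = librosa.lpc(y, order=int(2 + sr / 1000))    # rule-of-thumb LPC order

        roots = np.roots(a)
        roots = roots[np.imag(roots) > 0]               # keep one of each conjugate pair
        freqs = np.angle(roots) * sr / (2 * np.pi)      # pole angles -> frequencies (Hz)

        formants = np.sort(freqs[freqs > 90])           # discard near-DC poles
        f1, f2 = formants[0], formants[1]
        print(f"F1 ~ {f1:.0f} Hz, F2 ~ {f2:.0f} Hz, F2-F1 = {f2 - f1:.0f} Hz")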

  9. Monkeys and Humans Share a Common Computation for Face/Voice Integration

    PubMed Central

    Chandrasekaran, Chandramouli; Lemus, Luis; Trubanova, Andrea; Gondan, Matthias; Ghazanfar, Asif A.

    2011-01-01

    Speech production involves the movement of the mouth and other regions of the face resulting in visual motion cues. These visual cues enhance intelligibility and detection of auditory speech. As such, face-to-face speech is fundamentally a multisensory phenomenon. If speech is fundamentally multisensory, it should be reflected in the evolution of vocal communication: similar behavioral effects should be observed in other primates. Old World monkeys share with humans vocal production biomechanics and communicate face-to-face with vocalizations. It is unknown, however, if they, too, combine faces and voices to enhance their perception of vocalizations. We show that they do: monkeys combine faces and voices in noisy environments to enhance their detection of vocalizations. Their behavior parallels that of humans performing an identical task. We explored what common computational mechanism(s) could explain the pattern of results we observed across species. Standard explanations or models such as the principle of inverse effectiveness and a “race” model failed to account for their behavior patterns. Conversely, a “superposition model”, positing the linear summation of activity patterns in response to visual and auditory components of vocalizations, served as a straightforward but powerful explanatory mechanism for the observed behaviors in both species. As such, it represents a putative homologous mechanism for integrating faces and voices across primates. PMID:21998576

  10. Effects of syllable-initial voicing and speaking rate on the temporal characteristics of monosyllabic words.

    PubMed

    Allen, J S; Miller, J L

    1999-10-01

    Two speech production experiments tested the validity of the traditional method of creating voice-onset-time (VOT) continua for perceptual studies in which the systematic increase in VOT across the continuum is accompanied by a concomitant decrease in the duration of the following vowel. In experiment 1, segmental durations were measured for matched monosyllabic words beginning with either a voiced stop (e.g., big, duck, gap) or a voiceless stop (e.g., pig, tuck, cap). Results from four talkers showed that the change from voiced to voiceless stop produced not only an increase in VOT, but also a decrease in vowel duration. However, the decrease in vowel duration was consistently less than the increase in VOT. In experiment 2, results from four new talkers replicated these findings at two rates of speech, as well as highlighted the contrasting temporal effects on vowel duration of an increase in VOT due to a change in syllable-initial voicing versus a change in speaking rate. It was concluded that the traditional method of creating VOT continua for perceptual experiments, although not perfect, approximates natural speech by capturing the basic trade-off between VOT and vowel duration in syllable-initial voiced versus voiceless stop consonants.
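
    The timing trade-off these experiments document can be made concrete with a small worked example: in the schedule below, each 5-ms increase in VOT is paired with a smaller (3-ms) decrease in vowel duration, mimicking the finding that the vowel decrease is consistently less than the VOT increase. The step sizes and base durations are illustrative, not the paper's stimulus values.

        BASE_VOT_MS = 10      # step 0: clearly voiced stop
        BASE_VOWEL_MS = 250   # vowel duration at step 0
        VOT_STEP_MS = 5
        VOWEL_STEP_MS = 3     # |vowel change| < |VOT change|, as measured

        for step in range(9):
            vot = BASE_VOT_MS + step * VOT_STEP_MS
            vowel = BASE_VOWEL_MS - step * VOWEL_STEP_MS
            print(f"step {step}: VOT {vot:3d} ms, vowel {vowel:3d} ms, "
                  f"syllable {vot + vowel:3d} ms")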

  11. Voice synthesis application

    NASA Astrophysics Data System (ADS)

    Lightstone, P. C.; Davidson, W. M.

    1982-04-01

    The military detection assessment laboratory houses an experimental field system which assesses different alarm indicators such as fence disturbance sensors, MILES cables, and microwave Racons. A speech synthesis board was purchased that could be interfaced, by means of a computer, to an alarm logger, making verbal acknowledgement of alarms possible. Different products and different types of voice synthesis were analyzed before a linear predictive coding device produced by Telesensory Speech Systems of Palo Alto, California was chosen. This device is called the Speech 1000 Board and has a dedicated 8085 processor. A multiplexer card was designed and the Sp 1000 interfaced through the card into a TMS 990/100M Texas Instruments microcomputer. It was also necessary to design the software with the capability of recognizing and flagging an alarm on any 1 of 32 possible lines. The experimental field system was then packaged with a dc power supply, LED indicators, speakers, and switches, and deployed in the field, where it performed reliably.

  12. Perceptual and Acoustic Reliability Estimates for the Speech Disorders Classification System (SDCS)

    ERIC Educational Resources Information Center

    Shriberg, Lawrence D.; Fourakis, Marios; Hall, Sheryl D.; Karlsson, Heather B.; Lohmeier, Heather L.; McSweeny, Jane L.; Potter, Nancy L.; Scheer-Cohen, Alison R.; Strand, Edythe A.; Tilkens, Christie M.; Wilson, David L.

    2010-01-01

    A companion paper describes three extensions to a classification system for paediatric speech sound disorders termed the Speech Disorders Classification System (SDCS). The SDCS uses perceptual and acoustic data reduction methods to obtain information on a speaker's speech, prosody, and voice. The present paper provides reliability estimates for…

  13. Objective measurement of motor speech characteristics in the healthy pediatric population.

    PubMed

    Wong, A W; Allegro, J; Tirado, Y; Chadha, N; Campisi, P

    2011-12-01

    To obtain objective measurements of motor speech characteristics in normal children, using a computer-based motor speech software program. Cross-sectional, observational design in a university-based ambulatory pediatric otolaryngology clinic. Participants included 112 subjects (54 females and 58 males) aged 4-18 years. Participants with previously diagnosed hearing loss, voice and motor disorders, and children unable to repeat a passage in English were excluded. Voice samples were recorded and analysed using the Motor Speech Profile (MSP) software (KayPENTAX, Lincoln Park, NJ). The MSP produced measures of diadochokinetics, second formant transition, intonation, and syllabic rates. Demographic data, including sex, age, and cigarette smoke exposure were obtained. Normative data for several motor speech characteristics were derived for children ranging from age 4 to 18 years. A number of age-dependent changes were identified, including an increase in average diadochokinetic rate (p<0.001) and standard syllabic duration (p<0.001) with age. There were no identified differences in motor speech characteristics between males and females across the measured age range. Variations in fundamental frequency (F0) during speech did not change significantly with age for both males and females. To our knowledge, this is the first pediatric normative database for the MSP program. The MSP is suitable for testing children and can be used to study developmental changes in motor speech. The analysis demonstrated that males and females behave similarly and show the same relationship with age for the motor speech characteristics studied. This normative database will provide essential comparative data for future studies exploring alterations in motor speech that may occur with hearing, voice, and motor disorders and to assess the results of targeted therapies. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
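
    One MSP measure, the diadochokinetic (DDK) rate, can be crudely approximated by counting syllable onsets in a /pataka/ repetition recording and dividing by its duration. The sketch below is an assumption-laden illustration (a generic onset detector and a placeholder file name), not the MSP's proprietary analysis.

        import librosa

        y, sr = librosa.load("pataka.wav", sr=None)     # placeholder recording
        onsets = librosa.onset.onset_detect(y=y, sr=sr, units="time")

        duration = len(y) / sr
        ddk_rate = len(onsets) / duration               # syllables per second
        print(f"DDK rate ~ {ddk_rate:.1f} syll/s over {duration:.1f} s")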

  14. Cognitive processing load during listening is reduced more by decreasing voice similarity than by increasing spatial separation between target and masker speech.

    PubMed

    Zekveld, Adriana A; Rudner, Mary; Kramer, Sophia E; Lyzenga, Johannes; Rönnberg, Jerker

    2014-01-01

    We investigated changes in speech recognition and cognitive processing load due to the masking release attributable to decreasing similarity between target and masker speech. This was achieved by using masker voices with either the same (female) gender as the target speech or different gender (male) and/or by spatially separating the target and masker speech using HRTFs. We assessed the relation between the signal-to-noise ratio required for 50% sentence intelligibility, the pupil response and cognitive abilities. We hypothesized that the pupil response, a measure of cognitive processing load, would be larger for co-located maskers and for same-gender compared to different-gender maskers. We further expected that better cognitive abilities would be associated with better speech perception and larger pupil responses as the allocation of larger capacity may result in more intense mental processing. In line with previous studies, the performance benefit from different-gender compared to same-gender maskers was larger for co-located masker signals. The performance benefit of spatially-separated maskers was larger for same-gender maskers. The pupil response was larger for same-gender than for different-gender maskers, but was not reduced by spatial separation. We observed associations between better perception performance and better working memory, better information updating, and better executive abilities when applying no corrections for multiple comparisons. The pupil response was not associated with cognitive abilities. Thus, although both gender and location differences between target and masker facilitate speech perception, only gender differences lower cognitive processing load. Presenting a more dissimilar masker may facilitate target-masker separation at a later (cognitive) processing stage than increasing the spatial separation between the target and masker. The pupil response provides information about speech perception that complements intelligibility data.
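
    The SNR for 50% sentence intelligibility is typically found with an adaptive up-down track. The toy sketch below shows that logic with a simulated listener; it is a schematic illustration only, not the procedure or parameters used in this study.

        import random

        random.seed(0)

        def listener_correct(snr_db, srt_true=-6.0, slope=0.2):
            """Toy psychometric listener: P(correct) rises with SNR around srt_true."""
            p = 1.0 / (1.0 + 10 ** (-slope * (snr_db - srt_true)))
            return random.random() < p

        snr, step = 0.0, 2.0            # start at 0 dB SNR, 2-dB steps
        track = []
        for _ in range(25):
            track.append(snr)
            snr += -step if listener_correct(snr) else step  # 1-down/1-up -> 50% point

        srt = sum(track[-10:]) / 10     # average the late trials as the SRT estimate
        print(f"estimated 50% SRT ~ {srt:.1f} dB SNR")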

  15. Cognitive processing load during listening is reduced more by decreasing voice similarity than by increasing spatial separation between target and masker speech

    PubMed Central

    Zekveld, Adriana A.; Rudner, Mary; Kramer, Sophia E.; Lyzenga, Johannes; Rönnberg, Jerker

    2014-01-01

    We investigated changes in speech recognition and cognitive processing load due to the masking release attributable to decreasing similarity between target and masker speech. This was achieved by using masker voices with either the same (female) gender as the target speech or different gender (male) and/or by spatially separating the target and masker speech using HRTFs. We assessed the relation between the signal-to-noise ratio required for 50% sentence intelligibility, the pupil response and cognitive abilities. We hypothesized that the pupil response, a measure of cognitive processing load, would be larger for co-located maskers and for same-gender compared to different-gender maskers. We further expected that better cognitive abilities would be associated with better speech perception and larger pupil responses as the allocation of larger capacity may result in more intense mental processing. In line with previous studies, the performance benefit from different-gender compared to same-gender maskers was larger for co-located masker signals. The performance benefit of spatially-separated maskers was larger for same-gender maskers. The pupil response was larger for same-gender than for different-gender maskers, but was not reduced by spatial separation. We observed associations between better perception performance and better working memory, better information updating, and better executive abilities when applying no corrections for multiple comparisons. The pupil response was not associated with cognitive abilities. Thus, although both gender and location differences between target and masker facilitate speech perception, only gender differences lower cognitive processing load. Presenting a more dissimilar masker may facilitate target-masker separation at a later (cognitive) processing stage than increasing the spatial separation between the target and masker. The pupil response provides information about speech perception that complements intelligibility data.

  16. Changes in objective acoustic measurements and subjective voice complaints in call center customer-service advisors during one working day.

    PubMed

    Lehto, Laura; Laaksonen, Laura; Vilkman, Erkki; Alku, Paavo

    2008-03-01

    The aim of this study was to investigate how different acoustic parameters, extracted both from speech pressure waveforms and glottal flows, can be used in measuring vocal loading in modern working environments and how these parameters reflect the possible changes in the vocal function during a working day. In addition, correlations between objective acoustic parameters and subjective voice symptoms were addressed. The subjects were 24 female and 8 male customer-service advisors, who mainly use telephone during their working hours. Speech samples were recorded from continuous speech four times during a working day and voice symptom questionnaires were completed simultaneously. Among the various objective parameters, only F0 resulted in a statistically significant increase for both genders. No correlations between the changes in objective and subjective parameters appeared. However, the results encourage researchers within the field of occupational voice use to apply versatile measurement techniques in studying occupational voice loading.

  17. Alexa, Siri, Cortana, and More: An Introduction to Voice Assistants.

    PubMed

    Hoy, Matthew B

    2018-01-01

    Voice assistants are software agents that can interpret human speech and respond via synthesized voices. Apple's Siri, Amazon's Alexa, Microsoft's Cortana, and Google's Assistant are the most popular voice assistants and are embedded in smartphones or dedicated home speakers. Users can ask their assistants questions, control home automation devices and media playback via voice, and manage other basic tasks such as email, to-do lists, and calendars with verbal commands. This column will explore the basic workings and common features of today's voice assistants. It will also discuss some of the privacy and security issues inherent to voice assistants and some potential future uses for these devices. As voice assistants become more widely used, librarians will want to be familiar with their operation and perhaps consider them as a means to deliver library services and materials.

  18. Acoustic-Perceptual Correlates of Voice in Indian Hindu Purohits.

    PubMed

    Balasubramanium, Radish Kumar; Karuppali, Sudhin; Bajaj, Gagan; Shastry, Anuradha; Bhat, Jayashree

    2018-05-16

    Purohit, in the Indian religious context (Hindu), means priest. Purohits are professional voice users who use their voice while performing regular worships and rituals in temples and homes. Any deviations in their voice can have an impact on their profession. Hence, there is a need to investigate the voice characteristics of purohits using perceptual and acoustic analyses. A total of 44 men in the age range of 18-30 years were divided into two groups. Group 1 consisted of purohits who were trained since childhood (n = 22) in the traditional gurukul system. Group 2 (n = 22) consisted of normal controls. Phonation and spontaneous speech samples were obtained from all the participants at a comfortable pitch and loudness. The Praat software (Version 5.3.31) and the Speech tool were used to analyze the traditional acoustic and cepstral parameters, respectively, whereas GRBAS was used to perceptually evaluate the voice. Results of the independent t test revealed no significant differences across the groups for perceptual and traditional acoustic measures except for intensity, which was significantly higher in purohits' voices at P < 0.05. However, the cepstral values (cepstral peak prominence and smoothened cepstral peak prominence) were much higher in purohits than in controls at P < 0.05. Results revealed that purohits did not exhibit vocal deviations as analyzed through perceptual and acoustic parameters. In contrast, cepstral measures were higher in Indian Hindu purohits in comparison with normal controls, suggestive of a higher degree of harmonic organization in purohits. Further studies are required to analyze the physiological correlates of increased cepstral measures in purohits' voices. Copyright © 2018 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
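
    The cepstral peak prominence (CPP) measures reported here can be approximated in a few lines: take the cepstrum of the log spectrum, find the peak in the quefrency range of plausible F0 values, and measure its height above a regression baseline. The sketch below is a simplified, assumption-marked version of that idea, not the Praat/Speech tool computation; the file name is a placeholder.

        import numpy as np
        import librosa

        y, sr = librosa.load("vowel_a.wav", sr=None)          # placeholder file

        spectrum_db = 20 * np.log10(np.abs(np.fft.rfft(y)) + 1e-12)
        cepstrum = np.abs(np.fft.irfft(spectrum_db))

        quefrency = np.arange(len(cepstrum)) / sr             # seconds
        lo, hi = int(sr / 330), int(sr / 60)                  # F0 search range 60-330 Hz
        peak = lo + np.argmax(cepstrum[lo:hi])

        # Height of the cepstral peak above a linear regression baseline.
        coeffs = np.polyfit(quefrency[lo:hi], cepstrum[lo:hi], 1)
        cpp = cepstrum[peak] - np.polyval(coeffs, quefrency[peak])
        print(f"CPP-like prominence ~ {cpp:.2f}")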

  19. Auditory-Perceptual and Acoustic Methods in Measuring Dysphonia Severity of Korean Speech.

    PubMed

    Maryn, Youri; Kim, Hyung-Tae; Kim, Jaeock

    2016-09-01

    The purpose of this study was to explore the criterion-related concurrent validity of two standardized auditory-perceptual rating protocols and the Acoustic Voice Quality Index (AVQI) for measuring dysphonia severity in Korean speech. Sixty native Korean subjects with various voice disorders were asked to sustain the vowel [a:] and to read aloud the Korean text "Walk." A 3-second midvowel portion of the sustained vowel and two sentences (with 25 syllables) were edited, concatenated, and analyzed according to methods described elsewhere. From 56 participants, both continuous speech and sustained vowel recordings had sufficiently high signal-to-noise ratios (35.5 dB and 37 dB on average, respectively) and were therefore subjected to further dysphonia severity analysis with (1) "G" or Grade from the GRBAS protocol, (2) "OS" or Overall Severity from the Consensus Auditory-Perceptual Evaluation of Voice protocol, and (3) AVQI. First, high correlations were found between G and OS (Spearman r = 0.955 for sustained vowels; 0.965 for continuous speech). Second, the AVQI showed a strong correlation with G (Spearman r = 0.911) as well as OS (Pearson r = 0.924). These findings are in agreement with similar studies dealing with continuous speech in other languages. The present study highlights the criterion-related concurrent validity of these methods in Korean speech. Furthermore, it supports the cross-linguistic robustness of the AVQI as a valid and objective marker of overall dysphonia severity. Copyright © 2016 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
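
    The sample-preparation step described above (a 3-second mid-vowel portion concatenated with read speech) is easy to script. The sketch below assumes placeholder file names and leaves the AVQI computation itself to dedicated tools.

        import numpy as np
        import librosa
        import soundfile as sf

        vowel, sr = librosa.load("sustained_a.wav", sr=44100)
        speech, _ = librosa.load("walk_sentences.wav", sr=44100)

        mid, half = len(vowel) // 2, int(1.5 * sr)   # 3 s from the vowel's middle
        concatenated = np.concatenate([vowel[mid - half:mid + half], speech])
        sf.write("avqi_input.wav", concatenated, sr)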

  20. Effects of Familiarity and Feeding on Newborn Speech-Voice Recognition

    ERIC Educational Resources Information Center

    Valiante, A. Grace; Barr, Ronald G.; Zelazo, Philip R.; Brant, Rollin; Young, Simon N.

    2013-01-01

    Newborn infants preferentially orient to familiar over unfamiliar speech sounds. They are also better at remembering unfamiliar speech sounds for short periods of time if learning and retention occur after a feed than before. It is unknown whether short-term memory for speech is enhanced when the sound is familiar (versus unfamiliar) and, if so,…

  1. Childhood apraxia of speech: A survey of praxis and typical speech characteristics.

    PubMed

    Malmenholt, Ann; Lohmander, Anette; McAllister, Anita

    2017-07-01

    The purpose of this study was to investigate current knowledge of the diagnosis childhood apraxia of speech (CAS) in Sweden and compare speech characteristics and symptoms to those of earlier survey findings among mainly English speakers. In a web-based questionnaire, 178 Swedish speech-language pathologists (SLPs) anonymously answered questions about their perception of typical speech characteristics for CAS. They graded their own assessment skills and estimated clinical occurrence. The seven top speech characteristics reported as typical for children with CAS were: inconsistent speech production (85%), sequencing difficulties (71%), oro-motor deficits (63%), vowel errors (62%), voicing errors (61%), consonant cluster deletions (54%), and prosodic disturbance (53%). Motor-programming deficits, described as a lack of automatization of speech movements, were reported by 82%. All listed characteristics were consistent with the American Speech-Language-Hearing Association (ASHA) consensus-based features, Strand's 10-point checklist, and the diagnostic model proposed by Ozanne. The mode for estimated clinical occurrence was 5%. The number of suspected cases of CAS in the clinical caseload was approximately one new patient per year per SLP. The results support and add to findings from studies of CAS in English-speaking children, with similar speech characteristics regarded as typical. Possibly, these findings could contribute to cross-linguistic consensus on CAS characteristics.

  2. Voice and persuasion in a banking telemarketing context.

    PubMed

    Chebat, Jean-Charles; El Hedhli, Kamel; Gélinas-Chebat, Claire; Boivin, Robert

    2007-04-01

    Voice has been neglected in research on advertising and persuasion. The present study examined the influence of voice and sex on the credibility of the voice source in a banking telemarketing context, as well as on the attitude toward the advertisement and subjects' behavioral intention. An experiment using the voices of a man and a woman was conducted. A recorded mock-telemarketing message consisted of an advertisement for an ATM card offered by a Canadian bank. Subjects were undergraduate students (N=399; 71.6% women, 28.4% men; M age=26.5 yr., SD = 7.4). They completed a questionnaire after hearing the message in telemarketing conditions. Analysis indicated that a moderate intensity, an unmarked intonation, and a fast speech rate are associated with a more credible source than the other combinations. Sex was not a significant moderator in the relationship between voice characteristics and source credibility. Voice characteristics significantly affected attitudes toward the advertisement and behavioral intention.

  3. Acoustic analysis of voice in children with cleft palate and velopharyngeal insufficiency.

    PubMed

    Villafuerte-Gonzalez, Rocio; Valadez-Jimenez, Victor M; Hernandez-Lopez, Xochiquetzal; Ysunza, Pablo Antonio

    2015-07-01

    Acoustic analysis of voice can provide instrumental data concerning vocal abnormalities. These findings can be used for monitoring clinical course in cases of voice disorders. Cleft palate severely affects the structure of the vocal tract; hence, voice quality can also be affected. The aim was to study whether the main acoustic parameters of voice, including fundamental frequency, shimmer and jitter, are significantly different in patients with a repaired cleft palate, as compared with normal children without speech, language and voice disorders. Fourteen patients with repaired unilateral cleft lip and palate and persistent or residual velopharyngeal insufficiency (VPI) were studied. A control group was assembled from healthy volunteer subjects matched by age and gender. Hypernasality and nasal emission were perceptually assessed in patients with VPI. The size of the velopharyngeal gap, as assessed by videonasopharyngoscopy, was classified in patients with VPI. Acoustic parameters of voice, including fundamental frequency (F0), shimmer and jitter, were compared between patients with VPI and control subjects. F0 was significantly higher in male patients as compared with male controls. Shimmer was significantly higher in patients with VPI regardless of gender. Moreover, patients with moderate VPI showed a significantly higher shimmer perturbation, regardless of gender. Although future research regarding voice disorders in patients with VPI is needed, at the present time it seems reasonable to include strategies for voice therapy in the speech and language pathology intervention plan for patients with VPI. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  4. Performance of wavelet analysis and neural networks for pathological voices identification

    NASA Astrophysics Data System (ADS)

    Salhi, Lotfi; Talbi, Mourad; Abid, Sabeur; Cherif, Adnane

    2011-09-01

    Within the medical environment, diverse techniques exist to assess the state of the voice of the patient. The inspection technique is inconvenient for a number of reasons, such as its high cost, the duration of the inspection, and above all, the fact that it is an invasive technique. This study focuses on a robust, rapid and accurate system for automatic identification of pathological voices. This system employs a non-invasive, inexpensive and fully automated method based on a hybrid approach: wavelet transform analysis and a neural network classifier. First, we present the results obtained in our previous study using classic feature parameters. These results allow visual identification of pathological voices. Second, quantified parameters derived from the wavelet analysis are proposed to characterise the speech sample. In addition, a system of multilayer neural networks (MNNs) has been developed which carries out the automatic detection of pathological voices. The developed method was evaluated using a voice database composed of recorded voice samples (continuous speech) from normophonic or dysphonic speakers. The dysphonic speakers were patients of the 'RABTA' National Hospital of Tunis, Tunisia, and a University Hospital in Brussels, Belgium. Experimental results indicate a success rate ranging between 75% and 98.61% for discrimination of normal and pathological voices using the proposed parameters and neural network classifier. We also compared the average classification rate based on the MNN, a Gaussian mixture model and support vector machines.
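
    A minimal sketch of the hybrid idea, wavelet features feeding a multilayer neural network, is given below. The feature set (subband log-energies of a 5-level db4 decomposition), the file lists, and the classifier settings are illustrative assumptions, not the system evaluated in the paper.

        import numpy as np
        import pywt
        import librosa
        from sklearn.neural_network import MLPClassifier
        from sklearn.model_selection import train_test_split

        def wavelet_features(path):
            y, _ = librosa.load(path, sr=16000)
            coeffs = pywt.wavedec(y, "db4", level=5)   # approximation + 5 detail bands
            return np.array([np.log(np.sum(c ** 2) + 1e-12) for c in coeffs])

        paths = ["normal_01.wav", "normal_02.wav", "path_01.wav", "path_02.wav"]
        labels = [0, 0, 1, 1]                          # 0 = normophonic, 1 = dysphonic

        X = np.stack([wavelet_features(p) for p in paths])
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, labels, test_size=0.5, random_state=0, stratify=labels)

        clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
        clf.fit(X_tr, y_tr)
        print("held-out accuracy:", clf.score(X_te, y_te))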

  5. A voice region in the monkey brain.

    PubMed

    Petkov, Christopher I; Kayser, Christoph; Steudel, Thomas; Whittingstall, Kevin; Augath, Mark; Logothetis, Nikos K

    2008-03-01

    For vocal animals, recognizing species-specific vocalizations is important for survival and social interactions. In humans, a voice region has been identified that is sensitive to human voices and vocalizations. As this region also strongly responds to speech, it is unclear whether it is tightly associated with linguistic processing and is thus unique to humans. Using functional magnetic resonance imaging of macaque monkeys (Old World primates, Macaca mulatta) we discovered a high-level auditory region that prefers species-specific vocalizations over other vocalizations and sounds. This region not only showed sensitivity to the 'voice' of the species, but also to the vocal identity of conspecific individuals. The monkey voice region is located on the superior temporal plane and belongs to an anterior auditory 'what' pathway. These results establish functional relationships with the human voice region and support the notion that, for different primate species, the anterior temporal regions of the brain are adapted for recognizing communication signals from conspecifics.

  6. Speech & Language Impairments. NICHCY Disability Fact Sheet #11

    ERIC Educational Resources Information Center

    National Dissemination Center for Children with Disabilities, 2011

    2011-01-01

    There are many kinds of speech and language disorders that can affect children. This fact sheet will present four major areas in which these impairments occur. These are the areas of: (1) Articulation; (2) Fluency; (3) Voice; and (4) Language. Following a brief narrative on a day in the life of a Speech Language Pathologist, this fact sheet…

  7. English Voicing in Dimensional Theory

    PubMed Central

    Iverson, Gregory K.; Ahn, Sang-Cheol

    2007-01-01

    Assuming a framework of privative features, this paper interprets two apparently disparate phenomena in English phonology as structurally related: the lexically specific voicing of fricatives in plural nouns like wives or thieves and the prosodically governed “flapping” of medial /t/ (and /d/) in North American varieties, which we claim is itself not a rule per se, but rather a consequence of the laryngeal weakening of fortis /t/ in interaction with speech-rate determined segmental abbreviation. Taking as our point of departure the Dimensional Theory of laryngeal representation developed by Avery & Idsardi (2001), along with their assumption that English marks voiceless obstruents but not voiced ones (Iverson & Salmons 1995), we find that an unexpected connection between fricative voicing and coronal flapping emerges from the interplay of familiar phonemic and phonetic factors in the phonological system. PMID:18496590

  8. Speech impairment in Down syndrome: a review.

    PubMed

    Kent, Ray D; Vorperian, Houri K

    2013-02-01

    This review summarizes research on disorders of speech production in Down syndrome (DS) for the purposes of informing clinical services and guiding future research. Review of the literature was based on searches using MEDLINE, Google Scholar, PsycINFO, and HighWire Press, as well as consideration of reference lists in retrieved documents (including online sources). Search terms emphasized functions related to voice, articulation, phonology, prosody, fluency, and intelligibility. The following conclusions pertain to four major areas of review: voice, speech sounds, fluency and prosody, and intelligibility. The first major area is voice. Although a number of studies have reported on vocal abnormalities in DS, major questions remain about the nature and frequency of the phonatory disorder. Results of perceptual and acoustic studies have been mixed, making it difficult to draw firm conclusions or even to identify sensitive measures for future study. The second major area is speech sounds. Articulatory and phonological studies show that speech patterns in DS are a combination of delayed development and errors not seen in typical development. Delayed (i.e., developmental) and disordered (i.e., nondevelopmental) patterns are evident by the age of about 3 years, although DS-related abnormalities possibly appear earlier, even in infant babbling. The third major area is fluency and prosody. Stuttering and/or cluttering occur in DS at rates of 10%-45%, compared with about 1% in the general population. Research also points to significant disturbances in prosody. The fourth major area is intelligibility. Studies consistently show marked limitations in this area, but only recently has the research gone beyond simple rating scales.

  9. Voice recognition products - an occupational risk for users with ULDs?

    PubMed

    Williams, N R

    2003-10-01

    Voice recognition systems (VRS) allow speech both to be converted directly into text, which appears on the screen of a computer, and to direct equipment to perform specific functions. Suggested applications are many and varied, including increasing efficiency in the reporting of radiographs, allowing directed surgery and enabling individuals with upper limb disorders (ULDs) who cannot use other input devices, such as keyboards and mice, to carry out word processing and other activities. This paper describes four cases of vocal dysfunction related to the use of such software, which have been identified from the database of the Voice and Speech Laboratory of the Massachusetts Eye and Ear Infirmary (MEEI). The database was searched using key words 'voice recognition' and four cases were identified from a total of 4800. In all cases, the VRS was supplied to assist individuals with ULDs who could not use conventional input devices. Case reports illustrate time of onset and symptoms experienced. The cases illustrate the need for risk assessment and consideration of the ergonomic aspects of voice use prior to such adaptations being used, particularly in those who already experience work-related ULDs.

  10. The prevalence of speech disorder in primary school students in Yazd-Iran.

    PubMed

    Karbasi, Sedighah Akhavan; Fallah, Razieh; Golestan, Motaharah

    2011-01-01

    Communication disorders are widespread, disabling problems associated with adverse long-term outcomes that impact individuals, families and the academic achievement of children in the school years, and affect vocational choices later in adulthood. The aim of this study was to determine the prevalence of speech disorders, specifically stuttering, voice disorders, and speech-sound disorders, in primary school students in Yazd, Iran. In a descriptive study, 7881 primary school students in Yazd were evaluated for speech disorders in 2005, using a direct, face-to-face assessment technique. The prevalence of total speech disorders was 14.8%, among whom 13.8% had a speech-sound disorder, 1.2% stuttering and 0.47% a voice disorder. The prevalence of speech disorders was higher in males (16.7%) than in females (12.7%). The pattern of prevalence of the three speech disorders differed significantly according to gender, parental education and number of family members. There was no significant difference across speech disorders by birth order, religion or paternal consanguinity. These prevalence figures are higher than those of most studies that used parent or teacher reports.

  11. An integrated tool for the diagnosis of voice disorders.

    PubMed

    Godino-Llorente, Juan I; Sáenz-Lechón, Nicolás; Osma-Ruiz, Víctor; Aguilera-Navarro, Santiago; Gómez-Vilda, Pedro

    2006-04-01

    A PC-based integrated aid tool has been developed for the analysis and screening of pathological voices. With it the user can simultaneously record speech, electroglottographic (EGG), and videoendoscopic signals, and synchronously edit them to select the most significant segments. These multimedia data are stored on a relational database, together with a patient's personal information, anamnesis, diagnosis, visits, explorations and any other comment the specialist may wish to include. The speech and EGG waveforms are analysed by means of temporal representations and the quantitative measurements of parameters such as spectrograms, frequency and amplitude perturbation measurements, harmonic energy, noise, etc. are calculated using digital signal processing techniques, giving an idea of the degree of hoarseness and quality of the voice register. Within this framework, the system uses a standard protocol to evaluate and build complete databases of voice disorders. The target users of this system are speech and language therapists and ear nose and throat (ENT) clinicians. The application can be easily configured to cover the needs of both groups of professionals. The software has a user-friendly Windows style interface. The PC should be equipped with standard sound and video capture cards. Signals are captured using common transducers: a microphone, an electroglottograph and a fiberscope or telelaryngoscope. The clinical usefulness of the system is addressed in a comprehensive evaluation section.

  12. A perspective on early commercial applications of voice-processing technology for telecommunications and aids for the handicapped.

    PubMed Central

    Seelbach, C

    1995-01-01

    The Colloquium on Human-Machine Communication by Voice highlighted the global technical community's focus on the problems and promise of voice-processing technology, particularly, speech recognition and speech synthesis. Clearly, there are many areas in both the research and development of these technologies that can be advanced significantly. However, it is also true that there are many applications of these technologies that are capable of commercialization now. Early successful commercialization of new technology is vital to ensure continuing interest in its development. This paper addresses efforts to commercialize speech technologies in two markets: telecommunications and aids for the handicapped. PMID:7479814

  13. Intelligibility and Acceptability Testing for Speech Technology

    DTIC Science & Technology

    1992-05-22

    information in memory (Luce, Feustel, and Pisoni, 1983). In high workload or multiple task situations, the added effort of listening to degraded speech can lead ... the DRT provides diagnostic feature scores on six phonemic features: voicing, nasality, sustention, sibilation, graveness, and compactness, and on a ... of other speech materials (e.g., polysyllabic words, paragraphs) and methods (memory, comprehension, reaction time) have been used to evaluate the

  14. Underconnectivity between voice-selective cortex and reward circuitry in children with autism.

    PubMed

    Abrams, Daniel A; Lynch, Charles J; Cheng, Katherine M; Phillips, Jennifer; Supekar, Kaustubh; Ryali, Srikanth; Uddin, Lucina Q; Menon, Vinod

    2013-07-16

    Individuals with autism spectrum disorders (ASDs) often show insensitivity to the human voice, a deficit that is thought to play a key role in communication deficits in this population. The social motivation theory of ASD predicts that impaired function of reward and emotional systems impedes children with ASD from actively engaging with speech. Here we explore this theory by investigating distributed brain systems underlying human voice perception in children with ASD. Using resting-state functional MRI data acquired from 20 children with ASD and 19 age- and intelligence quotient-matched typically developing children, we examined intrinsic functional connectivity of voice-selective bilateral posterior superior temporal sulcus (pSTS). Children with ASD showed a striking pattern of underconnectivity between left-hemisphere pSTS and distributed nodes of the dopaminergic reward pathway, including bilateral ventral tegmental areas and nucleus accumbens, left-hemisphere insula, orbitofrontal cortex, and ventromedial prefrontal cortex. Children with ASD also showed underconnectivity between right-hemisphere pSTS, a region known for processing speech prosody, and the orbitofrontal cortex and amygdala, brain regions critical for emotion-related associative learning. The degree of underconnectivity between voice-selective cortex and reward pathways predicted symptom severity for communication deficits in children with ASD. Our results suggest that weak connectivity of voice-selective cortex and brain structures involved in reward and emotion may impair the ability of children with ASD to experience speech as a pleasurable stimulus, thereby impacting language and social skill development in this population. Our study provides support for the social motivation theory of ASD.
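
    Seed-based intrinsic functional connectivity of the kind reported here reduces to correlating a seed time series with target time series and Fisher-transforming the result. The sketch below uses synthetic data as a stand-in for preprocessed resting-state fMRI; the region labels are illustrative.

        import numpy as np

        rng = np.random.default_rng(0)
        n_timepoints, n_regions = 180, 6
        ts = rng.standard_normal((n_timepoints, n_regions))  # fake ROI time series

        seed = ts[:, 0]                                      # e.g., a left pSTS seed
        r = np.array([np.corrcoef(seed, ts[:, j])[0, 1]      # seed-to-target r
                      for j in range(1, n_regions)])

        z = np.arctanh(r)   # Fisher z, the usual step before group statistics
        print("seed-to-target connectivity (z):", np.round(z, 3))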

  15. Remote voice training: A case study on space shuttle applications, appendix C

    NASA Technical Reports Server (NTRS)

    Mollakarimi, Cindy; Hamid, Tamin

    1990-01-01

    The Tile Automation System includes applications of automation and robotics technology to all aspects of the Shuttle tile processing and inspection system. An integrated set of rapid prototyping testbeds was developed which include speech recognition and synthesis, laser imaging systems, distributed Ada programming environments, distributed relational data base architectures, distributed computer network architectures, multi-media workbenches, and human factors considerations. Remote voice training in the Tile Automation System is discussed. The user is prompted over a headset by synthesized speech for the training sequences. The voice recognition units and the voice output units are remote from the user and are connected by Ethernet to the main computer system. A supervisory channel is used to monitor the training sequences. Discussions include the training approaches as well as the human factors problems and solutions for this system utilizing remote training techniques.

  16. Automatic Speech Acquisition and Recognition for Spacesuit Audio Systems

    NASA Technical Reports Server (NTRS)

    Ye, Sherry

    2015-01-01

    NASA has a widely recognized but unmet need for novel human-machine interface technologies that can facilitate communication during astronaut extravehicular activities (EVAs), when loud noises and strong reverberations inside spacesuits make communication challenging. WeVoice, Inc., has developed a multichannel signal-processing method for speech acquisition in noisy and reverberant environments that enables automatic speech recognition (ASR) technology inside spacesuits. The technology reduces noise by exploiting differences between the statistical nature of signals (i.e., speech) and noise that exists in the spatial and temporal domains. As a result, ASR accuracy can be improved to the level at which crewmembers will find the speech interface useful. System components and features include beam forming/multichannel noise reduction, single-channel noise reduction, speech feature extraction, feature transformation and normalization, feature compression, and ASR decoding. Arithmetic complexity models were developed and will help designers of real-time ASR systems select proper tasks when confronted with constraints in computational resources. In Phase I of the project, WeVoice validated the technology. The company further refined the technology in Phase II and developed a prototype for testing and use by suited astronauts.
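
    One standard building block behind the multichannel noise reduction named above is delay-and-sum beamforming; the toy sketch below illustrates it on a simulated four-microphone capture. The array geometry, sample rate, and steering angle are assumptions for illustration, not WeVoice's design.

        import numpy as np

        SR, N_MICS = 16000, 4
        SPACING_M, C = 0.04, 343.0        # 4-cm linear array; speed of sound (m/s)

        def delay_and_sum(channels, angle_deg):
            """Steer a linear array toward angle_deg (0 = broadside) and average."""
            d_s = np.arange(N_MICS) * SPACING_M * np.sin(np.deg2rad(angle_deg)) / C
            d = np.round(d_s * SR).astype(int)
            n = channels.shape[1] - d.max()
            aligned = np.stack([ch[k:k + n] for ch, k in zip(channels, d)])
            return aligned.mean(axis=0)   # coherent sum attenuates diffuse noise

        rng = np.random.default_rng(1)
        t = np.arange(SR) / SR
        speech = np.sin(2 * np.pi * 220 * t)              # stand-in for a speech signal
        mics = np.stack([speech + 0.5 * rng.standard_normal(SR)
                         for _ in range(N_MICS)])

        enhanced = delay_and_sum(mics, angle_deg=0.0)
        print("single-mic variance:", round(mics[0].var(), 3),
              "-> beamformed:", round(enhanced.var(), 3))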

  17. Maximal Ambient Noise Levels and Type of Voice Material Required for Valid Use of Smartphones in Clinical Voice Research.

    PubMed

    Lebacq, Jean; Schoentgen, Jean; Cantarella, Giovanna; Bruss, Franz Thomas; Manfredi, Claudia; DeJonckere, Philippe

    2017-09-01

    Smartphone technology provides new opportunities for recording standardized voice samples of patients and transmitting the audio files to the voice laboratory. This greatly facilitates the baseline designs used in research on the efficacy of voice treatments. However, the basic requirement is the suitability of smartphones for recording and digitizing pathologic voices (mainly characterized by period perturbations and noise) without significant distortion. In a previous article, this was tested using realistic synthesized deviant voice samples (/a:/) with three precisely known levels of jitter and of noise in all combinations. High correlations were found between jitter and noise-to-harmonics ratio measured in (1) recordings via smartphones, (2) direct microphone recordings, and (3) sound files generated by the synthesizer. In the present work, similar experiments were performed (1) in the presence of increasing levels of ambient noise and (2) using synthetic deviant voice samples (/a:/) as well as synthetic voice material simulating a deviant short voiced utterance (/aiuaiuaiu/). Ambient noise levels up to 50 dB(A) are acceptable. However, signal processing occurs in some smartphones, and this significantly affects estimates of jitter and noise-to-harmonics ratio when formant changes are introduced in analogy with running speech. The conclusion is that voice material must provisionally be limited to a sustained /a/. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
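
    As a toy illustration of the two deviance measures at stake here, the sketch below computes jitter and an autocorrelation-based harmonics-to-noise ratio from a sustained vowel. The frame sizes, F0 search range, and synthetic /a:/ are assumptions for the sketch; clinical analysis packages use far more robust pitch trackers.

```python
import numpy as np

def autocorr_peak(frame, fs, fmin=75.0, fmax=400.0):
    """Pitch period (s) and normalized autocorrelation peak of one frame."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    ac = ac / (ac[0] + 1e-12)
    lo, hi = int(fs / fmax), int(fs / fmin)      # plausible pitch-lag range
    lag = lo + int(np.argmax(ac[lo:hi]))
    return lag / fs, ac[lag]

def jitter_and_hnr(x, fs, frame_ms=40, hop_ms=20):
    n, h = int(fs * frame_ms / 1000), int(fs * hop_ms / 1000)
    periods, peaks = [], []
    for start in range(0, len(x) - n, h):
        T, r = autocorr_peak(x[start:start + n], fs)
        periods.append(T)
        peaks.append(r)
    periods = np.array(periods)
    peaks = np.clip(np.array(peaks), 1e-3, 0.999)
    # local jitter: mean absolute period-to-period change, relative to mean period
    jitter_pct = 100 * np.mean(np.abs(np.diff(periods))) / np.mean(periods)
    # autocorrelation HNR: harmonic vs non-harmonic energy per frame, averaged
    hnr_db = np.mean(10 * np.log10(peaks / (1 - peaks)))
    return jitter_pct, hnr_db

fs = 16000
t = np.arange(fs) / fs
vowel = np.sin(2 * np.pi * 120 * t) + 0.05 * np.random.randn(fs)  # crude /a:/ stand-in
print(jitter_and_hnr(vowel, fs))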

  18. Transitioning from analog to digital audio recording in childhood speech sound disorders.

    PubMed

    Shriberg, Lawrence D; McSweeny, Jane L; Anderson, Bruce E; Campbell, Thomas F; Chial, Michael R; Green, Jordan R; Hauner, Katherina K; Moore, Christopher A; Rusiewicz, Heather L; Wilson, David L

    2005-06-01

    Few empirical findings or technical guidelines are available on the current transition from analog to digital audio recording in childhood speech sound disorders. Of particular concern in the present context was whether a transition from analog- to digital-based transcription and coding of prosody and voice features might require re-standardizing a reference database for research in childhood speech sound disorders. Two research transcribers with different levels of experience glossed, transcribed, and prosody-voice coded conversational speech samples from eight children with mild to severe speech disorders of unknown origin. The samples were recorded, stored, and played back using representative analog and digital audio systems. Effect sizes calculated for an array of analog versus digital comparisons ranged from negligible to medium, with a trend for participants' speech competency scores to be slightly lower for samples obtained and transcribed using the digital system. We discuss the implications of these and other findings for research and clinical practice.

  19. Transitioning from analog to digital audio recording in childhood speech sound disorders

    PubMed Central

    Shriberg, Lawrence D.; McSweeny, Jane L.; Anderson, Bruce E.; Campbell, Thomas F.; Chial, Michael R.; Green, Jordan R.; Hauner, Katherina K.; Moore, Christopher A.; Rusiewicz, Heather L.; Wilson, David L.

    2014-01-01

    Few empirical findings or technical guidelines are available on the current transition from analog to digital audio recording in childhood speech sound disorders. Of particular concern in the present context was whether a transition from analog- to digital-based transcription and coding of prosody and voice features might require re-standardizing a reference database for research in childhood speech sound disorders. Two research transcribers with different levels of experience glossed, transcribed, and prosody-voice coded conversational speech samples from eight children with mild to severe speech disorders of unknown origin. The samples were recorded, stored, and played back using representative analog and digital audio systems. Effect sizes calculated for an array of analog versus digital comparisons ranged from negligible to medium, with a trend for participants’ speech competency scores to be slightly lower for samples obtained and transcribed using the digital system. We discuss the implications of these and other findings for research and clinical practice. PMID:16019779

  20. Statistical Analysis of Spectral Properties and Prosodic Parameters of Emotional Speech

    NASA Astrophysics Data System (ADS)

    Přibil, J.; Přibilová, A.

    2009-01-01

    The paper addresses the reflection of microintonation and spectral properties in male and female acted emotional speech. The microintonation component of speech melody is analyzed with regard to its spectral and statistical parameters. According to psychological research on emotional speech, different emotions are accompanied by different amounts of spectral noise. We control this amount by spectral flatness, according to which high-frequency noise is mixed into voiced frames during cepstral speech synthesis. Our experiments are aimed at statistical analysis of cepstral coefficient values and ranges of spectral flatness in three emotions (joy, sadness, anger) and a neutral state for comparison. Calculated histograms of the spectral flatness distribution are visually compared and modelled by a Gamma probability distribution. Histograms of the cepstral coefficient distribution are evaluated and compared using skewness and kurtosis. The statistical results show good agreement between male and female voices for all emotional states portrayed by several Czech and Slovak professional actors.
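
    To make the measured quantities concrete, here is a small sketch of the statistics named above: per-frame spectral flatness, a Gamma fit to its distribution, and skewness/kurtosis of a cepstral coefficient. The framing, FFT size, and synthetic signal are my assumptions, not the paper's setup.

```python
import numpy as np
from scipy import stats

def spectral_flatness(frame):
    """Geometric / arithmetic mean of the magnitude spectrum (0..1)."""
    mag = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) + 1e-12
    return np.exp(np.mean(np.log(mag))) / np.mean(mag)

def frame_signal(x, n=512, hop=256):
    return np.stack([x[i:i + n] for i in range(0, len(x) - n, hop)])

fs = 16000
x = np.sin(2 * np.pi * 140 * np.arange(fs) / fs) + 0.1 * np.random.randn(fs)
frames = frame_signal(x)

sfm = np.array([spectral_flatness(f) for f in frames])
shape, loc, scale = stats.gamma.fit(sfm, floc=0)   # model the flatness histogram

# first real-cepstrum coefficient per frame, then its distribution descriptors
c1 = np.array([np.fft.irfft(np.log(np.abs(np.fft.rfft(f)) + 1e-12))[1]
               for f in frames])
print(shape, scale, stats.skew(c1), stats.kurtosis(c1))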

  1. Exploring the feasibility of smart phone microphone for measurement of acoustic voice parameters and voice pathology screening.

    PubMed

    Uloza, Virgilijus; Padervinskis, Evaldas; Vegiene, Aurelija; Pribuisiene, Ruta; Saferis, Viktoras; Vaiciukynas, Evaldas; Gelzinis, Adas; Verikas, Antanas

    2015-11-01

    The objective of this study is to evaluate the reliability of acoustic voice parameters obtained using smartphone (SP) microphones and to investigate the utility of SP voice recordings for voice screening. Voice samples of the sustained vowel /a/ obtained from 118 subjects (34 normal and 84 pathological voices) were recorded simultaneously through two microphones: an oral AKG Perception 220 microphone and a Samsung Galaxy Note3 SP microphone. Acoustic voice signal data were measured for fundamental frequency, jitter and shimmer, normalized noise energy (NNE), signal-to-noise ratio, and harmonic-to-noise ratio using Dr. Speech software. Discriminant analysis-based correct classification rate (CCR) and random forest classifier (RFC)-based equal error rate (EER) were used to evaluate the feasibility of acoustic voice parameters in classifying normal and pathological voice classes. The Lithuanian version of the Glottal Function Index (LT_GFI) questionnaire was used for self-assessment of the severity of voice disorder. The correlations between acoustic voice parameters obtained with the two types of microphones were statistically significant and strong (r = 0.73-1.0) across all measurements. When classifying into normal/pathological voice classes, the oral-NNE provided a CCR of 73.7% and the pair of SP-NNE and SP-shimmer parameters provided a CCR of 79.5%. However, fusion of the results obtained from SP voice recordings and GFI data provided a CCR of 84.60%, and the RFC yielded an EER of 7.9%. In conclusion, measurements of acoustic voice parameters using the SP microphone were shown to be reliable in clinical settings, demonstrating a high CCR and low EER when distinguishing normal and pathological voice classes, and validating the suitability of the SP microphone signal for automatic voice analysis and screening.
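
    The two figures of merit in this record can be reproduced on any per-recording feature table. A hedged sketch with synthetic stand-ins for the parameters (jitter, shimmer, NNE, etc.); the 34/84 class split mirrors the record, everything else is invented.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict, cross_val_score
from sklearn.metrics import roc_curve

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (34, 3)), rng.normal(1, 1, (84, 3))])
y = np.array([0] * 34 + [1] * 84)   # 0 = normal, 1 = pathological

# CCR: cross-validated accuracy of a linear discriminant classifier
ccr = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5).mean()

# EER: operating point of the random forest where FPR equals FNR
proba = cross_val_predict(RandomForestClassifier(random_state=0), X, y,
                          cv=5, method="predict_proba")[:, 1]
fpr, tpr, _ = roc_curve(y, proba)
eer = fpr[np.nanargmin(np.abs(fpr - (1 - tpr)))]
print(f"CCR = {ccr:.1%}, EER = {eer:.1%}")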

  2. Combined Use of Standard and Throat Microphones for Measurement of Acoustic Voice Parameters and Voice Categorization.

    PubMed

    Uloza, Virgilijus; Padervinskis, Evaldas; Uloziene, Ingrida; Saferis, Viktoras; Verikas, Antanas

    2015-09-01

    The aim of the present study was to evaluate the reliability of measurements of acoustic voice parameters obtained simultaneously using oral and contact (throat) microphones and to investigate the utility of using these microphones in combination for voice categorization. Voice samples of the sustained vowel /a/ obtained from 157 subjects (105 healthy and 52 pathological voices) were recorded in a soundproof booth simultaneously through two microphones: an oral AKG Perception 220 microphone (AKG Acoustics, Vienna, Austria) and a contact (throat) Triumph PC microphone (Clearer Communications, Inc, Burnaby, Canada) placed on the lamina of the thyroid cartilage. Acoustic voice signal data were measured for fundamental frequency, percent of jitter and shimmer, normalized noise energy, signal-to-noise ratio, and harmonic-to-noise ratio using Dr. Speech software (Tiger Electronics, Seattle, WA). The correlations of acoustic voice parameters in vocal performance were statistically significant and strong (r = 0.71-1.0) across all functional measurements obtained with the two microphones. When classifying into healthy-pathological voice classes, the oral-shimmer yielded a correct classification rate (CCR) of 75.2% and the throat-jitter yielded a CCR of 70.7%. However, combining the throat and oral microphones allowed identifying a set of three voice parameters: throat-signal-to-noise ratio, oral-shimmer, and oral-normalized noise energy, which provided a CCR of 80.3%. The measurements of acoustic voice parameters using a combination of oral and throat microphones proved to be reliable in clinical settings and demonstrated high CCRs when distinguishing the healthy and pathological voice groups. Our study validates the suitability of the throat microphone signal for the task of automatic voice analysis for the purpose of voice screening. Copyright © 2015 The Voice Foundation. Published by Elsevier Inc. All rights reserved.

  3. Effects of vocal training and phonatory task on voice onset time.

    PubMed

    McCrea, Christopher R; Morris, Richard J

    2007-01-01

    The purpose of this study was to examine the temporal-acoustic differences between trained singers and nonsingers during speech and singing tasks. Thirty male participants were separated into two groups of 15 according to level of vocal training (ie, trained or untrained). The participants spoke and sang carrier phrases containing English voiced and voiceless bilabial stops, and voice onset time (VOT) was measured for the stop consonant productions. Mixed analyses of variance revealed a significant main effect between speech and singing for /p/ and /b/, with VOT durations longer during speech than singing for /p/, and the opposite true for /b/. Furthermore, a significant phonatory task by vocal training interaction was observed for /p/ productions. The results indicated that the type of phonatory task influences VOT and that these influences are most obvious in trained singers secondary to the articulatory and phonatory adjustments learned during vocal training.
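
    VOT is the interval between the stop release and the onset of voicing. The sketch below estimates it automatically on a synthetic stop-plus-vowel token; the burst detector (maximum high-frequency energy) and voicing detector (autocorrelation periodicity) are simplifications of what phonetic studies typically do by hand from waveforms and spectrograms.

```python
import numpy as np

fs = 16000
rng = np.random.default_rng(1)
x = np.concatenate([
    np.zeros(int(0.05 * fs)),                                  # closure (silence)
    0.5 * rng.standard_normal(int(0.005 * fs)),                # release burst
    0.1 * rng.standard_normal(int(0.060 * fs)),                # voiceless aspiration
    np.sin(2 * np.pi * 130 * np.arange(int(0.2 * fs)) / fs),   # voiced vowel
])

n, hop = 320, 80   # 20 ms frames, 5 ms hop

def frame_energy(sig):
    return np.array([np.sum(sig[i:i + n] ** 2) for i in range(0, len(sig) - n, hop)])

# first-differencing emphasizes high frequencies, so the burst dominates
burst_idx = int(np.argmax(frame_energy(np.diff(x))))

def is_voiced(frame):
    f = frame - frame.mean()
    ac = np.correlate(f, f, "full")[len(f) - 1:]
    return ac[0] > 1e-8 and ac[40:200].max() / ac[0] > 0.5   # strong periodicity

voice_idx = next(i for i in range(burst_idx + 1, (len(x) - n) // hop)
                 if is_voiced(x[i * hop:i * hop + n]))
print(f"VOT ~ {(voice_idx - burst_idx) * hop / fs * 1000:.0f} ms")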

  4. The Hypothesis of Apraxia of Speech in Children with Autism Spectrum Disorder

    ERIC Educational Resources Information Center

    Shriberg, Lawrence D.; Paul, Rhea; Black, Lois M.; van Santen, Jan P.

    2011-01-01

    In a sample of 46 children aged 4-7 years with Autism Spectrum Disorder (ASD) and intelligible speech, there was no statistical support for the hypothesis of concomitant Childhood Apraxia of Speech (CAS). Perceptual and acoustic measures of participants' speech, prosody, and voice were compared with data from 40 typically-developing children, 13…

  5. Common Speech Problems Encountered in General Practice

    PubMed Central

    Godfrey, Charles M.; Ward, Jean F.

    1962-01-01

    The authors consider speech and communication in the light of whole patient care and point out that defects may be signs and symptoms of underlying organic disease. They describe the four classifications of speech disorders—articulation, rhythm, voice and language, with an indication of the speech therapy required and duration of treatment. Special emphasis has been given to those speech problems which are seen by the family physician; these are usually of the articulation group. A short discussion of stuttering and aphasia is given. Emphasis is put on the direction of treatment by the physician and the use of well-qualified personnel as members of the rehabilitation team. PMID:13963265

  6. HEARING, LANGUAGE, AND SPEECH DISORDERS. NINDB RESEARCH PROFILE NUMBER 4.

    ERIC Educational Resources Information Center

    National Inst. of Neurological Diseases and Blindness (NIH), Bethesda, MD.

    As part of his annual statement to Congress, the Director of the National Institute of Neurological Diseases and Blindness describes research activities in speech and hearing disorders. This report summarizes information concerning the prevalence and causes of communicative disorders (hearing, speech, language, voice, and reading) in children and…

  7. Measuring voice outcomes: state of the science review.

    PubMed

    Carding, Paul N; Wilson, J A; MacKenzie, K; Deary, I J

    2009-08-01

    Researchers evaluating voice disorder interventions currently have a plethora of voice outcome measurement tools from which to choose. Faced with such a wide choice, it would be beneficial to establish a clear rationale to guide selection. This article reviews the published literature on the three main areas of voice outcome assessment: (1) perceptual rating of voice quality, (2) acoustic measurement of the speech signal, and (3) patient self-reporting of voice problems. We analysed the published reliability, validity, sensitivity to change and utility of the common outcome measurement tools in each area. From the data, we suggest that routine voice outcome measurement should include (1) an expert rating of voice quality (using the Grade-Roughness-Breathiness-Asthenia-Strain rating scale) and (2) a short self-reporting tool (either the Vocal Performance Questionnaire or the Voice Handicap Index-10). These measures have high validity, the best reported reliability to date, good sensitivity-to-change data and excellent utility ratings. However, their application and administration require attention to detail. Acoustic measurement has arguable validity and poor reliability data at the present time. Other areas of voice outcome measurement (e.g. stroboscopy and aerodynamic phonatory measurements) require similarly detailed research and analysis.

  8. A new VOX technique for reducing noise in voice communication systems. [voice operated keying

    NASA Technical Reports Server (NTRS)

    Morris, C. F.; Morgan, W. C.; Shack, P. E.

    1974-01-01

    A VOX technique for reducing noise in voice communication systems is described which is based on the separation of voice signals into contiguous frequency-band components with the aid of an adaptive VOX in each band. It is shown that this processing scheme can effectively reduce both wideband and narrowband quasi-periodic noise since the threshold levels readjust themselves to suppress noise that exceeds speech components in each band. Results are reported for tests of the adaptive VOX, and it is noted that improvements can still be made in such areas as the elimination of noise pulses, phoneme reproduction at high-noise levels, and the elimination of distortion introduced by phase delay.

  9. Voice gender identification by cochlear implant users: The role of spectral and temporal resolution

    NASA Astrophysics Data System (ADS)

    Fu, Qian-Jie; Chinchilla, Sherol; Nogaki, Geraldine; Galvin, John J.

    2005-09-01

    The present study explored the relative contributions of spectral and temporal information to voice gender identification by cochlear implant users and normal-hearing subjects. Cochlear implant listeners were tested using their everyday speech processors, while normal-hearing subjects were tested under speech processing conditions that simulated various degrees of spectral resolution, temporal resolution, and spectral mismatch. Voice gender identification was tested for two talker sets. In Talker Set 1, the mean fundamental frequency values of the male and female talkers differed by 100 Hz while in Talker Set 2, the mean values differed by 10 Hz. Cochlear implant listeners achieved higher levels of performance with Talker Set 1, while performance was significantly reduced for Talker Set 2. For normal-hearing listeners, performance was significantly affected by the spectral resolution, for both Talker Sets. With matched speech, temporal cues contributed to voice gender identification only for Talker Set 1 while spectral mismatch significantly reduced performance for both Talker Sets. The performance of cochlear implant listeners was similar to that of normal-hearing subjects listening to 4-8 spectral channels. The results suggest that, because of the reduced spectral resolution, cochlear implant patients may attend strongly to periodicity cues to distinguish voice gender.

  10. The relative impact of generic head-related transfer functions on auditory speech thresholds: implications for the design of three-dimensional audio displays.

    PubMed

    Arrabito, G R; McFadden, S M; Crabtree, R B

    2001-07-01

    Auditory speech thresholds were measured in this study. Subjects were required to discriminate a female voice recording of three-digit numbers in the presence of diotic speech babble. The voice stimulus was spatialized at 11 static azimuth positions on the horizontal plane using three different head-related transfer functions (HRTFs) measured on individuals who did not participate in this study. The diotic presentation of the voice stimulus served as the control condition. The results showed that two of the HRTFs performed similarly and had significantly lower auditory speech thresholds than the third HRTF. All three HRTFs yielded significantly lower auditory speech thresholds compared with the diotic presentation of the voice stimulus, with the largest difference at 60 degrees azimuth. The practical implications of these results suggest that lower headphone levels of the communication system in military aircraft can be achieved without sacrificing intelligibility, thereby lessening the risk of hearing loss.
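
    Spatializing a voice with HRTFs amounts to convolving the mono signal with a left/right head-related impulse response pair. In this sketch the measured HRIRs are replaced by toy impulse responses encoding only an interaural time and level difference, so it is illustrative rather than faithful.

```python
import numpy as np

fs = 44100
voice = np.random.randn(fs)          # stand-in for the three-digit utterance

# toy HRIRs: source on the right -> left ear delayed (~0.5 ms) and attenuated
itd_samples = int(0.0005 * fs)
hrir_l = np.zeros(256); hrir_l[itd_samples] = 0.6
hrir_r = np.zeros(256); hrir_r[0] = 1.0

left = np.convolve(voice, hrir_l)[:len(voice)]
right = np.convolve(voice, hrir_r)[:len(voice)]
binaural = np.stack([left, right], axis=1)   # two-channel signal for headphones
print(binaural.shape)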

  11. Status Report on Speech Research. A Report on the Status and Progress of Studies on the Nature of Speech, Instrumentation for Its Investigation, and Practical Applications.

    DTIC Science & Technology

    1985-10-01

    [OCR fragments of the scanned report; no coherent abstract survives. Recoverable items include a reference to Anderson, V. A. (1942), Training the Speaking Voice (New York: Oxford University Press), a passage contrasting theories of speech perception with accounts of other perceptual processes (e.g., Berkeley, 1709; Festinger, Burnham, Ono), and the running head "Liberman & Mattingly: The Motor Theory of Speech Perception Revised".]

  12. Perception and analysis of Spanish accents in English speech

    NASA Astrophysics Data System (ADS)

    Chism, Cori; Lass, Norman

    2002-05-01

    The purpose of the present study was to determine what relates most closely to the degree of perceived foreign accent in the English speech of native Spanish speakers: intonation, vowel length, stress, voice onset time (VOT), or segmental accuracy. Nineteen native English-speaking listeners rated speech samples from 7 native English speakers and 15 native Spanish speakers for comprehensibility and degree of foreign accent. The speech samples were analyzed spectrographically and perceptually to obtain numerical values for each variable. Correlation coefficients were computed to determine the relationship between these values and the average foreign accent scores. Results showed that the average foreign accent scores were statistically significantly correlated with three variables: the length of stressed vowels (r = -0.48, p = 0.05), voice onset time (r = -0.62, p = 0.01), and segmental accuracy (r = 0.92, p = 0.001). Implications of these findings and suggestions for future research are discussed.

  13. Sex Differences in the Older Voice.

    ERIC Educational Resources Information Center

    Benjamin, Barbaranne J.

    A study investigated differences between older adult male and female voice patterns. In addition, the study examined whether certain differences between male and female speech characteristics were lifelong and not associated with the aging process. Subjects were 10 young (average age 30) and 10 old (average age 75) males and 10 young (average age…

  14. Alerting prefixes for speech warning messages. [in helicopters

    NASA Technical Reports Server (NTRS)

    Bucher, N. M.; Voorhees, J. W.; Karl, R. L.; Werner, E.

    1984-01-01

    A major question posed by the design of an integrated voice information display/warning system for next-generation helicopter cockpits is whether an alerting prefix should precede voice warning messages; if so, the characteristics desirable in such a cue must also be addressed. Attention is presently given to the results of a study which ascertained pilot response time and response accuracy to messages preceded by either neutral cues or the cognitively appropriate semantic cues. Both verbal cues and messages were spoken in direct, phoneme-synthesized speech, and a training manipulation was included to determine the extent to which previous exposure to speech thus produced facilitates these messages' comprehension. Results are discussed in terms of the importance of human factors research in cockpit display design.

  15. Effects of Voice Coding and Speech Rate on a Synthetic Speech Display in a Telephone Information System

    DTIC Science & Technology

    1988-05-01

    [OCR fragments of the scanned report; mostly figure-list residue. Recoverable items include "Figure 2. Original limited-capacity channel model (From Broadbent, 1958)", the start of a "Figure 3. Experimental…" entry, a remark that human voices offer an unlimited variety of digital recording sources, and a note that analysis-synthesis methods electronically model the human voice.]

  16. Voice control of the space shuttle video system

    NASA Technical Reports Server (NTRS)

    Bejczy, A. K.; Dotson, R. S.; Brown, J. W.; Lewis, J. L.

    1981-01-01

    A pilot voice control system was developed at the Jet Propulsion Laboratory (JPL) to test and evaluate the feasibility of controlling the shuttle TV cameras and monitors by voice commands. It utilizes a commercially available discrete-word speech recognizer that can be trained to the individual utterances of each operator. Successful ground tests were conducted using a simulated full-scale space shuttle manipulator. The test configuration involved berthing, maneuvering, and deploying a simulated science payload in the shuttle bay. The handling task typically required 15 to 20 minutes and 60 to 80 commands to 4 TV cameras and 2 TV monitors. The best test runs show 96 to 100 percent voice recognition accuracy.

  17. Effects of Voice Rehabilitation After Radiation Therapy for Laryngeal Cancer: A Randomized Controlled Study

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tuomi, Lisa, E-mail: lisa.tuomi@vgregion.se; Andréll, Paulin; Finizia, Caterina

    Background: Patients treated with radiation therapy for laryngeal cancer often experience voice problems. The aim of this randomized controlled trial was to assess the efficacy of voice rehabilitation for laryngeal cancer patients after radiation therapy and to investigate whether rehabilitation outcomes differ between tumor localizations. Methods and Materials: Sixty-nine male patients irradiated for laryngeal cancer participated. Voice recordings and self-assessments of communicative dysfunction were performed 1 and 6 months after radiation therapy. Thirty-three patients were randomized to structured voice rehabilitation with a speech-language pathologist and 36 to a control group. Furthermore, comparisons with 23 healthy control individuals were made. Acoustic analyses were performed for all patients, including the healthy control individuals. The Swedish version of the Self Evaluation of Communication Experiences after Laryngeal Cancer and self-ratings of voice function were used to assess vocal and communicative function. Results: The patients who received voice rehabilitation experienced improved self-rated vocal function after rehabilitation. Patients with supraglottic tumors who received voice rehabilitation had statistically significant improvements in voice quality and self-rated vocal function, whereas the control group did not. Conclusion: Voice rehabilitation for male patients with laryngeal cancer is efficacious in terms of patient-reported outcome measures. The patients experienced better voice function after rehabilitation. Patients with supraglottic tumors also showed an improvement in acoustic voice outcomes. Rehabilitation with a speech-language pathologist is recommended for laryngeal cancer patients after radiation therapy, particularly for patients with supraglottic tumors.

  18. Listeners' Attitudes toward Children with Voice Problems

    ERIC Educational Resources Information Center

    Ma, Estella P.-M.; Yu, Camille H.-Y.

    2013-01-01

    Purpose: To investigate the attitudes of school teachers toward children with voice problems in a Chinese population. Method: Three groups of listeners participated in this study: primary school teachers, speech-language pathology students, and general university students. The participants were required to make attitude judgments on 12 voice…

  19. Acoustical conditions for speech communication in active elementary school classrooms

    NASA Astrophysics Data System (ADS)

    Sato, Hiroshi; Bradley, John

    2005-04-01

    Detailed acoustical measurements were made in 34 active elementary school classrooms with typical rectangular room shape in schools near Ottawa, Canada. There was an average of 21 students in the classrooms. The measurements were made to obtain accurate indications of the acoustical quality of conditions for speech communication during actual teaching activities. Mean speech and noise levels were determined from the distribution of recorded sound levels, and the average speech-to-noise ratio was 11 dBA. Measured mid-frequency reverberation times (RT) during the same occupied conditions varied from 0.3 to 0.6 s, and were a little less than for the unoccupied rooms. RT values were not related to noise levels. Octave-band speech and noise levels, useful-to-detrimental ratios, and Speech Transmission Index values were also determined. Key results included: (1) the average vocal effort of teachers corresponded to louder than Pearsons' 'raised' voice level; (2) teachers increase their voice level to overcome ambient noise; (3) effective speech levels can be enhanced by up to 5 dB by early reflection energy; and (4) student activity is the dominant noise source, increasing average noise levels by up to 10 dBA during teaching activities. [Work supported by CLLRnet.]
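
    One way to recover separate speech and noise levels from a single occupied-classroom recording is to look at the distribution of short-term levels, taking a high percentile as the teacher's speech and a low percentile as the noise floor. The percentiles and the simulated signal below are assumptions; the study's exact procedure may differ.

```python
import numpy as np

def frame_levels_db(x, fs, frame_ms=125):
    n = int(fs * frame_ms / 1000)
    frames = [x[i:i + n] for i in range(0, len(x) - n, n)]
    rms = np.array([np.sqrt(np.mean(f ** 2)) + 1e-12 for f in frames])
    return 20 * np.log10(rms)

fs = 16000
t = np.arange(10 * fs) / fs
x = 0.02 * np.random.randn(len(t))                           # steady activity noise
x[:5 * fs] += 0.2 * np.sin(2 * np.pi * 150 * t[:5 * fs])     # "teacher" talks half the time

levels = frame_levels_db(x, fs)
noise_level = np.percentile(levels, 10)    # quiet frames ~ ambient/student noise
speech_level = np.percentile(levels, 90)   # loud frames ~ teacher's speech
print(f"speech-to-noise ratio ~ {speech_level - noise_level:.1f} dB")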

  20. An Investigation of Multidimensional Voice Program Parameters in Three Different Databases for Voice Pathology Detection and Classification.

    PubMed

    Al-Nasheri, Ahmed; Muhammad, Ghulam; Alsulaiman, Mansour; Ali, Zulfiqar; Mesallam, Tamer A; Farahat, Mohamed; Malki, Khalid H; Bencherif, Mohamed A

    2017-01-01

    Automatic voice-pathology detection and classification systems may help clinicians to detect the existence of any voice pathologies and the type of pathology from which patients suffer in the early stages. The main aim of this paper is to investigate Multidimensional Voice Program (MDVP) parameters to automatically detect and classify the voice pathologies in multiple databases, and then to find out which parameters performed well in these two processes. Samples of the sustained vowel /a/ of normal and pathological voices were extracted from three different databases, which have three voice pathologies in common. The selected databases in this study represent three distinct languages: (1) the Arabic voice pathology database; (2) the Massachusetts Eye and Ear Infirmary database (English database); and (3) the Saarbruecken Voice Database (German database). A computerized speech lab program was used to extract MDVP parameters as features, and an acoustical analysis was performed. The Fisher discrimination ratio was applied to rank the parameters. A t test was performed to highlight any significant differences in the means of the normal and pathological samples. The experimental results demonstrate a clear difference in the performance of the MDVP parameters using these databases. The highly ranked parameters also differed from one database to another. The best accuracies were obtained by using the three highest ranked MDVP parameters arranged according to the Fisher discrimination ratio: these accuracies were 99.68%, 88.21%, and 72.53% for the Saarbruecken Voice Database, the Massachusetts Eye and Ear Infirmary database, and the Arabic voice pathology database, respectively. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
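
    The Fisher discrimination ratio used here to rank MDVP parameters has a one-line form per feature: squared mean difference over summed variances. A hedged sketch on synthetic normal/pathological feature tables (the real values would come from the MDVP analysis), with the accompanying t test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
normal = rng.normal(0.5, 0.2, (50, 4))        # rows: samples, cols: MDVP parameters
pathological = rng.normal(1.0, 0.4, (50, 4))

def fisher_ratio(a, b):
    """Per-feature class separability: (mu_a - mu_b)^2 / (var_a + var_b)."""
    return (a.mean(0) - b.mean(0)) ** 2 / (a.var(0) + b.var(0))

fdr = fisher_ratio(normal, pathological)
ranking = np.argsort(fdr)[::-1]               # highest-ranked parameters first
t, p = stats.ttest_ind(normal, pathological, axis=0)
print(ranking, p)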

  1. Robotics control using isolated word recognition of voice input

    NASA Technical Reports Server (NTRS)

    Weiner, J. M.

    1977-01-01

    A speech input/output system is presented that can be used to communicate with a task-oriented system. Human speech commands and synthesized voice output extend conventional information exchange capabilities between man and machine by utilizing audio input and output channels. The speech input facility comprises a hardware feature extractor and a microprocessor-implemented isolated word or phrase recognition system. The recognizer offers a medium-sized (100 commands), syntactically constrained vocabulary, and exhibits close to real-time performance. The major portion of the recognition processing required is accomplished through software, minimizing the complexity of the hardware feature extractor.

  2. Movement of the velum during speech and singing in classically trained singers.

    PubMed

    Austin, S F

    1997-06-01

    The present study addresses two questions: (a) Is the action and/or posture of the velopharyngeal valve conducive to allow significant resonance during Western tradition classical singing? (b) How do the actions of the velopharyngeal valve observed in this style of singing compare with normal speech? A photodetector system was used to observe the area function of the velopharyngeal port during speech and classical style singing. Identical speech samples were produced by each subject in a normal speaking voice and then in the low, medium, and high singing ranges. Results indicate that in these four singers the velopharyngeal port was closed significantly longer in singing than in speaking samples. The amount of time the velopharyngeal port was opened was greatest in speech and diminished as the singer ascended in pitch. In the high voice condition, little or no opening of the velopharyngeal port was measured.

  3. Delivering the Lee Silverman Voice Treatment (LSVT) by Web Camera: A Feasibility Study

    ERIC Educational Resources Information Center

    Howell, Susan; Tripoliti, Elina; Pring, Tim

    2009-01-01

    Background: Speech disorders are a feature of Parkinson's disease, typically worsening as the disease progresses. The Lee Silverman Voice Treatment (LSVT) was developed to address these difficulties. It targets vocal loudness as a means of increasing vocal effort and improving coordination across the subsystems of speech. Aims: Currently LSVT is…

  4. Long-term average spectrum in screening of voice quality in speech: untrained male university students.

    PubMed

    Leino, Timo

    2009-11-01

    Voice quality has mainly been studied in trained speakers, singers, and dysphonic patients. Few studies have concerned ordinary untrained university students' voices. In light of earlier studies of professional voice users, it was hypothesized that good, poor, and intermediate voices would be distinguishable on the basis of long-term average spectrum (LTAS) characteristics. In the present study, the voice quality of 50 Finnish vocally untrained male university students was studied perceptually and by LTAS analysis of one-minute text reading samples. The equivalent sound level (Leq) of text reading was also measured. According to the results, the good and ordinary voices differed from the poor ones in their relatively higher sound level in the frequency range of 1-3 kHz and a prominent peak at 3-4 kHz. Good voices, however, did not differ from the ordinary voices in terms of LTAS characteristics. The strength of the peak at 3-4 kHz and the voice-quality scores correlated weakly but significantly, as did voice quality and the alpha ratio (the level difference above and below 1 kHz). Leq was significantly higher in the students with good and ordinary voices than in those with poor voices. The connections between Leq, voice quality, and the formation of the peak at 3-4 kHz warrant further study.
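
    Both reported measures are straightforward to compute from a recording: the LTAS is a long-term power-spectrum average, and the alpha ratio compares energy above and below 1 kHz. The band edges and Welch settings below are my assumptions, and the noise input stands in for a real one-minute reading sample.

```python
import numpy as np
from scipy.signal import welch

fs = 16000
x = np.random.randn(60 * fs)             # stand-in for one minute of text reading
f, psd = welch(x, fs=fs, nperseg=4096)   # LTAS as a Welch-averaged power spectrum

below = psd[(f >= 50) & (f < 1000)].sum()
above = psd[(f >= 1000) & (f <= 5000)].sum()
alpha_ratio_db = 10 * np.log10(above / below)   # level difference above vs below 1 kHz
print(f"alpha ratio = {alpha_ratio_db:.1f} dB")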

  5. [Swallowing and Voice Disorders in Cancer Patients].

    PubMed

    Tanuma, Akira

    2015-07-01

    Dysphagia sometimes occurs in patients with head and neck cancer, particularly in those undergoing surgery and radiotherapy for lingual, pharyngeal, and laryngeal cancer. It also occurs in patients with esophageal cancer and brain tumor. Patients who undergo glossectomy usually show impairment of the oral phase of swallowing, whereas those with pharyngeal, laryngeal, and esophageal cancer show impairment of the pharyngeal phase of swallowing. Videofluoroscopic examination of swallowing provides important information necessary for rehabilitation of swallowing in these patients. Appropriate swallowing exercises and compensatory strategies can be decided based on the findings of the evaluation. Palatal augmentation prostheses are sometimes used for rehabilitation in patients undergoing glossectomy. Patients who undergo total laryngectomy or total pharyngolaryngoesophagectomy should receive speech therapy to enable them to use alaryngeal speech methods, including electrolarynx, esophageal speech, or speech via tracheoesophageal puncture. Regaining swallowing function and speech can improve a patient's emotional health and quality of life. Therefore, it is important to manage swallowing and voice disorders appropriately.

  6. The effect of deep brain stimulation on the speech motor system.

    PubMed

    Mücke, Doris; Becker, Johannes; Barbe, Michael T; Meister, Ingo; Liebhart, Lena; Roettger, Timo B; Dembek, Till; Timmermann, Lars; Grice, Martine

    2014-08-01

    Chronic deep brain stimulation of the nucleus ventralis intermedius is an effective treatment for individuals with medication-resistant essential tremor. However, these individuals report that stimulation has a deleterious effect on their speech. The present study investigates one important factor underlying these effects: the coordination of oral and glottal articulation. Sixteen native German-speaking adults with essential tremor, between 26 and 86 years old, with and without chronic deep brain stimulation of the nucleus ventralis intermedius, and 12 healthy, age-matched subjects were recorded performing a fast syllable repetition task (/papapa/, /tatata/, /kakaka/). Syllable duration and voicing-to-syllable ratio were measured, as well as two parameters related directly to consonant production: voicing during constriction and frication during constriction. Voicing during constriction was greater in subjects with essential tremor than in controls, indicating a perseveration of voicing into the voiceless consonant. Stimulation led to fewer voiceless intervals (a higher voicing-to-syllable ratio), indicating a reduced degree of glottal abduction during the entire syllable cycle. Stimulation also induced incomplete oral closures (frication during constriction), indicating imprecise oral articulation. The detrimental effect of stimulation on the speech motor system can be quantified using acoustic measures at the subsyllabic level.

  7. Color and texture associations in voice-induced synesthesia

    PubMed Central

    Moos, Anja; Simmons, David; Simner, Julia; Smith, Rachel

    2013-01-01

    Voice-induced synesthesia, a form of synesthesia in which synesthetic perceptions are induced by the sounds of people's voices, appears to be relatively rare and has not been systematically studied. In this study we investigated the synesthetic color and visual texture perceptions experienced in response to different types of “voice quality” (e.g., nasal, whisper, falsetto). Experiences of three different groups—self-reported voice synesthetes, phoneticians, and controls—were compared using both qualitative and quantitative analysis in a study conducted online. Whilst, in the qualitative analysis, synesthetes used more color and texture terms to describe voices than either phoneticians or controls, only weak differences, and many similarities, between groups were found in the quantitative analysis. Notable consistent results between groups were the matching of higher speech fundamental frequencies with lighter and redder colors, the matching of “whispery” voices with smoke-like textures, and the matching of “harsh” and “creaky” voices with textures resembling dry cracked soil. These data are discussed in the light of current thinking about definitions and categorizations of synesthesia, especially in cases where individuals apparently have a range of different synesthetic inducers. PMID:24032023

  8. Status Report on Speech Research, No. 29/30, January-June 1972.

    ERIC Educational Resources Information Center

    Haskins Labs., New Haven, CT.

    This report is one of a regular series on the status and progress of studies on the nature of speech, instrumentation for its investigation, and practical applications. Manuscripts and extended reports cover the following topics: iconic storage, voice-timing perception, oral anesthesia, laryngeal function, electromyography of speech production,…

  9. Training the Speaking Voice through Singing.

    ERIC Educational Resources Information Center

    Sipley, Kenneth L.

    Speech teachers and singing teachers have much in common. Both attempt in their teaching to develop the most powerful and effective instrument possible while trying to avoid vocal problems. Both have studied the physiology of the vocal mechanism to assist them in their teaching. Both are concerned with the expressive qualities of the voice as well…

  10. ERP correlates of motivating voices: quality of motivation and time-course matters

    PubMed Central

    Zougkou, Konstantina; Weinstein, Netta

    2017-01-01

    Abstract Here, we conducted the first study to explore how motivations expressed through speech are processed in real-time. Participants listened to sentences spoken in two types of well-studied motivational tones (autonomy-supportive and controlling), or a neutral tone of voice. To examine this, listeners were presented with sentences that either signaled motivations through prosody (tone of voice) and words simultaneously (e.g. ‘You absolutely have to do it my way’ spoken in a controlling tone of voice), or lacked motivationally biasing words (e.g. ‘Why don’t we meet again tomorrow’ spoken in a motivational tone of voice). Event-related brain potentials (ERPs) in response to motivations conveyed through words and prosody showed that listeners rapidly distinguished between motivations and neutral forms of communication as shown in enhanced P2 amplitudes in response to motivational when compared with neutral speech. This early detection mechanism is argued to help determine the importance of incoming information. Once assessed, motivational language is continuously monitored and thoroughly evaluated. When compared with neutral speech, listening to controlling (but not autonomy-supportive) speech led to enhanced late potential ERP mean amplitudes, suggesting that listeners are particularly attuned to controlling messages. The importance of controlling motivation for listeners is mirrored in effects observed for motivations expressed through prosody only. Here, an early rapid appraisal, as reflected in enhanced P2 amplitudes, is only found for sentences spoken in controlling (but not autonomy-supportive) prosody. Once identified as sounding pressuring, the message seems to be preferentially processed, as shown by enhanced late potential amplitudes in response to controlling prosody. Taken together, results suggest that motivational and neutral language are differentially processed; further, the data suggest that listening to cues signaling pressure and

  11. Voice emotion recognition by cochlear-implanted children and their normally-hearing peers

    PubMed Central

    Chatterjee, Monita; Zion, Danielle; Deroche, Mickael L.; Burianek, Brooke; Limb, Charles; Goren, Alison; Kulkarni, Aditya M.; Christensen, Julie A.

    2014-01-01

    Despite their remarkable success in bringing spoken language to hearing impaired listeners, the signal transmitted through cochlear implants (CIs) remains impoverished in spectro-temporal fine structure. As a consequence, pitch-dominant information such as voice emotion, is diminished. For young children, the ability to correctly identify the mood/intent of the speaker (which may not always be visible in their facial expression) is an important aspect of social and linguistic development. Previous work in the field has shown that children with cochlear implants (cCI) have significant deficits in voice emotion recognition relative to their normally hearing peers (cNH). Here, we report on voice emotion recognition by a cohort of 36 school-aged cCI. Additionally, we provide for the first time, a comparison of their performance to that of cNH and NH adults (aNH) listening to CI simulations of the same stimuli. We also provide comparisons to the performance of adult listeners with CIs (aCI), most of whom learned language primarily through normal acoustic hearing. Results indicate that, despite strong variability, on average, cCI perform similarly to their adult counterparts; that both groups’ mean performance is similar to aNHs’ performance with 8-channel noise-vocoded speech; that cNH achieve excellent scores in voice emotion recognition with full-spectrum speech, but on average, show significantly poorer scores than aNH with 8-channel noise-vocoded speech. A strong developmental effect was observed in the cNH with noise-vocoded speech in this task. These results point to the considerable benefit obtained by cochlear-implanted children from their devices, but also underscore the need for further research and development in this important and neglected area. PMID:25448167

  12. Voice emotion recognition by cochlear-implanted children and their normally-hearing peers.

    PubMed

    Chatterjee, Monita; Zion, Danielle J; Deroche, Mickael L; Burianek, Brooke A; Limb, Charles J; Goren, Alison P; Kulkarni, Aditya M; Christensen, Julie A

    2015-04-01

    Despite their remarkable success in bringing spoken language to hearing impaired listeners, the signal transmitted through cochlear implants (CIs) remains impoverished in spectro-temporal fine structure. As a consequence, pitch-dominant information such as voice emotion is diminished. For young children, the ability to correctly identify the mood/intent of the speaker (which may not always be visible in their facial expression) is an important aspect of social and linguistic development. Previous work in the field has shown that children with cochlear implants (cCI) have significant deficits in voice emotion recognition relative to their normally hearing peers (cNH). Here, we report on voice emotion recognition by a cohort of 36 school-aged cCI. Additionally, we provide for the first time, a comparison of their performance to that of cNH and NH adults (aNH) listening to CI simulations of the same stimuli. We also provide comparisons to the performance of adult listeners with CIs (aCI), most of whom learned language primarily through normal acoustic hearing. Results indicate that, despite strong variability, on average, cCI perform similarly to their adult counterparts; that both groups' mean performance is similar to aNHs' performance with 8-channel noise-vocoded speech; that cNH achieve excellent scores in voice emotion recognition with full-spectrum speech, but on average, show significantly poorer scores than aNH with 8-channel noise-vocoded speech. A strong developmental effect was observed in the cNH with noise-vocoded speech in this task. These results point to the considerable benefit obtained by cochlear-implanted children from their devices, but also underscore the need for further research and development in this important and neglected area. Copyright © 2014 Elsevier B.V. All rights reserved.

  13. ERP correlates of motivating voices: quality of motivation and time-course matters.

    PubMed

    Zougkou, Konstantina; Weinstein, Netta; Paulmann, Silke

    2017-10-01

    Here, we conducted the first study to explore how motivations expressed through speech are processed in real-time. Participants listened to sentences spoken in two types of well-studied motivational tones (autonomy-supportive and controlling), or a neutral tone of voice. To examine this, listeners were presented with sentences that either signaled motivations through prosody (tone of voice) and words simultaneously (e.g. 'You absolutely have to do it my way' spoken in a controlling tone of voice), or lacked motivationally biasing words (e.g. 'Why don't we meet again tomorrow' spoken in a motivational tone of voice). Event-related brain potentials (ERPs) in response to motivations conveyed through words and prosody showed that listeners rapidly distinguished between motivations and neutral forms of communication as shown in enhanced P2 amplitudes in response to motivational when compared with neutral speech. This early detection mechanism is argued to help determine the importance of incoming information. Once assessed, motivational language is continuously monitored and thoroughly evaluated. When compared with neutral speech, listening to controlling (but not autonomy-supportive) speech led to enhanced late potential ERP mean amplitudes, suggesting that listeners are particularly attuned to controlling messages. The importance of controlling motivation for listeners is mirrored in effects observed for motivations expressed through prosody only. Here, an early rapid appraisal, as reflected in enhanced P2 amplitudes, is only found for sentences spoken in controlling (but not autonomy-supportive) prosody. Once identified as sounding pressuring, the message seems to be preferentially processed, as shown by enhanced late potential amplitudes in response to controlling prosody. Taken together, results suggest that motivational and neutral language are differentially processed; further, the data suggest that listening to cues signaling pressure and control cannot be

  14. Decoding Articulatory Features from fMRI Responses in Dorsal Speech Regions.

    PubMed

    Correia, Joao M; Jansma, Bernadette M B; Bonte, Milene

    2015-11-11

    The brain's circuitry for perceiving and producing speech may show a notable level of overlap that is crucial for normal development and behavior. The extent to which sensorimotor integration plays a role in speech perception remains highly controversial, however. Methodological constraints related to experimental designs and analysis methods have so far prevented the disentanglement of neural responses to acoustic versus articulatory speech features. Using a passive listening paradigm and multivariate decoding of single-trial fMRI responses to spoken syllables, we investigated brain-based generalization of articulatory features (place and manner of articulation, and voicing) beyond their acoustic (surface) form in adult human listeners. For example, we trained a classifier to discriminate place of articulation within stop syllables (e.g., /pa/ vs /ta/) and tested whether this training generalizes to fricatives (e.g., /fa/ vs /sa/). This novel approach revealed generalization of place and manner of articulation at multiple cortical levels within the dorsal auditory pathway, including auditory, sensorimotor, motor, and somatosensory regions, suggesting the representation of sensorimotor information. Additionally, generalization of voicing included the right anterior superior temporal sulcus associated with the perception of human voices as well as somatosensory regions bilaterally. Our findings highlight the close connection between brain systems for speech perception and production, and in particular, indicate the availability of articulatory codes during passive speech perception. Sensorimotor integration is central to verbal communication and provides a link between auditory signals of speech perception and motor programs of speech production. It remains highly controversial, however, to what extent the brain's speech perception system actively uses articulatory (motor), in addition to acoustic/phonetic, representations. In this study, we examine the role of
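
    The core analytic move in this record is generalization decoding: train a classifier on one surface form and test on another, so that only abstract (here, articulatory) information can support above-chance transfer. A minimal sketch with simulated response patterns; the dimensions, noise levels, and linear SVM are assumptions, not the study's pipeline.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n_voxels = 100
# place-of-articulation patterns (labial vs alveolar) shared across manners
place = rng.normal(0, 1, (2, n_voxels))

def trials(acoustic_shift, n=60):
    """Noisy single-trial patterns; `acoustic_shift` mimics a different manner."""
    X = np.vstack([place[k] + acoustic_shift + rng.normal(0, 2.0, (n, n_voxels))
                   for k in (0, 1)])
    return X, np.repeat([0, 1], n)

X_stop, y_stop = trials(acoustic_shift=0.0)   # e.g., /pa/ vs /ta/
X_fric, y_fric = trials(acoustic_shift=0.3)   # e.g., /fa/ vs /sa/

clf = LinearSVC(dual=False).fit(X_stop, y_stop)         # train within stops
print("cross-manner accuracy:", clf.score(X_fric, y_fric))  # test on fricatives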

  15. Dimension-based statistical learning affects both speech perception and production

    PubMed Central

    Lehet, Matthew; Holt, Lori L.

    2016-01-01

    Multiple acoustic dimensions signal speech categories. However, dimensions vary in their informativeness; some are more diagnostic of category membership than others. Speech categorization reflects these dimensional regularities such that diagnostic dimensions carry more “perceptual weight” and more effectively signal category membership to native listeners. Yet, perceptual weights are malleable. When short-term experience deviates from long-term language norms, such as in a foreign accent, the perceptual weight of acoustic dimensions in signaling speech category membership rapidly adjusts. The present study investigated whether rapid adjustments in listeners’ perceptual weights in response to speech that deviates from the norms also affects listeners’ own speech productions. In a word recognition task, the correlation between two acoustic dimensions signaling consonant categories, fundamental frequency (F0) and voice onset time (VOT), matched the correlation typical of English, then shifted to an “artificial accent” that reversed the relationship, and then shifted back. Brief, incidental exposure to the artificial accent caused participants to down-weight perceptual reliance on F0, consistent with previous research. Throughout the task, participants were intermittently prompted with pictures to produce these same words. In the block in which listeners heard the artificial accent with a reversed F0 x VOT correlation, F0 was a less robust cue to voicing in listeners’ own speech productions. The statistical regularities of short-term speech input affect both speech perception and production, as evidenced via shifts in how acoustic dimensions are weighted. PMID:27666146
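
    Perceptual weights of this kind are often estimated by regressing listeners' categorization responses on the standardized acoustic dimensions; the coefficient magnitudes then serve as the weights. A sketch with simulated responses; the response model and all numbers are invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
vot = rng.uniform(0, 60, 400)                    # ms
f0 = 180 + 0.8 * vot + rng.normal(0, 10, 400)    # English-like F0 x VOT correlation

# simulated listener: both dimensions contribute to "voiceless" responses
p_voiceless = 1 / (1 + np.exp(-(0.15 * (vot - 30) + 0.02 * (f0 - 200))))
resp = rng.random(400) < p_voiceless

Xz = np.column_stack([(vot - vot.mean()) / vot.std(),
                      (f0 - f0.mean()) / f0.std()])
w = LogisticRegression().fit(Xz, resp).coef_[0]
print(f"perceptual weight VOT = {w[0]:.2f}, F0 = {w[1]:.2f}")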

  16. Influence of Left-Right Asymmetries on Voice Quality in Simulated Paramedian Vocal Fold Paralysis

    ERIC Educational Resources Information Center

    Samlan, Robin A.; Story, Brad H.

    2017-01-01

    Purpose: The purpose of this study was to determine the vocal fold structural and vibratory symmetries that are important to vocal function and voice quality in a simulated paramedian vocal fold paralysis. Method: A computational kinematic speech production model was used to simulate an exemplar "voice" on the basis of asymmetric…

  17. Speech technology and cinema: can they learn from each other?

    PubMed

    Pauletto, Sandra

    2013-10-01

    The voice is the most important sound of a film soundtrack. It represents a character and it carries language. There are different types of cinematic voices: dialogue, internal monologues, and voice-overs. Conventionally, two main characteristics differentiate these voices: lip synchronization and the voice's attributes that make it appropriate for the character (for example, a voice that sounds very close to the audience can be appropriate for a narrator, but not for an onscreen character). What happens, then, if a film character can only speak through an asynchronous machine that produces a 'robot-like' voice? This article discusses the sound-related work and experimentation done by the author for the short film Voice by Choice. It also attempts to discover whether speech technology design can learn from its cinematic representation, and if such uncommon film protagonists can contribute creatively to transform the conventions of cinematic voices.

  18. Is There an Ironic Tone of Voice?

    ERIC Educational Resources Information Center

    Bryant, Gregory A.; Fox Tree, Jean E.

    2005-01-01

    Research on nonverbal vocal cues and verbal irony has often relied on the concept of an "ironic tone of voice". Here we provide acoustic analysis and experimental evidence that this notion is oversimplified and misguided. Acoustic analyses of spontaneous ironic speech extracted from talk radio shows, both ambiguous and unambiguous in…

  19. Taste quality decoding parallels taste sensations.

    PubMed

    Crouzet, Sébastien M; Busch, Niko A; Ohla, Kathrin

    2015-03-30

    In most species, the sense of taste is key in the distinction of potentially nutritious and harmful food constituents and thereby in the acceptance (or rejection) of food. Taste quality is encoded by specialized receptors on the tongue, which detect chemicals corresponding to each of the basic tastes (sweet, salty, sour, bitter, and savory [1]), before taste quality information is transmitted via segregated neuronal fibers [2], distributed coding across neuronal fibers [3], or dynamic firing patterns [4] to the gustatory cortex in the insula. In rodents, both hardwired coding by labeled lines [2] and flexible, learning-dependent representations [5] and broadly tuned neurons [6] seem to coexist. It is currently unknown how, when, and where taste quality representations are established in the cortex and whether these representations are used for perceptual decisions. Here, we show that neuronal response patterns allow decoding of which of four tastants (salty, sweet, sour, and bitter) participants tasted in a given trial, using time-resolved multivariate pattern analyses of large-scale electrophysiological brain responses. The onset of this prediction coincided with the earliest taste-evoked responses originating from the insula and opercular cortices, indicating that quality is among the first attributes of a taste represented in the central gustatory system. These response patterns correlated with perceptual decisions of taste quality: tastes that participants discriminated less accurately also evoked less discriminable brain response patterns. The results therefore provide the first evidence for a link between taste-related decision-making and the predictive value of these brain response patterns. Copyright © 2015 Elsevier Ltd. All rights reserved.
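
    Time-resolved decoding of this kind fits a fresh classifier at every time point and asks when accuracy first exceeds chance; that onset is then compared with the latency of the evoked response. A hedged sketch on simulated trials: the sensor counts, injected effect, and threshold are assumptions, not the study's parameters.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials, n_sensors, n_times = 80, 32, 50
y = rng.integers(0, 4, n_trials)                  # four taste qualities
X = rng.normal(0, 1, (n_trials, n_sensors, n_times))
X[:, :5, 20:] += y[:, None, None] * 0.5           # class signal emerges at t = 20

# decode separately at each time point with cross-validation (chance = 0.25)
acc = [cross_val_score(LogisticRegression(max_iter=1000),
                       X[:, :, t], y, cv=5).mean() for t in range(n_times)]
onset = next((t for t, a in enumerate(acc) if a > 0.4), None)
print("decoding onset at time index", onset)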

  20. Current trends in small vocabulary speech recognition for equipment control

    NASA Astrophysics Data System (ADS)

    Doukas, Nikolaos; Bardis, Nikolaos G.

    2017-09-01

    Speech recognition systems allow human-machine communication to acquire an intuitive nature that approaches the simplicity of inter-human communication. Small vocabulary speech recognition is a subset of the overall speech recognition problem, where only a small number of words need to be recognized. Speaker-independent small vocabulary recognition can find significant applications in field equipment used by military personnel. Such equipment may typically be controlled by a small number of commands that need to be given quickly and accurately, under conditions where delicate manual operations are difficult to achieve. This type of application could hence benefit significantly from robust voice-operated control components, as they would facilitate interaction with their users and render it much more reliable in times of crisis. This paper presents current challenges involved in attaining efficient and robust small vocabulary speech recognition. These challenges concern feature selection, classification techniques, speaker diversity, and noise effects. A state machine approach is presented that facilitates the voice guidance of different equipment in a variety of situations.
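
    A state machine of this kind restricts, at every step, which commands the recognizer will accept, which both shrinks the effective vocabulary and rejects out-of-context words. A toy sketch; the states and commands are invented, not taken from the paper.

```python
# each state maps the commands valid in that state to the next state
GRAMMAR = {
    "idle":  {"start": "armed"},
    "armed": {"fire": "idle", "abort": "idle", "status": "armed"},
}

def step(state, recognized_word):
    """Advance the controller; words not valid in the current state are rejected."""
    if recognized_word in GRAMMAR[state]:
        return GRAMMAR[state][recognized_word], True
    return state, False

state = "idle"
for word in ["start", "status", "typo", "abort"]:
    state, ok = step(state, word)
    print(word, "->", state, "accepted" if ok else "rejected")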

  1. Voice Interactive Analysis System Study. Final Report, August 28, 1978 through March 23, 1979.

    ERIC Educational Resources Information Center

    Harry, D. P.; And Others

    The Voice Interactive Analysis System study continued research and development of the LISTEN real-time, minicomputer based connected speech recognition system, within NAVTRAEQUIPCEN'S program of developing automatic speech technology in support of training. An attempt was made to identify the most effective features detected by the TTI-500 model…

  2. The role of spectral and temporal cues in voice gender discrimination by normal-hearing listeners and cochlear implant users.

    PubMed

    Fu, Qian-Jie; Chinchilla, Sherol; Galvin, John J

    2004-09-01

    The present study investigated the relative importance of temporal and spectral cues in voice gender discrimination and vowel recognition by normal-hearing subjects listening to an acoustic simulation of cochlear implant speech processing and by cochlear implant users. In the simulation, the number of speech processing channels ranged from 4 to 32, thereby varying the spectral resolution; the cutoff frequencies of the channels' envelope filters ranged from 20 to 320 Hz, thereby manipulating the available temporal cues. For normal-hearing subjects, results showed that both voice gender discrimination and vowel recognition scores improved as the number of spectral channels was increased. When only 4 spectral channels were available, voice gender discrimination significantly improved as the envelope filter cutoff frequency was increased from 20 to 320 Hz. For all spectral conditions, increasing the amount of temporal information had no significant effect on vowel recognition. Both voice gender discrimination and vowel recognition scores were highly variable among implant users. The performance of cochlear implant listeners was similar to that of normal-hearing subjects listening to comparable speech processing (4-8 spectral channels). The results suggest that both spectral and temporal cues contribute to voice gender discrimination and that temporal cues are especially important for cochlear implant users to identify the voice gender when there is reduced spectral resolution.

  3. Kernel-Based Sensor Fusion With Application to Audio-Visual Voice Activity Detection

    NASA Astrophysics Data System (ADS)

    Dov, David; Talmon, Ronen; Cohen, Israel

    2016-12-01

    In this paper, we address the problem of multiple-view data fusion in the presence of noise and interferences. Recent studies have approached this problem using kernel methods, relying particularly on a product of kernels constructed separately for each view. From a graph theory point of view, we analyze this fusion approach in a discrete setting. More specifically, based on a statistical model for the connectivity between data points, we propose an algorithm for the selection of the kernel bandwidth, a parameter that, as we show, has important implications for the robustness of this fusion approach to interferences. Then, we consider the fusion of audio-visual speech signals measured by a single microphone and by a video camera pointed at the face of the speaker. Specifically, we address the task of voice activity detection, i.e., the detection of speech and non-speech segments, in the presence of structured interferences such as keyboard taps and office noise. We propose an algorithm for voice activity detection based on the audio-visual signal. Simulation results show that the proposed algorithm outperforms competing fusion and voice activity detection approaches. In addition, we demonstrate that a proper selection of the kernel bandwidth indeed leads to improved performance.
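
    A sketch of the product-of-kernels construction the paper builds on: one Gaussian affinity kernel per view, multiplied element-wise so that two points are strongly connected only if they are close in both views. The bandwidth eps is the parameter whose selection the paper analyzes; features and values here are placeholders:

        import numpy as np
        from scipy.spatial.distance import cdist

        def gaussian_kernel(X, eps):
            # Pairwise affinities exp(-||x_i - x_j||^2 / eps).
            D = cdist(X, X, "sqeuclidean")
            return np.exp(-D / eps)

        rng = np.random.default_rng(1)
        audio = rng.normal(size=(100, 13))  # e.g., MFCC frames (placeholder)
        video = rng.normal(size=(100, 10))  # e.g., mouth-region features (placeholder)

        K = gaussian_kernel(audio, eps=1.0) * gaussian_kernel(video, eps=1.0)
        # K can then be row-normalized and its leading eigenvectors used as a
        # fused low-dimensional representation for voice activity detection.
        print(K.shape)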

  4. Medications and Adverse Voice Effects.

    PubMed

    Nemr, Kátia; Di Carlos Silva, Ariana; Rodrigues, Danilo de Albuquerque; Zenari, Marcia Simões

    2017-08-16

    To identify the medications used by patients with dysphonia, describe the voice symptoms reported on initial speech-language pathology (SLP) examination, evaluate the possible direct and indirect effects of medications on voice production, and determine the association between direct and indirect adverse voice effects and self-reported voice symptoms, hydration and smoking habits, comorbidities, vocal assessment, and type and degree of dysphonia. This is a retrospective cross-sectional study. Fifty-five patients were evaluated, and the vocal signs and symptoms indicated in the Dysphonia Risk Protocol were considered, as well as data on hydration, smoking and medication use. We analyzed the associations between type of side effect and self-reported vocal signs/symptoms, hydration, smoking, comorbidities, type of dysphonia, and auditory-perceptual and acoustic parameters. Sixty percent were women, the mean age was 51.8 years, 29 symptoms were reported on screening, and 73 active ingredients were identified, 8.2% directly and 91.8% indirectly affecting vocal function. There were associations between the use of drugs with direct adverse voice effects, self-reported symptoms, general degree of vocal deviation, and pitch deviation. The symptoms of dry throat and shortness of breath were associated with direct adverse voice effects of the medications, as were the general degree of vocal deviation and greater pitch deviation. Shortness of breath when speaking was also associated with the greatest degree of vocal deviation. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.

  5. Tasting

    MedlinePlus Videos and Cool Tools

    ... about 10,000 taste buds. The taste buds are linked to the brain by nerve fibers. Food particles are detected by the taste buds, which send nerve ... to the brain. Certain areas of the tongue are more sensitive to certain tastes, like bitter, sour, ...

  6. Politeness, emotion, and gender: A sociophonetic study of voice pitch modulation

    NASA Astrophysics Data System (ADS)

    Yuasa, Ikuko

    The present dissertation is a cross-gender and cross-cultural sociophonetic exploration of voice pitch characteristics utilizing speech data derived from Japanese and American speakers in natural conversations. The roles of voice pitch modulation in terms of the concepts of politeness and emotion as they pertain to culture and gender are investigated herein. The research interprets the significance of the findings based on acoustic measurements of speech data presented in the ERB-rate scale (the most appropriate scale for human speech perception). The investigation reveals that pitch range modulation displayed by Japanese informants in two types of conversations is closely linked to the types of politeness adopted by those informants. The degree of the informants' emotional involvement and expression reflected in differing pitch range widths plays an important role in determining the relationship between pitch range modulation and politeness. The study further correlates the Japanese cultural concept of enryo ("self-restraint") with this phenomenon. When median values were examined, male and female pitch ranges across cultures did not conspicuously differ. However, sporadically occurring women's pitch characteristics, which differ culturally in the width and height of pitch ranges, may create an 'emotional' perception of women's speech style. The salience of these pitch characteristics appears to be the source of the stereotype that women's speech sounds 'swoopy' or 'shrill' and is thus 'emotional'. Such salient voice characteristics of women are interpreted in light of camaraderie/positive politeness. Women's use of conspicuous paralinguistic features helps to create an atmosphere of camaraderie. These voice pitch characteristics promote the establishment of a sense of camaraderie since they act to emphasize such feelings as concern, support, and comfort towards addressees. Moreover, men's wide pitch ranges are discussed in view

  7. Children's Recognition of Their Own Recorded Voice: Influence of Age and Phonological Impairment

    ERIC Educational Resources Information Center

    Strombergsson, Sofia

    2013-01-01

    Children with phonological impairment (PI) often have difficulties perceiving insufficiencies in their own speech. The use of recordings has been suggested as a way of directing the child's attention toward his/her own speech, despite a lack of evidence that children actually recognize their recorded voice as their own. We present two studies of…

  8. Co-Variation of Tonality in the Music and Speech of Different Cultures

    PubMed Central

    Han, Shui' er; Sundararajan, Janani; Bowling, Daniel Liu; Lake, Jessica; Purves, Dale

    2011-01-01

    Whereas the use of discrete pitch intervals is characteristic of most musical traditions, the size of the intervals and the way in which they are used is culturally specific. Here we examine the hypothesis that these differences arise because of a link between the tonal characteristics of a culture's music and its speech. We tested this idea by comparing pitch intervals in the traditional music of three tone language cultures (Chinese, Thai and Vietnamese) and three non-tone language cultures (American, French and German) with pitch intervals between voiced speech segments. Changes in pitch direction occur more frequently and pitch intervals are larger in the music of tone compared to non-tone language cultures. More frequent changes in pitch direction and larger pitch intervals are also apparent in the speech of tone compared to non-tone language cultures. These observations suggest that the different tonal preferences apparent in music across cultures are closely related to the differences in the tonal characteristics of voiced speech. PMID:21637716

  9. Secure Recognition of Voice-Less Commands Using Videos

    NASA Astrophysics Data System (ADS)

    Yau, Wai Chee; Kumar, Dinesh Kant; Weghorn, Hans

    Interest in voice recognition technologies for internet applications is growing due to the flexibility of speech-based communication. The major drawback with the use of sound for internet access with computers is that the commands will be audible to other people in the vicinity. This paper examines a secure and voice-less method for recognition of speech-based commands using video without evaluating sound signals. The proposed approach represents mouth movements in the video data using 2D spatio-temporal templates (STT). Zernike moments (ZM) are computed from STT and fed into support vector machines (SVM) to be classified into one of the utterances. The experimental results demonstrate that the proposed technique produces a high accuracy of 98% in a phoneme classification task. The proposed technique is demonstrated to be invariant to global variations of illumination level. Such a system is useful for securely interpreting user commands for internet applications on mobile devices.
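
    A sketch of the classification stage only, assuming the Zernike-moment feature vectors have already been extracted from the spatio-temporal templates (here random placeholders); an extraction step such as mahotas.features.zernike_moments is omitted:

        import numpy as np
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.svm import SVC
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(2)
        X = rng.normal(size=(140, 25))     # 25 Zernike moments per video template
        y = rng.integers(0, 14, size=140)  # utterance labels (placeholder count)

        # Standardize features, then classify with an RBF-kernel SVM.
        clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0))
        print(cross_val_score(clf, X, y, cv=5).mean())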

  10. Detecting Parkinson's disease from sustained phonation and speech signals.

    PubMed

    Vaiciukynas, Evaldas; Verikas, Antanas; Gelzinis, Adas; Bacauskiene, Marija

    2017-01-01

    This study investigates signals from sustained phonation and text-dependent speech modalities for Parkinson's disease screening. Phonation corresponds to the vowel /a/ voicing task and speech to the pronunciation of a short sentence in the Lithuanian language. Signals were recorded through two channels simultaneously, namely, acoustic cardioid (AC) and smartphone (SP) microphones. Additional modalities were obtained by splitting the speech recording into voiced and unvoiced parts. Information in each modality is summarized by 18 well-known audio feature sets. Random forest (RF) is used as the machine learning algorithm, both for individual feature sets and for decision-level fusion. Detection performance is measured by the out-of-bag equal error rate (EER) and the cost of the log-likelihood ratio. The Essentia audio feature set was the best for the AC speech modality and the YAAFE audio feature set was the best for the SP unvoiced modality, achieving EERs of 20.30% and 25.57%, respectively. Fusion of all feature sets and modalities resulted in an EER of 19.27% for the AC and 23.00% for the SP channel. Non-linear projection of an RF-based proximity matrix into the 2D space enriched medical decision support by visualization.
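
    A sketch of out-of-bag evaluation with an equal error rate, roughly as used in the study: a random forest scores each recording on the trees that did not see it, and the EER is read off the ROC curve. Data and feature dimensions are synthetic placeholders:

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.metrics import roc_curve

        rng = np.random.default_rng(3)
        X = rng.normal(size=(300, 18))   # one row per recording, 18 features
        y = rng.integers(0, 2, size=300) # 1 = Parkinson's, 0 = control (placeholder)

        rf = RandomForestClassifier(n_estimators=500, oob_score=True, random_state=0)
        rf.fit(X, y)
        scores = rf.oob_decision_function_[:, 1]  # OOB probability of the PD class

        fpr, tpr, _ = roc_curve(y, scores)
        fnr = 1 - tpr
        eer = fpr[np.nanargmin(np.abs(fpr - fnr))]  # operating point where FPR == FNR
        print(f"out-of-bag EER ~ {eer:.2%}")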

  11. Selective attention to human voice enhances brain activity bilaterally in the superior temporal sulcus.

    PubMed

    Alho, Kimmo; Vorobyev, Victor A; Medvedev, Svyatoslav V; Pakhomov, Sergey V; Starchenko, Maria G; Tervaniemi, Mari; Näätänen, Risto

    2006-02-23

    Regional cerebral blood flow was measured with positron emission tomography (PET) in 10 healthy male volunteers. They heard two binaurally delivered concurrent stories, one spoken by a male voice and the other by a female voice. A third story was presented at the same time as a text running on a screen. The subjects were instructed to attend silently to one of the stories at a time. In an additional resting condition, no stories were delivered. PET data showed that in comparison with the reading condition, the brain activity in the speech-listening conditions was enhanced bilaterally in the anterior superior temporal sulcus including cortical areas that have been reported to be specifically sensitive to human voice. Previous studies on attention to non-linguistic sounds and visual objects, in turn, showed prefrontal activations that are presumably related to attentional control functions. However, comparisons of the present speech-listening and reading conditions with each other or with the resting condition indicated no prefrontal activity, except for an activation in the inferior frontal cortex that was presumably associated with semantic and syntactic processing of the attended story. Thus, speech listening, as well as reading, even in a distracting environment appears to depend less on the prefrontal control functions than do other types of attention-demanding tasks, probably because selective attention to speech and written text are over-learned actions rehearsed daily.

  12. When Infants Talk, Infants Listen: Pre-Babbling Infants Prefer Listening to Speech with Infant Vocal Properties

    ERIC Educational Resources Information Center

    Masapollo, Matthew; Polka, Linda; Ménard, Lucie

    2016-01-01

    To learn to produce speech, infants must effectively monitor and assess their own speech output. Yet very little is known about how infants perceive speech produced by an infant, which has higher voice pitch and formant frequencies compared to adult or child speech. Here, we tested whether pre-babbling infants (at 4-6 months) prefer listening to…

  13. Acetylcholine is released from taste cells, enhancing taste signalling

    PubMed Central

    Dando, Robin; Roper, Stephen D

    2012-01-01

    Acetylcholine (ACh), a candidate neurotransmitter that has been implicated in taste buds, elicits calcium mobilization in Receptor (Type II) taste cells. Using RT-PCR analysis and pharmacological interventions, we demonstrate that the muscarinic acetylcholine receptor M3 mediates these actions. Applying ACh enhanced both taste-evoked Ca2+ responses and taste-evoked afferent neurotransmitter (ATP) secretion from taste Receptor cells. Blocking muscarinic receptors depressed taste-evoked responses in Receptor cells, suggesting that ACh is normally released from taste cells during taste stimulation. ACh biosensors confirmed that, indeed, taste Receptor cells secrete acetylcholine during gustatory stimulation. Genetic deletion of muscarinic receptors resulted in significantly diminished ATP secretion from taste buds. The data demonstrate a new role for acetylcholine as a taste bud transmitter. Our results imply specifically that ACh is an autocrine transmitter secreted by taste Receptor cells during gustatory stimulation, enhancing taste-evoked responses and afferent transmitter secretion. PMID:22570381

  14. Deep neural network and noise classification-based speech enhancement

    NASA Astrophysics Data System (ADS)

    Shi, Wenhua; Zhang, Xiongwei; Zou, Xia; Han, Wei

    2017-07-01

    In this paper, a speech enhancement method using noise classification and a deep neural network (DNN) is proposed. A Gaussian mixture model (GMM) is employed to determine the noise type in speech-absent frames. A DNN is used to model the relationship between the noisy observation and clean speech. Once the noise type is determined, the corresponding DNN model is applied to enhance the noisy speech. The GMM is trained on mel-frequency cepstral coefficients (MFCCs), with parameters estimated via an iterative expectation-maximization (EM) algorithm. The noise type is updated by spectrum entropy-based voice activity detection (VAD). Experimental results demonstrate that the proposed method achieves better objective speech quality and smaller distortion under both stationary and non-stationary conditions.
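
    A sketch of the noise-classification front end under stated assumptions: one GMM per noise type is fit on MFCC frames, and the winning type selects which enhancement model to apply. The noise types, features, and model registry are placeholders:

        import numpy as np
        from sklearn.mixture import GaussianMixture

        rng = np.random.default_rng(4)
        NOISE_TYPES = ["babble", "factory", "car"]

        # Train one GMM per noise type on MFCC frames (placeholder random features).
        gmms = {}
        for i, noise in enumerate(NOISE_TYPES):
            mfcc_train = rng.normal(loc=i, size=(500, 13))
            gmms[noise] = GaussianMixture(n_components=8, random_state=0).fit(mfcc_train)

        def classify_noise(mfcc_frames):
            # Pick the noise type whose GMM gives the highest mean log-likelihood.
            return max(NOISE_TYPES, key=lambda n: gmms[n].score(mfcc_frames))

        noisy_frames = rng.normal(loc=1, size=(50, 13))  # frames flagged speech-absent by the VAD
        noise_type = classify_noise(noisy_frames)
        # enhanced = dnn_models[noise_type](noisy_speech)  # select the matching DNN
        print(noise_type)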

  15. Contributions of speech science to the technology of man-machine voice interactions

    NASA Technical Reports Server (NTRS)

    Lea, Wayne A.

    1977-01-01

    Research in speech understanding was reviewed. Plans which include prosodics research, phonological rules for speech understanding systems, and continued interdisciplinary phonetics research are discussed. Improved acoustic phonetic analysis capabilities in speech recognizers are suggested.

  16. Acoustic analysis of speech under stress.

    PubMed

    Sondhi, Savita; Khan, Munna; Vijay, Ritu; Salhan, Ashok K; Chouhan, Satish

    2015-01-01

    When a person is emotionally charged, stress can be discerned in his or her voice. This paper presents a simplified, non-invasive approach to detect psycho-physiological stress by monitoring the acoustic modifications during a stressful conversation. The voice database consists of audio clips from eight different popular FM broadcasts wherein the host of the show vexes the subjects, who are otherwise unaware of the charade. The audio clips are obtained from real-life stressful conversations (no simulated emotions). Analysis is done using PRAAT software to evaluate mean fundamental frequency (F0) and formant frequencies (F1, F2, F3, F4) in both the neutral and stressed states. Results suggest that F0 increases with stress, whereas formant frequencies decrease with stress. Comparison of Fourier and chirp spectra of a short vowel segment shows that for relaxed speech the two spectra are similar, but for stressed speech they differ in the high frequency range due to increased pitch modulation.
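
    A sketch of the F0 and formant measurements via the praat-parselmouth package, which scripts the same Praat analyses; using this package is an assumption (the authors used PRAAT directly), and "speech.wav" is a placeholder:

        import numpy as np
        import parselmouth

        snd = parselmouth.Sound("speech.wav")

        # Mean F0 over voiced frames (Praat reports 0 Hz for unvoiced frames).
        pitch = snd.to_pitch()
        f0 = pitch.selected_array["frequency"]
        f0_mean = np.nanmean(np.where(f0 > 0, f0, np.nan))

        # Formants via Burg's method, sampled at the midpoint of the file.
        formants = snd.to_formant_burg(max_number_of_formants=5)
        t_mid = snd.duration / 2
        f1 = formants.get_value_at_time(1, t_mid)
        f2 = formants.get_value_at_time(2, t_mid)
        print(f"mean F0 = {f0_mean:.1f} Hz, F1 = {f1:.0f} Hz, F2 = {f2:.0f} Hz")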

  17. Cerebral bases of subliminal speech priming.

    PubMed

    Kouider, Sid; de Gardelle, Vincent; Dehaene, Stanislas; Dupoux, Emmanuel; Pallier, Christophe

    2010-01-01

    While the neural correlates of unconscious perception and subliminal priming have been largely studied for visual stimuli, little is known about their counterparts in the auditory modality. Here we used a subliminal speech priming method in combination with fMRI to investigate which regions of the cerebral network for language can respond in the absence of awareness. Participants performed a lexical decision task on target items preceded by subliminal primes, which were either phonetically identical or different from the target. Moreover, the prime and target could be spoken by the same speaker or by two different speakers. Word repetition reduced the activity in the insula and in the left superior temporal gyrus. Although the priming effect on reaction times was independent of voice manipulation, neural repetition suppression was modulated by speaker change in the superior temporal gyrus while the insula showed voice-independent priming. These results provide neuroimaging evidence of subliminal priming for spoken words and inform us on the first, unconscious stages of speech perception.

  18. Discrimination of taste qualities among mouse fungiform taste bud cells.

    PubMed

    Yoshida, Ryusuke; Miyauchi, Aya; Yasuo, Toshiaki; Jyotaki, Masafumi; Murata, Yoshihiro; Yasumatsu, Keiko; Shigemura, Noriatsu; Yanagawa, Yuchio; Obata, Kunihiko; Ueno, Hiroshi; Margolskee, Robert F; Ninomiya, Yuzo

    2009-09-15

    Multiple lines of evidence from molecular studies indicate that individual taste qualities are encoded by distinct taste receptor cells. In contrast, many physiological studies have found that a significant proportion of taste cells respond to multiple taste qualities. To reconcile this apparent discrepancy and to identify taste cells that underlie each taste quality, we investigated taste responses of individual mouse fungiform taste cells that express gustducin or GAD67, markers for specific types of taste cells. Type II taste cells respond to sweet, bitter or umami tastants, and express taste receptors, gustducin and other transduction components. Type III cells possess putative sour taste receptors and have well elaborated conventional synapses. Consistent with these findings, we found that gustducin-expressing Type II taste cells responded best to sweet (25/49), bitter (20/49) or umami (4/49) stimuli, while all GAD67 (Type III) taste cells examined (44/44) responded to sour stimuli and a portion of them showed multiple taste sensitivities, suggesting discrimination of each taste quality among taste bud cells. These results were largely consistent with those previously reported with circumvallate papillae taste cells. Bitter-best taste cells responded to multiple bitter compounds such as quinine, denatonium and cycloheximide. Three sour compounds, HCl, acetic acid and citric acid, elicited responses in sour-best taste cells. These results suggest that taste cells may be capable of recognizing multiple taste compounds that elicit similar taste sensations. We did not find any NaCl-best cells among the gustducin and GAD67 taste cells, raising the possibility that salt-sensitive taste cells comprise a different population.

  19. Military applications of automatic speech recognition and future requirements

    NASA Technical Reports Server (NTRS)

    Beek, Bruno; Cupples, Edward J.

    1977-01-01

    An updated summary of the state-of-the-art of automatic speech recognition and its relevance to military applications is provided. A number of potential systems for military applications are under development. These include: (1) digital narrowband communication systems; (2) automatic speech verification; (3) on-line cartographic processing unit; (4) word recognition for militarized tactical data system; and (5) voice recognition and synthesis for aircraft cockpit.

  20. The Auditory-Brainstem Response to Continuous, Non-repetitive Speech Is Modulated by the Speech Envelope and Reflects Speech Processing

    PubMed Central

    Reichenbach, Chagit S.; Braiman, Chananel; Schiff, Nicholas D.; Hudspeth, A. J.; Reichenbach, Tobias

    2016-01-01

    The auditory-brainstem response (ABR) to short and simple acoustical signals is an important clinical tool used to diagnose the integrity of the brainstem. The ABR is also employed to investigate the auditory brainstem in a multitude of tasks related to hearing, such as processing speech or selectively focusing on one speaker in a noisy environment. Such research measures the response of the brainstem to short speech signals such as vowels or words. Because the voltage signal of the ABR has a tiny amplitude, several hundred to a thousand repetitions of the acoustic signal are needed to obtain a reliable response. The large number of repetitions poses a challenge to assessing cognitive functions due to neural adaptation. Here we show that continuous, non-repetitive speech, lasting several minutes, may be employed to measure the ABR. Because the speech is not repeated during the experiment, the precise temporal form of the ABR cannot be determined. We show, however, that important structural features of the ABR can nevertheless be inferred. In particular, the brainstem responds at the fundamental frequency of the speech signal, and this response is modulated by the envelope of the voiced parts of speech. We accordingly introduce a novel measure that assesses the ABR as modulated by the speech envelope, at the fundamental frequency of speech and at the characteristic latency of the response. This measure has a high signal-to-noise ratio and can hence be employed effectively to measure the ABR to continuous speech. We use this novel measure to show that the ABR is weaker to intelligible speech than to unintelligible, time-reversed speech. The methods presented here can be employed for further research on speech processing in the auditory brainstem and can lead to the development of future clinical diagnosis of brainstem function. PMID:27303286
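
    A rough sketch of the measure's ingredients under stated assumptions: the speech envelope is taken as the magnitude of the analytic signal, the EEG is band-passed around the speech fundamental, and an envelope-weighted correlation is evaluated at a candidate brainstem latency. Sampling rate, band edges, latency, and signals are placeholders, not the authors' exact parameters:

        import numpy as np
        from scipy.signal import hilbert, butter, filtfilt

        fs = 1000                          # Hz (placeholder sampling rate)
        rng = np.random.default_rng(5)
        speech = rng.normal(size=10 * fs)  # placeholder voiced-speech signal
        eeg = rng.normal(size=10 * fs)     # placeholder recording

        # Speech envelope from the analytic signal.
        envelope = np.abs(hilbert(speech))

        # Band-pass the EEG around a typical F0 range.
        b, a = butter(4, [80, 300], btype="bandpass", fs=fs)
        eeg_f0 = filtfilt(b, a, eeg)

        # Envelope-modulated correlation at a candidate brainstem latency.
        lag = int(0.009 * fs)              # ~9 ms (illustrative)
        x = (envelope * speech)[: len(speech) - lag]
        r = np.corrcoef(x, eeg_f0[lag:])[0, 1]
        print(r)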

  1. Validation of the Acoustic Voice Quality Index Version 03.01 and the Acoustic Breathiness Index in the Spanish language.

    PubMed

    Delgado Hernández, Jonathan; León Gómez, Nieves M; Jiménez, Alejandra; Izquierdo, Laura M; Barsties V Latoszek, Ben

    2018-05-01

    The aim of this study was to validate the Acoustic Voice Quality Index 03.01 (AVQIv3) and the Acoustic Breathiness Index (ABI) in the Spanish language. Concatenated voice samples of continuous speech (cs) and sustained vowel (sv) from 136 subjects with dysphonia and 47 vocally healthy subjects were perceptually judged for overall voice quality and breathiness severity. First, to reach a higher level of ecological validity, the proportions of cs and sv were equalized with regard to the time length of the 3-second sv part and the voiced cs part, respectively. Second, concurrent validity and diagnostic accuracy were verified. Ratings of overall voice quality and breathiness severity from five experts showed moderate reliability. Standardizing the cs part at 33 syllables, which represents 3 seconds of voiced cs, was found to equalize the two speech tasks. A strong correlation was revealed between AVQIv3 and overall voice quality, and between ABI and perceived breathiness severity. Additionally, the best diagnostic outcome was identified at a threshold of 2.28 for AVQIv3 and 3.40 for ABI. In the Spanish language, the AVQIv3 and ABI thus provide valid and robust quantification of abnormal voice quality with regard to overall voice quality and breathiness severity.

  2. Speech perception in individuals with auditory dys-synchrony.

    PubMed

    Kumar, U A; Jayaram, M

    2011-03-01

    This study aimed to evaluate the effect of lengthening the transition duration of selected speech segments upon the perception of those segments in individuals with auditory dys-synchrony. Thirty individuals with auditory dys-synchrony participated in the study, along with 30 age-matched normal hearing listeners. Eight consonant-vowel syllables were used as auditory stimuli. Two experiments were conducted. Experiment one measured the 'just noticeable difference' time: the smallest prolongation of the speech sound transition duration which was noticeable by the subject. In experiment two, speech sounds were modified by lengthening the transition duration by multiples of the just noticeable difference time, and subjects' speech identification scores for the modified speech sounds were assessed. Subjects with auditory dys-synchrony demonstrated poor processing of temporal auditory information. Lengthening of speech sound transition duration improved these subjects' perception of both the placement and voicing features of the speech syllables used. These results suggest that innovative speech processing strategies which enhance temporal cues may benefit individuals with auditory dys-synchrony.

  3. Henry's voices: the representation of auditory verbal hallucinations in an autobiographical narrative.

    PubMed

    Demjén, Zsófia; Semino, Elena

    2015-06-01

    The book Henry's Demons (2011) recounts the events surrounding Henry Cockburn's diagnosis of schizophrenia from the alternating perspectives of Henry himself and his father Patrick. In this paper, we present a detailed linguistic analysis of Henry's first-person accounts of experiences that could be described as auditory verbal hallucinations. We first provide a typology of Henry's voices, taking into account who or what is presented as speaking, what kinds of utterances they produce and any salient stylistic features of these utterances. We then discuss the linguistically distinctive ways in which Henry represents these voices in his narrative. We focus on the use of Direct Speech as opposed to other forms of speech presentation, the use of the sensory verbs hear and feel and the use of 'non-factive' expressions such as I thought and as if. We show how different linguistic representations may suggest phenomenological differences between the experience of hallucinatory voices and the perception of voices that other people can also hear. We, therefore, propose that linguistic analysis is ideally placed to provide in-depth accounts of the phenomenology of voice hearing and point out the implications of this approach for clinical practice and mental healthcare. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.

  4. Perception and the temporal properties of speech

    NASA Astrophysics Data System (ADS)

    Gordon, Peter C.

    1991-11-01

    Four experiments addressing the role of attention in phonetic perception are reported. The first experiment shows that the relative importance of two cues to the voicing distinction changes when subjects must perform an arithmetic distractor task at the same time as identifying a speech stimulus. The voice onset time cue loses phonetic significance when subjects are distracted, while the F0 onset frequency cue does not. The second experiment shows a similar pattern for two cues to the distinction between the vowels /i/ (as in 'beat') and /I/ (as in 'bit'). Together these experiments indicate that careful attention to speech perception is necessary for strong acoustic cues to achieve their full phonetic impact, while weaker acoustic cues achieve their full phonetic impact without close attention. Experiment 3 shows that this pattern is obtained when the distractor task places little demand on verbal short term memory. Experiment 4 provides a large data set for testing formal models of the role of attention in speech perception. Attention is shown to influence the signal to noise ratio in phonetic encoding. This principle is instantiated in a network model in which the role of attention is to reduce noise in the phonetic encoding of acoustic cues. Implications of this work for understanding speech perception and general theories of the role of attention in perception are discussed.

  5. Hands-free human-machine interaction with voice

    NASA Astrophysics Data System (ADS)

    Juang, B. H.

    2004-05-01

    Voice is a natural communication interface between a human and a machine. The machine, when placed in today's communication networks, may be configured to provide automation to save substantial operating cost, as demonstrated in AT&T's VRCP (Voice Recognition Call Processing), or to facilitate intelligent services, such as virtual personal assistants, to enhance individual productivity. These intelligent services often need to be accessible anytime, anywhere (e.g., in cars when the user is in a hands-busy-eyes-busy situation or during meetings where constantly talking to a microphone is either undesirable or impossible), and thus call for advanced signal processing and automatic speech recognition techniques which support what we call "hands-free" human-machine communication. These techniques entail a broad spectrum of technical ideas, ranging from the use of directional microphones and acoustic echo cancellation to robust speech recognition. In this talk, we highlight a number of key techniques that were developed for hands-free human-machine communication in the mid-1990s after Bell Labs became a unit of Lucent Technologies. A video clip will be played to demonstrate the accomplishment.

  6. Dimension-Based Statistical Learning Affects Both Speech Perception and Production.

    PubMed

    Lehet, Matthew; Holt, Lori L

    2017-04-01

    Multiple acoustic dimensions signal speech categories. However, dimensions vary in their informativeness; some are more diagnostic of category membership than others. Speech categorization reflects these dimensional regularities such that diagnostic dimensions carry more "perceptual weight" and more effectively signal category membership to native listeners. Yet perceptual weights are malleable. When short-term experience deviates from long-term language norms, such as in a foreign accent, the perceptual weight of acoustic dimensions in signaling speech category membership rapidly adjusts. The present study investigated whether rapid adjustments in listeners' perceptual weights in response to speech that deviates from the norms also affects listeners' own speech productions. In a word recognition task, the correlation between two acoustic dimensions signaling consonant categories, fundamental frequency (F0) and voice onset time (VOT), matched the correlation typical of English, and then shifted to an "artificial accent" that reversed the relationship, and then shifted back. Brief, incidental exposure to the artificial accent caused participants to down-weight perceptual reliance on F0, consistent with previous research. Throughout the task, participants were intermittently prompted with pictures to produce these same words. In the block in which listeners heard the artificial accent with a reversed F0 × VOT correlation, F0 was a less robust cue to voicing in listeners' own speech productions. The statistical regularities of short-term speech input affect both speech perception and production, as evidenced via shifts in how acoustic dimensions are weighted. Copyright © 2016 Cognitive Science Society, Inc.

  7. Dramatic effects of speech task on motor and linguistic planning in severely dysfluent parkinsonian speech

    PubMed Central

    Van Lancker Sidtis, Diana; Cameron, Krista; Sidtis, John J.

    2015-01-01

    In motor speech disorders, dysarthric features impacting intelligibility, articulation, fluency, and voice emerge more saliently in conversation than in repetition, reading, or singing. A role of the basal ganglia in these task discrepancies has been identified. Further, more recent studies of naturalistic speech in basal ganglia dysfunction have revealed that formulaic language is more impaired than novel language. This descriptive study extends these observations to a case of severely dysfluent dysarthria due to a parkinsonian syndrome. Dysfluencies were quantified and compared for conversation, two forms of repetition, reading, recited speech, and singing. Other measures examined phonetic inventories, word forms, and formulaic language. Phonetic, syllabic, and lexical dysfluencies were more abundant in conversation than in other task conditions. Formulaic expressions in conversation were reduced compared to normal speakers. A proposed explanation supports the notion that the basal ganglia contribute to formulation of internal models for execution of speech. PMID:22774929

  8. Tailoring Cognitive Behavioral Therapy to Subtypes of Voice-Hearing

    PubMed Central

    Smailes, David; Alderson-Day, Ben; Fernyhough, Charles; McCarthy-Jones, Simon; Dodgson, Guy

    2015-01-01

    Cognitive behavioral therapy (CBT) for voice-hearing (i.e., auditory verbal hallucinations; AVH) has, at best, small to moderate effects. One possible reason for this limited efficacy is that current CBT approaches tend to conceptualize voice-hearing as a homogenous experience in terms of the cognitive processes involved in AVH. However, the highly heterogeneous nature of voice-hearing suggests that many different cognitive processes may be involved in the etiology of AVH. These heterogeneous voice-hearing experiences do, however, appear to cluster into a set of subtypes, opening up the possibility of tailoring treatment to the subtype of AVH that a voice-hearer reports. In this paper, we (a) outline our rationale for tailoring CBT to subtypes of voice-hearing, (b) describe CBT for three putative subtypes of AVH (inner speech-based AVH, memory-based AVH, and hypervigilance AVH), and (c) discuss potential limitations and problems with such an approach. We conclude by arguing that tailoring CBT to subtypes of voice-hearing could prove to be a valuable therapeutic development, which may be especially effective when used in early intervention in psychosis services. PMID:26733919

  9. The singer's voice range profile: female professional opera soloists.

    PubMed

    Lamarche, Anick; Ternström, Sten; Pabon, Peter

    2010-07-01

    This work concerns the collection of 30 voice range profiles (VRPs) of the female operatic voice. We address the questions: Is there a need for a singer's protocol in VRP acquisition? Are physiological measurements sufficient, or should the measurement of performance capabilities also be included? Can we address the female singing voice in general, or is there a case for categorizing voices when studying phonetographic data? Subjects performed a series of structured tasks involving both standard speech voice protocols and additional singing tasks. Singers also completed an extensive questionnaire. Physiological VRPs differ from performance VRPs. Two new VRP metrics, the voice area above a defined level threshold and the dynamic range independent of the fundamental frequency (F0), were found to be useful in the analysis of singer VRPs. Task design had no effect on performance VRP outcomes. Voice category differences were mainly attributable to phonation frequency-based information. Results support the clinical importance of addressing the vocal instrument as it is used in performance. Equally important is the elaboration of a protocol suitable for the singing voice. The given context and instructions can be more important than task design for performance VRPs. Yet, for physiological VRP recordings, task design remains critical. Both types of VRPs are suggested for a singer's voice evaluation. Copyright (c) 2010 The Voice Foundation. Published by Mosby, Inc. All rights reserved.
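
    A sketch of how the two proposed VRP metrics could be computed on a gridded profile: the voice area above a level threshold, and a dynamic range computed per F0 bin and then summarized independently of F0. The grid, threshold, and contours are invented placeholders:

        import numpy as np

        rng = np.random.default_rng(6)
        n_f0_bins = 40                                       # semitone bins across the F0 range
        spl_min = 50 + 10 * rng.random(n_f0_bins)            # softest phonation per bin (dB)
        spl_max = spl_min + 20 + 15 * rng.random(n_f0_bins)  # loudest phonation per bin (dB)

        # Voice area above a defined level threshold, in (semitone x dB) cells.
        threshold = 90.0
        area_above = np.sum(np.clip(spl_max - np.maximum(spl_min, threshold), 0, None))

        # Dynamic range per F0 bin, summarized independently of F0.
        dynamic_range_per_bin = spl_max - spl_min
        f0_independent_dynamic_range = dynamic_range_per_bin.mean()
        print(area_above, f0_independent_dynamic_range)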

  10. SPEECH PERCEPTION AS A TALKER-CONTINGENT PROCESS

    PubMed Central

    Nygaard, Lynne C.; Sommers, Mitchell S.; Pisoni, David B.

    2011-01-01

    To determine how familiarity with a talker’s voice affects perception of spoken words, we trained two groups of subjects to recognize a set of voices over a 9-day period. One group then identified novel words produced by the same set of talkers at four signal-to-noise ratios. Control subjects identified the same words produced by a different set of talkers. The results showed that the ability to identify a talker’s voice improved intelligibility of novel words produced by that talker. The results suggest that speech perception may involve talker-contingent processes whereby perceptual learning of aspects of the vocal source facilitates the subsequent phonetic analysis of the acoustic signal. PMID:21526138

  11. Some articulatory details of emotional speech

    NASA Astrophysics Data System (ADS)

    Lee, Sungbok; Yildirim, Serdar; Bulut, Murtaza; Kazemzadeh, Abe; Narayanan, Shrikanth

    2005-09-01

    Differences in speech articulation among four emotion types, neutral, anger, sadness, and happiness, are investigated by analyzing tongue tip, jaw, and lip movement data collected from one male and one female speaker of American English. The data were collected using an electromagnetic articulography (EMA) system while subjects produced simulated emotional speech. Pitch, root-mean-square (rms) energy and the first three formants were estimated for vowel segments. For both speakers, angry speech exhibited the largest rms energy and the largest articulatory activity in terms of displacement range and movement speed. Happy speech is characterized by the largest pitch variability. It has higher rms energy than neutral speech, but its articulatory activity is comparable to, or less than, that of neutral speech. That is, happy speech is more prominent in voicing activity than in articulation. Sad speech exhibits the longest sentence duration and lower rms energy. However, its articulatory activity is no less than that of neutral speech. Interestingly, for the male speaker, articulation for vowels in sad speech is consistently more peripheral (i.e., more forwarded displacements) when compared to other emotions. However, this does not hold for the female subject. These and other results will be discussed in detail with associated acoustics and perceived emotional qualities. [Work supported by NIH.]

  12. Effects of voice training and voice hygiene education on acoustic and perceptual speech parameters and self-reported vocal well-being in female teachers.

    PubMed

    Ilomaki, Irma; Laukkanen, Anne-Maria; Leppanen, Kirsti; Vilkman, Erkki

    2008-01-01

    Voice education programs may help in optimizing teachers' voice use. This study compared the effects of voice training (VT) and a voice hygiene lecture (VHL) in 60 randomly assigned female teachers. All 60 attended the lecture, and 30 completed a short training course in addition. Text reading was recorded in working environments and analyzed for fundamental frequency (F0), equivalent sound level (Leq), alpha ratio, jitter, shimmer, and perceptual quality. Self-reports of vocal well-being were registered. In the VHL group, increased F0 and increased difficulty of phonation were found; in the VT group, decreased perturbation, increased alpha ratio, easier phonation, and improved perceptual and self-reported voice quality were found. Both groups equally self-reported an increase in voice care knowledge. The results seem to indicate improved vocal well-being after training.

  13. Integrated Spacesuit Audio System Enhances Speech Quality and Reduces Noise

    NASA Technical Reports Server (NTRS)

    Huang, Yiteng Arden; Chen, Jingdong; Chen, Shaoyan Sharyl

    2009-01-01

    A new approach has been proposed for increasing astronaut comfort and speech capture. Currently, the special design of a spacesuit forms an extreme acoustic environment making it difficult to capture clear speech without compromising comfort. The proposed Integrated Spacesuit Audio (ISA) system is to incorporate the microphones into the helmet and use software to extract voice signals from background noise.

  14. An automatic speech recognition system with speaker-independent identification support

    NASA Astrophysics Data System (ADS)

    Caranica, Alexandru; Burileanu, Corneliu

    2015-02-01

    The novelty of this work lies in the application of an open source research software toolkit (CMU Sphinx) to train, build and evaluate a speech recognition system, with speaker-independent support, for voice-controlled hardware applications. Moreover, we propose to use the trained acoustic model to successfully decode offline voice commands on embedded hardware, such as an ARMv6 low-cost SoC, the Raspberry Pi. This type of single-board computer, mainly used for educational and research activities, can serve as a proof-of-concept software and hardware stack for low cost voice automation systems.
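
    A sketch of offline decoding with the pocketsphinx Python bindings for CMU Sphinx, roughly in the spirit of the setup described; the model paths, WAV file, and constructor keywords are placeholders and vary between pocketsphinx versions:

        from pocketsphinx import Decoder

        decoder = Decoder(
            hmm="model/en-us",                # acoustic model directory (placeholder)
            lm="model/en-us.lm.bin",          # language model (placeholder)
            dict="model/cmudict-en-us.dict",  # pronunciation dictionary (placeholder)
        )

        with open("command.wav", "rb") as f:
            f.read(44)      # skip the WAV header (16 kHz mono PCM assumed)
            audio = f.read()

        decoder.start_utt()
        decoder.process_raw(audio, False, True)  # no_search=False, full_utt=True
        decoder.end_utt()
        print(decoder.hyp().hypstr if decoder.hyp() else "<no hypothesis>")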

  15. Application of the acoustic voice quality index for objective measurement of dysphonia severity.

    PubMed

    Núñez-Batalla, Faustino; Díaz-Fresno, Estefanía; Álvarez-Fernández, Andrea; Muñoz Cordero, Gabriela; Llorente Pendás, José Luis

    Over the past several decades, many acoustic parameters have been studied as potential measures of dysphonia severity. However, current acoustic measures might not be sensitive measures of perceived voice quality. A meta-analysis that evaluated the relationship between perceived overall voice quality and several acoustic-phonetic correlates identified measures that do not rely on extraction of the fundamental period, such as measures derived from the cepstrum, and that can be used on sustained vowels as well as continuous speech samples. A specific and recently developed method to quantify the severity of overall dysphonia is the acoustic voice quality index (AVQI), a multivariate construct that combines multiple acoustic markers to yield a single number that correlates reasonably with overall vocal quality. This research is based on one pool of voice recordings collected in two sets of subjects: 60 vocally normal and 58 voice-disordered participants. A sustained vowel and a sample of connected speech were recorded and analyzed to obtain the six parameters included in the AVQI using the program Praat. Statistical analysis was completed using SPSS for Windows, version 12.0. Regarding the correlation between perception of overall voice quality and the AVQI, a significant difference exists (t(95) = 9.5; p < .001) between normal and dysphonic voices. The findings of this study demonstrate the clinical feasibility of the AVQI as a measure of dysphonia severity. Copyright © 2017 Elsevier España, S.L.U. and Sociedad Española de Otorrinolaringología y Cirugía de Cabeza y Cuello. All rights reserved.
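
    A structural sketch of a multivariate index of this kind: several acoustic markers are combined linearly into a single severity score, and a cutoff separates normal from dysphonic voices. The marker values, weights, and intercept below are placeholders, not the published AVQI coefficients:

        # Placeholder marker values as might be measured in Praat.
        markers = {
            "cpps": 12.5,          # smoothed cepstral peak prominence (dB)
            "hnr": 18.0,           # harmonics-to-noise ratio (dB)
            "shimmer_local": 3.1,  # %
            "shimmer_db": 0.3,     # dB
            "slope": -22.0,        # spectral slope (dB)
            "tilt": -10.0,         # spectral tilt (dB)
        }
        # Hypothetical weights and intercept, for structure only.
        weights = {"cpps": -0.25, "hnr": -0.05, "shimmer_local": 0.07,
                   "shimmer_db": 0.9, "slope": 0.01, "tilt": 0.05}
        intercept = 9.0

        score = intercept + sum(weights[k] * markers[k] for k in markers)
        # Compare against a study-specific cutoff (e.g., 2.28 for AVQIv3 in the
        # Spanish validation above).
        print("dysphonic" if score > 2.28 else "normal")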

  16. Development of the child's voice: premutation, mutation.

    PubMed

    Hacki, T; Heitmüller, S

    1999-10-05

    Voice range profile (VRP) measurement was used to evaluate the vocal capabilities of 180 children aged between 4 and 12 years without voice pathology. There were 10 boys and 10 girls in each age group. Using an automatic VRP measurement system, F0 and SPL dB (lin) were determined and displayed two-dimensionally in real time. The speaking voice, the shouting voice and the singing voice were investigated. The results show that vocal capabilities grow with advancing age, but not continuously. The lowering of the habitual pitch of the speaking voice as well as of the entire speaking pitch range occurs for girls between the ages of 7 and 8, for boys between 8 and 9. A temporary restriction of the minimum vocal intensity of the speaking voice (the ability to speak softly) as well as of the singing voice occurs for girls and for boys at the age of 7-8. A decrease of the maximum speech intensity is found for girls at the age of between 7 and 8, for boys between 8 and 9. A lowering of the pitch as well as of the intensity of the shouting voice occurs for both sexes from the age of 10. In contrast to earlier general opinion we note for girls a stage of premutation (between the age of 7 and 8) with essentially the same changes seen among boys, but 1 year earlier. The beginning of the mutation can be fixed at the age of 10-11 years.

  17. Speech motor control and acute mountain sickness

    NASA Technical Reports Server (NTRS)

    Cymerman, Allen; Lieberman, Philip; Hochstadt, Jesse; Rock, Paul B.; Butterfield, Gail E.; Moore, Lorna G.

    2002-01-01

    BACKGROUND: An objective method that accurately quantifies the severity of Acute Mountain Sickness (AMS) symptoms is needed to enable more reliable evaluation of altitude acclimatization and testing of potentially beneficial interventions. HYPOTHESIS: Changes in human articulation, as quantified by timed variations in acoustic waveforms of specific spoken words (voice onset time; VOT), are correlated with the severity of AMS. METHODS: Fifteen volunteers were exposed to a simulated altitude of 4300 m (446 mm Hg) in a hypobaric chamber for 48 h. Speech motor control was determined from digitally recorded and analyzed timing patterns of 30 different monosyllabic words characterized as voiced and unvoiced, and as labial, alveolar, or velar. The Environmental Symptoms Questionnaire (ESQ) was used to assess AMS. RESULTS: Significant AMS symptoms occurred after 4 h, peaked at 16 h, and returned toward baseline after 48 h. Labial VOTs were shorter after 4 and 39 h of exposure; velar VOTs were altered only after 4 h; and there were no changes in alveolar VOTs. The duration of vowel sounds was increased after 4 h of exposure and returned to normal thereafter. Only 1 of 15 subjects did not increase vowel time after 4 h of exposure. The 39-h labial (p = 0.009) and velar (p = 0.037) voiced-unvoiced timed separations of consonants were significantly correlated with the symptoms of AMS. CONCLUSIONS: Two objective measures of speech production were affected by exposure to 4300 m altitude and correlated with AMS severity. Alterations in speech production may represent an objective measure of AMS and central vulnerability to hypoxia.

  18. Digital signal processing algorithms for automatic voice recognition

    NASA Technical Reports Server (NTRS)

    Botros, Nazeih M.

    1987-01-01

    Current digital signal analysis algorithms implemented in automatic voice recognition are investigated. Automatic voice recognition means the capability of a computer to recognize and interact with verbal commands. The focus is on digital signal analysis rather than linguistic analysis of the speech signal. Several digital signal processing algorithms are available for voice recognition, among them Linear Predictive Coding (LPC), short-time Fourier analysis, and cepstrum analysis. Of these, LPC is the most widely used: it has a short execution time and does not require large memory storage. However, it has several limitations due to the assumptions used in its development. The other two algorithms are frequency-domain algorithms that rest on fewer assumptions, but they are not widely implemented or investigated. With recent advances in digital technology, namely signal processors, these two frequency-domain algorithms merit investigation for implementation in voice recognition. This research is concerned with real-time, microprocessor-based recognition algorithms.
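
    A sketch of the two analysis families compared above, assuming the librosa package for LPC and a real cepstrum computed directly with the FFT; the signal, frame length, and LPC order are illustrative:

        import numpy as np
        import librosa

        sr = 16000
        t = np.arange(0, 0.025, 1 / sr)  # one 25 ms frame
        frame = np.sin(2 * np.pi * 120 * t) \
            + 0.1 * np.random.default_rng(7).normal(size=t.size)

        # Linear Predictive Coding: all-pole model of the vocal tract.
        lpc_coeffs = librosa.lpc(frame, order=12)

        # Real cepstrum: inverse FFT of the log magnitude spectrum. Low
        # quefrencies capture the spectral envelope; a peak near 1/F0 reflects voicing.
        spectrum = np.fft.rfft(frame)
        cepstrum = np.fft.irfft(np.log(np.abs(spectrum) + 1e-10))
        print(lpc_coeffs[:4], cepstrum[:4])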

  19. Comparison of voice-automated transcription and human transcription in generating pathology reports.

    PubMed

    Al-Aynati, Maamoun M; Chorneyko, Katherine A

    2003-06-01

    Software that can convert spoken words into written text has been available since the early 1980s. Early continuous speech systems were developed in 1994, with the latest commercially available editions having a claimed accuracy of up to 98% of speech recognition at natural speech rates. To evaluate the efficacy of one commercially available voice-recognition software system with pathology vocabulary in generating pathology reports and to compare this with human transcription. To draw cost analysis conclusions regarding human versus computer-based transcription. Two hundred six routine pathology reports from the surgical pathology material handled at St Joseph's Healthcare, Hamilton, Ontario, were generated simultaneously using computer-based transcription and human transcription. The following hardware and software were used: a desktop 450-MHz Intel Pentium III processor with 192 MB of RAM, a speech-quality sound card (Sound Blaster), a noise-canceling headset microphone, and IBM ViaVoice Pro version 8 with pathology vocabulary support (Voice Automated, Huntington Beach, Calif). The cost of the hardware and software used was approximately 2250 Canadian dollars. A total of 23 458 words were transcribed using both methods, with a mean of 114 words per report. The mean accuracy rate was 93.6% (range, 87.4%-96%) using the computer software, compared to a mean accuracy of 99.6% (range, 99.4%-99.8%) for human transcription (P <.001). Time needed to edit documents by the primary evaluator (M.A.) using the computer was on average twice that needed for editing the documents produced by human transcriptionists (range, 1.4-3.5 times). The extra time needed to edit documents was 67 minutes per week (13 minutes per day). Computer-based continuous speech-recognition systems in pathology can be successfully used in pathology practice even during the handling of gross pathology specimens. The relatively low accuracy rate of this voice-recognition software with resultant increased editing

  20. [The progress in the rehabilitation of dysarthria in Parkinson disease using LSVT (Lee Silverman Voice Treatment)].

    PubMed

    Kamińska, Ilona; Zebryk-Stopa, Anna; Pruszewicz, Antoni; Dziubalska-Kołaczyk, Katarzyna; Połczyńska-Fiszer, Monika; Pietrala, Dawid; Przedpelska-Ober, Elzbieta

    2007-01-01

    Parkinson's disease causes damage to the central nervous system resulting in bradykinesia, muscle rigidity, rest tremor and dysarthric speech. In clinical terms, dysarthria denotes the dysfunction of articulation, phonation and respiration. It is brought about by the impairment of neural paths innervating the speech apparatus, thus causing a decreased ability to communicate. The study was conducted by the Center for Speech and Language Processing (CSLP), Adam Mickiewicz University, Poznań and the Chair and Department of Phoniatrics and Audiology, the Medical University, Poznań within the interdisciplinary research project grant called "Speech and Language Virtual Therapist for Individuals with Parkinson's Disease". Apart from traditional voice and speech therapies, one of the ways of treating speech disturbances accompanying Parkinson's disease is the innovative Lee Silverman Voice Treatment (LSVT). The purpose of this method, introduced by Dr. L. Ramig and colleagues in 1987-1988, is to teach the patient to speak loudly. As a result of co-operation between CSLP and the Center for Spoken Language Research (CSLR) at the University of Colorado, Boulder, USA, a Polish version of the LSVT Virtual Therapist computer programme was created (LSVTVT). The programme is based on the principles of LSVT. The positive outcomes of the therapy give hope to Parkinson's disease patients with dysarthria, as well as to speech therapists.

  1. Visual face-movement sensitive cortex is relevant for auditory-only speech recognition.

    PubMed

    Riedel, Philipp; Ragert, Patrick; Schelinski, Stefanie; Kiebel, Stefan J; von Kriegstein, Katharina

    2015-07-01

    It is commonly assumed that the recruitment of visual areas during audition is not relevant for performing auditory tasks ('auditory-only view'). According to an alternative view, however, the recruitment of visual cortices is thought to optimize auditory-only task performance ('auditory-visual view'). This alternative view is based on functional magnetic resonance imaging (fMRI) studies. These studies have shown, for example, that even if there is only auditory input available, face-movement sensitive areas within the posterior superior temporal sulcus (pSTS) are involved in understanding what is said (auditory-only speech recognition). This is particularly the case when speakers are known audio-visually, that is, after brief voice-face learning. Here we tested whether the left pSTS involvement is causally related to performance in auditory-only speech recognition when speakers are known by face. To test this hypothesis, we applied cathodal transcranial direct current stimulation (tDCS) to the pSTS during (i) visual-only speech recognition of a speaker known only visually to participants and (ii) auditory-only speech recognition of speakers they learned by voice and face. We defined the cathode as active electrode to down-regulate cortical excitability by hyperpolarization of neurons. tDCS to the pSTS interfered with visual-only speech recognition performance compared to a control group without pSTS stimulation (tDCS to BA6/44 or sham). Critically, compared to controls, pSTS stimulation additionally decreased auditory-only speech recognition performance selectively for voice-face learned speakers. These results are important in two ways. First, they provide direct evidence that the pSTS is causally involved in visual-only speech recognition; this confirms a long-standing prediction of current face-processing models. Secondly, they show that visual face-sensitive pSTS is causally involved in optimizing auditory-only speech recognition. These results are in line

  2. Processing umami and other tastes in mammalian taste buds.

    PubMed

    Roper, Stephen D; Chaudhari, Nirupa

    2009-07-01

    Neuroscientists are now coming to appreciate that a significant degree of information processing occurs in the peripheral sensory organs of taste prior to signals propagating to the brain. Gustatory stimulation causes taste bud cells to secrete neurotransmitters that act on adjacent taste bud cells (paracrine transmitters) as well as on primary sensory afferent fibers (neurocrine transmitters). Paracrine transmission, representing cell-cell communication within the taste bud, has the potential to shape the final signal output that taste buds transmit to the brain. The following paragraphs summarize current thinking about how taste signals generally, and umami taste in particular, are processed in taste buds.

  3. Audiovisual Speech Perception and Eye Gaze Behavior of Adults with Asperger Syndrome

    ERIC Educational Resources Information Center

    Saalasti, Satu; Katsyri, Jari; Tiippana, Kaisa; Laine-Hernandez, Mari; von Wendt, Lennart; Sams, Mikko

    2012-01-01

    Audiovisual speech perception was studied in adults with Asperger syndrome (AS), by utilizing the McGurk effect, in which conflicting visual articulation alters the perception of heard speech. The AS group perceived the audiovisual stimuli differently from age, sex and IQ matched controls. When a voice saying /p/ was presented with a face…

  4. Does computer-synthesized speech manifest personality? Experimental tests of recognition, similarity-attraction, and consistency-attraction.

    PubMed

    Nass, C; Lee, K M

    2001-09-01

    Would people exhibit similarity-attraction and consistency-attraction toward unambiguously computer-generated speech even when personality is clearly not relevant? In Experiment 1, participants (extrovert or introvert) heard a synthesized voice (extrovert or introvert) on a book-buying Web site. Participants accurately recognized personality cues in text-to-speech and showed similarity-attraction in their evaluation of the computer voice, the book reviews, and the reviewer. Experiment 2, in a Web auction context, added personality of the text to the previous design. The results replicated Experiment 1 and demonstrated consistency (voice and text personality)-attraction. To maximize liking and trust, designers should set parameters, for example, words per minute or frequency range, that create a personality that is consistent with the user and the content being presented.
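
    As a rough illustration of setting such parameters in software, this sketch uses the pyttsx3 offline TTS library, whose rate (words per minute) and volume properties do exist; the extrovert/introvert values are assumptions chosen for illustration, not those of the study, and pitch range is not exposed by this library.

    ```python
    import pyttsx3

    # Illustrative parameter sets: the mapping of rate/volume to perceived
    # personality is an assumption in the spirit of the study, not its values.
    PROFILES = {
        "extrovert": {"rate": 200, "volume": 1.0},  # faster, louder
        "introvert": {"rate": 140, "volume": 0.7},  # slower, softer
    }

    def speak(text, personality="extrovert"):
        engine = pyttsx3.init()
        for prop, value in PROFILES[personality].items():
            engine.setProperty(prop, value)  # 'rate' is in words per minute
        engine.say(text)
        engine.runAndWait()

    speak("This book has an exuberant, fast-paced plot.", "extrovert")
    ```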

  5. Speech recovery device

    DOEpatents

    Frankle, Christen M.

    2004-04-20

    There is provided an apparatus and method for assisting speech recovery in people with inability to speak due to aphasia, apraxia or another condition with similar effect. A hollow, rigid, thin-walled tube with semi-circular or semi-elliptical cut out shapes at each open end is positioned such that one end mates with the throat/voice box area of the neck of the assistor and the other end mates with the throat/voice box area of the assisted. The speaking person (assistor) makes sounds that produce standing wave vibrations at the same frequency in the vocal cords of the assisted person. Driving the assisted person's vocal cords with the assisted person being able to hear the correct tone enables the assisted person to speak by simply amplifying the vibration of membranes in their throat.

  6. Effects of HearFones on speaking and singing voice quality.

    PubMed

    Laukkanen, Anne-Maria; Mickelson, Nils Peter; Laitala, Marja; Syrjä, Tiina; Salo, Arla; Sihvo, Marketta

    2004-12-01

    HearFones (HF) have been designed to enhance auditory feedback during phonation. This study investigated the effects of HF (1) on the sound perceivable by the subject, (2) on voice quality in reading and singing, and (3) on voice production in speech and singing at the same pitch and sound level. Test 1: Text reading was recorded with two identical microphones in the ears of a subject. One ear was covered with HF, and the other was free. Four subjects attended this test. Tests 2 and 3: A reading sample was recorded from 13 subjects and a song from 12 subjects without and with HF on. Test 4: Six females repeated [pa:p:a] in speaking and singing modes without and with HF at the same pitch and sound level. Long-term average spectra were made (Tests 1-3), and formant frequencies, fundamental frequency, and sound level were measured (Tests 2 and 3). Subglottic pressure was estimated from oral pressure in [p], and electroglottography (EGG) was registered simultaneously during voicing on [a:] (Test 4). Voice quality in speech and singing was evaluated by three professional voice trainers (Tests 2-4). HF seemed to enhance the sound perceivable by the subject across the whole range studied (0-8 kHz), with the greatest enhancement (up to ca. 25 dB) at 1-3 kHz and at 4-7 kHz. The subjects tended to decrease loudness with HF (when sound level was not being monitored). In more than half of the cases, voice quality was evaluated as "less strained" and "better controlled" with HF. When pitch and loudness were constant, no clear differences were heard, but the closed quotient of the EGG signal was higher and the signal more skewed, suggesting a better glottal closure and/or diminished activity of the thyroarytenoid muscle.
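
    The long-term average spectrum comparison described above can be approximated with standard tools. A minimal sketch using Welch's method from SciPy, assuming the recordings are NumPy arrays; the window length is an illustrative choice, while the 1-3 kHz and 4-7 kHz bands follow the text.

    ```python
    import numpy as np
    from scipy.signal import welch

    def ltas_db(signal, sr, win_s=0.040):
        """Long-term average spectrum: Welch PSD over the whole sample, in dB."""
        freqs, psd = welch(signal, fs=sr, nperseg=int(sr * win_s))
        return freqs, 10.0 * np.log10(psd + 1e-20)

    def band_gain(freqs, db_with_hf, db_without_hf, lo, hi):
        """Mean dB difference (with HF minus without) inside one frequency band."""
        band = (freqs >= lo) & (freqs < hi)
        return float(np.mean(db_with_hf[band] - db_without_hf[band]))
    ```

    Applying band_gain to the occluded-ear and free-ear recordings of Test 1 in the 1-3 kHz and 4-7 kHz bands mirrors the kind of enhancement comparison reported above.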

  7. Nebulized isotonic saline improves voice production in Sjögren's syndrome.

    PubMed

    Tanner, Kristine; Nissen, Shawn L; Merrill, Ray M; Miner, Alison; Channell, Ron W; Miller, Karla L; Elstad, Mark; Kendall, Katherine A; Roy, Nelson

    2015-10-01

    This study examined the effects of a topical vocal fold hydration treatment on voice production over time. Prospective, longitudinal, within-subjects A (baseline), B (treatment), A (withdrawal/reversal), B (treatment) experimental design. Eight individuals with primary Sjögren's syndrome (SS), an autoimmune disease causing laryngeal dryness, completed an 8-week A-B-A-B experiment. Participants performed twice-daily audio recordings of connected speech and sustained vowels and then rated vocal effort, mouth dryness, and throat dryness. Two-week treatment phases introduced twice-daily 9-mL doses of nebulized isotonic saline (0.9% NaCl). Voice handicap and patient-based measures of SS disease severity were collected before and after each 2-week phase. Connected speech and sustained vowels were analyzed using the Cepstral Spectral Index of Dysphonia (CSID). Acoustic and patient-based ratings during each baseline and treatment phase were analyzed and compared. Baseline CSID and patient-based ratings were in the mild-to-moderate range. CSID measures of voice severity improved by approximately 20% with nebulized saline treatment and worsened during treatment withdrawal. Posttreatment CSID values fell within the normal-to-mild range. Similar patterns were observed in patient-based ratings of vocal effort and dryness. CSID values and patient-based ratings correlated significantly (P < .05). Nebulized isotonic saline improves voice production based on acoustic and patient-based ratings of voice severity. Future work should optimize topical vocal fold hydration treatment formulations, dose, and delivery methodologies for various patient populations. This study lays the groundwork for future topical vocal fold hydration treatment development to manage and possibly prevent dehydration-related voice disorders. Level of evidence: 2b.

  8. Research in speech communication.

    PubMed

    Flanagan, J

    1995-10-24

    Advances in digital speech processing are now supporting application and deployment of a variety of speech technologies for human/machine communication. In fact, new businesses are rapidly forming about these technologies. But these capabilities are of little use unless society can afford them. Happily, explosive advances in microelectronics over the past two decades have assured affordable access to this sophistication as well as to the underlying computing technology. The research challenges in speech processing remain in the traditionally identified areas of recognition, synthesis, and coding. These three areas have typically been addressed individually, often with significant isolation among the efforts. But they are all facets of the same fundamental issue--how to represent and quantify the information in the speech signal. This implies deeper understanding of the physics of speech production, the constraints that the conventions of language impose, and the mechanism for information processing in the auditory system. In ongoing research, therefore, we seek more accurate models of speech generation, better computational formulations of language, and realistic perceptual guides for speech processing--along with ways to coalesce the fundamental issues of recognition, synthesis, and coding. Successful solution will yield the long-sought dictation machine, high-quality synthesis from text, and the ultimate in low bit-rate transmission of speech. It will also open the door to language-translating telephony, where the synthetic foreign translation can be in the voice of the originating talker.

  9. Implicit Multisensory Associations Influence Voice Recognition

    PubMed Central

    von Kriegstein, Katharina; Giraud, Anne-Lise

    2006-01-01

    Natural objects provide partially redundant information to the brain through different sensory modalities. For example, voices and faces both give information about the speech content, age, and gender of a person. Thanks to this redundancy, multimodal recognition is fast, robust, and automatic. In unimodal perception, however, only part of the information about an object is available. Here, we addressed whether, even under conditions of unimodal sensory input, crossmodal neural circuits that have been shaped by previous associative learning become activated and underpin a performance benefit. We measured brain activity with functional magnetic resonance imaging before, while, and after participants learned to associate either sensory redundant stimuli, i.e. voices and faces, or arbitrary multimodal combinations, i.e. voices and written names, ring tones, and cell phones or brand names of these cell phones. After learning, participants were better at recognizing unimodal auditory voices that had been paired with faces than those paired with written names, and association of voices with faces resulted in an increased functional coupling between voice and face areas. No such effects were observed for ring tones that had been paired with cell phones or names. These findings demonstrate that brief exposure to ecologically valid and sensory redundant stimulus pairs, such as voices and faces, induces specific multisensory associations. Consistent with predictive coding theories, associative representations become thereafter available for unimodal perception and facilitate object recognition. These data suggest that for natural objects effective predictive signals can be generated across sensory systems and proceed by optimization of functional connectivity between specialized cortical sensory modules. PMID:17002519

  10. Implicit multisensory associations influence voice recognition.

    PubMed

    von Kriegstein, Katharina; Giraud, Anne-Lise

    2006-10-01

    Natural objects provide partially redundant information to the brain through different sensory modalities. For example, voices and faces both give information about the speech content, age, and gender of a person. Thanks to this redundancy, multimodal recognition is fast, robust, and automatic. In unimodal perception, however, only part of the information about an object is available. Here, we addressed whether, even under conditions of unimodal sensory input, crossmodal neural circuits that have been shaped by previous associative learning become activated and underpin a performance benefit. We measured brain activity with functional magnetic resonance imaging before, while, and after participants learned to associate either sensory redundant stimuli, i.e. voices and faces, or arbitrary multimodal combinations, i.e. voices and written names, ring tones, and cell phones or brand names of these cell phones. After learning, participants were better at recognizing unimodal auditory voices that had been paired with faces than those paired with written names, and association of voices with faces resulted in an increased functional coupling between voice and face areas. No such effects were observed for ring tones that had been paired with cell phones or names. These findings demonstrate that brief exposure to ecologically valid and sensory redundant stimulus pairs, such as voices and faces, induces specific multisensory associations. Consistent with predictive coding theories, associative representations become thereafter available for unimodal perception and facilitate object recognition. These data suggest that for natural objects effective predictive signals can be generated across sensory systems and proceed by optimization of functional connectivity between specialized cortical sensory modules.

  11. The role of voice input for human-machine communication.

    PubMed Central

    Cohen, P R; Oviatt, S L

    1995-01-01

    Optimism is growing that the near future will witness rapid growth in human-computer interaction using voice. System prototypes have recently been built that demonstrate speaker-independent real-time speech recognition, and understanding of naturally spoken utterances with vocabularies of 1000 to 2000 words, and larger. Already, computer manufacturers are building speech recognition subsystems into their new product lines. However, before this technology can be broadly useful, a substantial knowledge base is needed about human spoken language and performance during computer-based spoken interaction. This paper reviews application areas in which spoken interaction can play a significant role, assesses potential benefits of spoken interaction with machines, and compares voice with other modalities of human-computer interaction. It also discusses information that will be needed to build a firm empirical foundation for the design of future spoken and multimodal interfaces. Finally, it argues for a more systematic and scientific approach to investigating spoken input and performance with future language technology. PMID:7479803

  12. The Belt voice: Acoustical measurements and esthetic correlates

    NASA Astrophysics Data System (ADS)

    Bounous, Barry Urban

    This dissertation explores the esthetic attributes of the Belt voice through spectral acoustical analysis. The process of understanding the nature and safe practice of Belt is just beginning, whereas the understanding of classical singing is well established. The unique nature of the Belt sound provides difficulties for voice teachers attempting to evaluate the quality and appropriateness of a particular sound or performance. This study attempts to provide answers to the question "does Belt conform to a set of measurable esthetic standards?" In answering this question, this paper expands on a previous study of the esthetic attributes of the classical baritone voice (see "Vocal Beauty", NATS Journal 51,1), which also drew some tentative conclusions about the Belt voice but which had an inadequate sample pool of subjects from which to draw. Further, this study demonstrates that it is possible to scientifically investigate the realm of musical esthetics in the singing voice. It is possible to go beyond the "a trained voice compared to an untrained voice" paradigm when evaluating quantitative vocal parameters and actually investigate what truly beautiful voices do. There are functions of sound-energy transference (measured in dB) that may affect the nervous system in predictable ways and that can be measured and associated with esthetics. This study does not show consistency in measurements for absolute beauty (taste), even among belt teachers and researchers, but does show some markers with varying degrees of importance which may point to a difference between our cognitive learned response to singing and our emotional, more visceral response to sounds. The markers which are significant in determining vocal beauty are: (1) Vibrancy: characteristics of vibrato, including speed, width, and consistency (low variability). (2) Spectral makeup: the ratio of partial strength above the fundamental to the fundamental. (3) Activity of the voice: the quantity of energy being produced. (4

  13. Neural network based speech synthesizer: A preliminary report

    NASA Technical Reports Server (NTRS)

    Villarreal, James A.; Mcintire, Gary

    1987-01-01

    A neural net based speech synthesis project is discussed. The novelty is that the reproduced speech was extracted from actual voice recordings. In essence, the neural network learns the timing, pitch fluctuations, connectivity between individual sounds, and speaking habits unique to that individual person. The parallel distributed processing network used for this project is the generalized backward propagation network which has been modified to also learn sequences of actions or states given in a particular plan.

  14. The voices of seduction: cross-gender effects in processing of erotic prosody

    PubMed Central

    Ethofer, Thomas; Wiethoff, Sarah; Anders, Silke; Kreifelts, Benjamin; Grodd, Wolfgang

    2007-01-01

    Gender-specific differences in cognitive functions have been widely discussed. In social cognition, such as the perception of emotion conveyed by non-verbal cues, a female advantage is generally assumed. In the present study, however, we revealed a cross-gender interaction, with increased responses to the voice of the opposite sex in male and female subjects. This effect was confined to an erotic tone of speech in behavioural data and haemodynamic responses within voice-sensitive brain areas (right middle superior temporal gyrus). The observed response pattern thus indicates a particular sensitivity to emotional voices that have a high behavioural relevance for the listener. PMID:18985138

  15. Male and female voices activate distinct regions in the male brain.

    PubMed

    Sokhi, Dilraj S; Hunter, Michael D; Wilkinson, Iain D; Woodruff, Peter W R

    2005-09-01

    In schizophrenia, auditory verbal hallucinations (AVHs) are likely to be perceived as gender-specific. Given that functional neuro-imaging correlates of AVHs involve multiple brain regions principally including auditory cortex, it is likely that those brain regions responsible for attribution of gender to speech are invoked during AVHs. We used functional magnetic resonance imaging (fMRI) and a paradigm utilising 'gender-apparent' (unaltered) and 'gender-ambiguous' (pitch-scaled) male and female voice stimuli to test the hypothesis that male and female voices activate distinct brain areas during gender attribution. The perception of female voices, when compared with male voices, elicited greater activation of the right anterior superior temporal gyrus, near the superior temporal sulcus. Similarly, male voice perception activated the mesio-parietal precuneus area. These different gender associations could not be explained by either simple pitch perception or behavioural response because the activations that we observed were conjointly activated by both 'gender-apparent' and 'gender-ambiguous' voices. The results of this study demonstrate that, in the male brain, the perception of male and female voices activates distinct brain regions.

  16. Alternating motion rate as an index of speech motor disorder in traumatic brain injury.

    PubMed

    Wang, Yu-Tsai; Kent, Ray D; Duffy, Joseph R; Thomas, Jack E; Weismer, Gary

    2004-01-01

    The task of syllable alternating motion rate (AMR) (also called diadochokinesis) is suitable for examining speech disorders of varying degrees of severity and in individuals with varying levels of linguistic and cognitive ability. However, very limited information on this task has been published for subjects with traumatic brain injury (TBI). This study is a quantitative and qualitative acoustic analysis of AMR in seven subjects with TBI. The primary goal was to use acoustic analyses to assess speech motor control disturbances for the group as a whole and for individual patients. Quantitative analyses included measures of syllable rate, syllable and intersyllable gap durations, energy maxima, and voice onset time (VOT). Qualitative analyses included classification of features evident in spectrograms and waveforms to provide a more detailed description. The TBI group had (1) a slowed syllable rate due mostly to lengthened syllables and, to a lesser degree, lengthened intersyllable gaps, (2) highly correlated syllable rates between AMR and conversation, (3) temporal and energy maxima irregularities within repetition sequences, (4) normal median VOT values but with large variation, and (5) a number of speech production abnormalities revealed by qualitative analysis, including explosive speech quality, breathy voice quality, phonatory instability, multiple or missing stop bursts, continuous voicing, and spirantization. The relationships between these findings and TBI speakers' neurological status and dysarthria types are also discussed. It was concluded that acoustic analyses of the AMR task provide specific information on motor speech limitations in individuals with TBI.
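
    For readers unfamiliar with AMR measurement, here is a minimal sketch of how syllable rate and a rough intersyllable gap might be estimated from the energy envelope; the threshold, frame size and gap heuristic are illustrative assumptions, not the authors' protocol.

    ```python
    import numpy as np

    def amr_analysis(signal, sr, frame_ms=10.0, thresh_db=-35.0):
        """Estimate syllable rate and a rough mean intersyllable gap for an AMR
        (/pa pa pa/) recording: frames above an assumed RMS threshold are 'on',
        and rising edges count as syllable onsets. Assumes the recording is
        trimmed to the task (no long leading or trailing silence)."""
        hop = int(sr * frame_ms / 1000.0)
        n = len(signal) // hop
        frames = np.reshape(signal[:n * hop], (n, hop))
        level = 20.0 * np.log10(np.sqrt(np.mean(frames ** 2, axis=1)) + 1e-12)
        on = level > thresh_db
        onsets = np.flatnonzero(np.diff(on.astype(int)) == 1)
        rate = len(onsets) / (n * frame_ms / 1000.0)  # syllables per second
        # Rough average gap: total 'off' time spread over the number of onsets.
        mean_gap_ms = np.sum(~on) * frame_ms / max(len(onsets), 1)
        return rate, mean_gap_ms
    ```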

  17. A Voice Enabled Procedure Browser for the International Space Station

    NASA Technical Reports Server (NTRS)

    Rayner, Manny; Chatzichrisafis, Nikos; Hockey, Beth Ann; Farrell, Kim; Renders, Jean-Michel

    2005-01-01

    Clarissa, an experimental voice enabled procedure browser that has recently been deployed on the International Space Station (ISS), is to the best of our knowledge the first spoken dialog system in space. This paper gives background on the system and the ISS procedures, then discusses the research developed to address three key problems: grammar-based speech recognition using the Regulus toolkit; SVM based methods for open microphone speech recognition; and robust side-effect free dialogue management for handling undos, corrections and confirmations.

  18. Human factors research problems in electronic voice warning system design

    NASA Technical Reports Server (NTRS)

    Simpson, C. A.; Williams, D. H.

    1975-01-01

    The speech messages issued by voice warning systems must be carefully designed in accordance with general principles of human decision making processes, human speech comprehension, and the conditions in which the warnings can occur. The operator's effectiveness must not be degraded by messages that are either inappropriate or difficult to comprehend. Important experimental variables include message content, linguistic redundancy, signal/noise ratio, interference with concurrent tasks, and listener expectations generated by the pragmatic or real world context in which the messages are presented.

  19. Evolution of crossmodal reorganization of the voice area in cochlear-implanted deaf patients.

    PubMed

    Rouger, Julien; Lagleyre, Sébastien; Démonet, Jean-François; Fraysse, Bernard; Deguine, Olivier; Barone, Pascal

    2012-08-01

    Psychophysical and neuroimaging studies in both animal and human subjects have clearly demonstrated that cortical plasticity following sensory deprivation leads to a brain functional reorganization that favors the spared modalities. In postlingually deaf patients, the use of a cochlear implant (CI) allows a recovery of the auditory function, which will probably counteract the cortical crossmodal reorganization induced by hearing loss. To study the dynamics of such reversed crossmodal plasticity, we designed a longitudinal neuroimaging study involving the follow-up of 10 postlingually deaf adult CI users engaged in a visual speechreading task. While speechreading activates Broca's area in normally hearing subjects (NHS), the activity level elicited in this region in CI patients is abnormally low and increases progressively with post-implantation time. Furthermore, speechreading in CI patients induces abnormal crossmodal activations in right anterior regions of the superior temporal cortex normally devoted to processing human voice stimuli (temporal voice-sensitive areas-TVA). These abnormal activity levels diminish with post-implantation time and tend towards the levels observed in NHS. First, our study revealed that the neuroplasticity after cochlear implantation involves not only auditory but also visual and audiovisual speech processing networks. Second, our results suggest that during deafness, the functional links between cortical regions specialized in face and voice processing are reallocated to support speech-related visual processing through cross-modal reorganization. Such reorganization allows a more efficient audiovisual integration of speech after cochlear implantation. These compensatory sensory strategies are later completed by the progressive restoration of the visuo-audio-motor speech processing loop, including Broca's area.

  20. What does voice-processing technology support today?

    PubMed Central

    Nakatsu, R; Suzuki, Y

    1995-01-01

    This paper describes the state of the art in applications of voice-processing technologies. In the first part, technologies concerning the implementation of speech recognition and synthesis algorithms are described. Hardware technologies such as microprocessors and DSPs (digital signal processors) are discussed. The software development environment, a key technology in developing application software ranging from DSP software to support software, is also described. In the second part, the state of the art of algorithms from the standpoint of applications is discussed. Several issues concerning evaluation of speech recognition/synthesis algorithms are covered, as well as issues concerning the robustness of algorithms in adverse conditions. PMID:7479720

  1. Overall intelligibility, articulation, resonance, voice and language in a child with Nager syndrome.

    PubMed

    Van Lierde, Kristiane M; Luyten, Anke; Mortier, Geert; Tijskens, Anouk; Bettens, Kim; Vermeersch, Hubert

    2011-02-01

    The purpose of this study was to provide a description of the language and speech (intelligibility, voice, resonance, articulation) of a 7-year-old Dutch-speaking boy with Nager syndrome. To reveal these features, comparison was made with an age- and gender-matched child with a similar palatal or hearing problem. Language was tested with an age-appropriate language test, namely the Dutch version of the Clinical Evaluation of Language Fundamentals. Regarding articulation, a phonetic inventory, phonetic analysis and phonological process analysis were performed. A nominal scale with four categories was used to judge the overall speech intelligibility. A voice and resonance assessment included a videolaryngostroboscopy, a perceptual evaluation, acoustic analysis and nasometry. The most striking communication problems in this child were expressive and receptive language delay, moderately impaired speech intelligibility, the presence of phonetic and phonological disorders, resonance disorders and a high-pitched voice. The explanation for this pattern of communication is not completely straightforward. The language and phonological impairment, only present in the child with Nager syndrome, are not part of a more general developmental delay. The resonance disorders can be related to the cleft palate, but were not present in the child with the isolated cleft palate. One might assume that the cul-de-sac resonance, the much decreased mandibular movement and the restricted tongue lifting are caused by the restricted jaw mobility and micrognathia. To what extent the suggested mandibular distraction osteogenesis in early childhood allows increased mandibular movement and better speech outcome with increased oral resonance is subject for further research. According to the results of this study, the speech and language management must be focused on receptive and expressive language skills and linguistic conceptualization, correct phonetic placement and the modification of

  2. Inconsistency of speech in children with childhood apraxia of speech, phonological disorders, and typical speech

    NASA Astrophysics Data System (ADS)

    Iuzzini, Jenya

    There is a lack of agreement on the features used to differentiate Childhood Apraxia of Speech (CAS) from Phonological Disorders (PD). One criterion which has gained consensus is lexical inconsistency of speech (ASHA, 2007); however, no accepted measure of this feature has been defined. Although lexical assessment provides information about consistency of an item across repeated trials, it may not capture the magnitude of inconsistency within an item. In contrast, segmental analysis provides more extensive information about consistency of phoneme usage across multiple contexts and word-positions. The current research compared segmental and lexical inconsistency metrics in preschool-aged children with PD, CAS, and typical development (TD) to determine how inconsistency varies with age in typical and disordered speakers, and whether CAS and PD were differentiated equally well by both assessment levels. Whereas lexical and segmental analyses may be influenced by listener characteristics or speaker intelligibility, the acoustic signal is less vulnerable to these factors. In addition, the acoustic signal may reveal information which is not evident in the perceptual signal. A second focus of the current research was motivated by Blumstein et al.'s (1980) classic study on voice onset time (VOT) in adults with acquired apraxia of speech (AOS), which demonstrated a motor impairment underlying AOS. In the current study, VOT analyses were conducted to determine how the voicing distribution for bilabial and alveolar plosives varied with age and group. Findings revealed that 3-year-olds evidenced significantly higher inconsistency than 5-year-olds; segmental inconsistency approached 0% in 5-year-olds with TD, whereas it persisted in children with PD and CAS, suggesting that for children in this age range, inconsistency is a feature of speech disorder rather than typical development (Holm et al., 2007). Likewise, whereas segmental and lexical inconsistency were

  3. Associations between speech features and phenotypic severity in Treacher Collins syndrome

    PubMed Central

    2014-01-01

    Background Treacher Collins syndrome (TCS, OMIM 154500) is a rare congenital disorder of craniofacial development. Characteristic hypoplastic malformations of the ears, zygomatic arch, mandible and pharynx have been described in detail. However, reports on the impact of these malformations on speech are few. Exploring speech features and investigating if speech function is related to phenotypic severity are essential for optimizing follow-up and treatment. Methods Articulation, nasal resonance, voice and intelligibility were examined in 19 individuals (5–74 years, median 34 years) divided into three groups comprising children 5–10 years (n = 4), adolescents 11–18 years (n = 4) and adults 29 years and older (n = 11). A speech composite score (0–6) was calculated to reflect the variability of speech deviations. TCS severity scores of phenotypic expression and total scores of Nordic Orofacial Test-Screening (NOT-S) measuring orofacial dysfunction were used in analyses of correlation with speech characteristics (speech composite scores). Results Children and adolescents presented with significantly higher speech composite scores (median 4, range 1–6) than adults (median 1, range 0–5). Nearly all children and adolescents (6/8) displayed speech deviations of articulation, nasal resonance and voice, while only three adults were identified with multiple speech aberrations. The variability of speech dysfunction in TCS was exhibited by individual combinations of speech deviations in 13/19 participants. The speech composite scores correlated with TCS severity scores and NOT-S total scores. Speech composite scores higher than 4 were associated with cleft palate. The percent of intelligible words in connected speech was significantly lower in children and adolescents (median 77%, range 31–99) than in adults (98%, range 93–100). Intelligibility of speech among the children was markedly inconsistent and clearly affecting the understandability

  4. Associations between speech features and phenotypic severity in Treacher Collins syndrome.

    PubMed

    Asten, Pamela; Akre, Harriet; Persson, Christina

    2014-04-28

    Treacher Collins syndrome (TCS, OMIM 154500) is a rare congenital disorder of craniofacial development. Characteristic hypoplastic malformations of the ears, zygomatic arch, mandible and pharynx have been described in detail. However, reports on the impact of these malformations on speech are few. Exploring speech features and investigating if speech function is related to phenotypic severity are essential for optimizing follow-up and treatment. Articulation, nasal resonance, voice and intelligibility were examined in 19 individuals (5-74 years, median 34 years) divided into three groups comprising children 5-10 years (n = 4), adolescents 11-18 years (n = 4) and adults 29 years and older (n = 11). A speech composite score (0-6) was calculated to reflect the variability of speech deviations. TCS severity scores of phenotypic expression and total scores of Nordic Orofacial Test-Screening (NOT-S) measuring orofacial dysfunction were used in analyses of correlation with speech characteristics (speech composite scores). Children and adolescents presented with significantly higher speech composite scores (median 4, range 1-6) than adults (median 1, range 0-5). Nearly all children and adolescents (6/8) displayed speech deviations of articulation, nasal resonance and voice, while only three adults were identified with multiple speech aberrations. The variability of speech dysfunction in TCS was exhibited by individual combinations of speech deviations in 13/19 participants. The speech composite scores correlated with TCS severity scores and NOT-S total scores. Speech composite scores higher than 4 were associated with cleft palate. The percent of intelligible words in connected speech was significantly lower in children and adolescents (median 77%, range 31-99) than in adults (98%, range 93-100). Intelligibility of speech among the children was markedly inconsistent and clearly affecting the understandability. Multiple speech deviations were identified in

  5. Evaluation of the comprehension of noncontinuous sped-up vocoded speech - A strategy for coping with fading HF channels

    NASA Astrophysics Data System (ADS)

    Lynch, John T.

    1987-02-01

    The present technique for coping with fading and burst noise on HF channels used in digital voice communications transmits digital voice only during high S/N time intervals, and speeds up the speech when necessary to avoid conversation-hindering delays. On the basis of informal listening tests, four test conditions were selected in order to characterize those conditions of speech interruption which would render it comprehensible or incomprehensible. One of the test conditions, 2 s on and 0.5 s off, yielded test scores comparable to the reference continuous-speech case and is a reasonable match to the temporal variations of a disturbed ionosphere.
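
    A minimal sketch of the gating manipulation (2 s on, 0.5 s off, matching the best test condition above) together with a crude speed-up, assuming the signal is a NumPy array of samples.

    ```python
    import numpy as np

    def gate(signal, sr, on_s=2.0, off_s=0.5):
        """Zero out periodic 'off' intervals: on_s seconds kept, off_s dropped."""
        out = signal.copy()
        period = int(sr * (on_s + off_s))
        on = int(sr * on_s)
        for start in range(0, len(out), period):
            out[start + on:start + period] = 0.0
        return out

    def speed_up(signal, factor=1.25):
        """Crude speed-up by linear-interpolation resampling; this raises pitch
        too, whereas a real system would use time-scale modification."""
        idx = np.arange(0, len(signal) - 1, factor)
        return np.interp(idx, np.arange(len(signal)), signal)
    ```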

  6. Tracking voice change after thyroidectomy: application of spectral/cepstral analyses.

    PubMed

    Awan, Shaheen N; Helou, Leah B; Stojadinovic, Alexander; Solomon, Nancy Pearl

    2011-04-01

    This study evaluates the utility of perioperative spectral and cepstral acoustic analyses to monitor voice change after thyroidectomy. Perceptual and acoustic analyses were conducted on speech samples (sustained vowel /ɑ/ and CAPE-V sentences) provided by 70 participants (36 women and 34 men) at four study time points: prior to thyroid surgery and 2 weeks, 3 months and 6 months after thyroidectomy. Repeated measures analyses of variance focused on the relative amplitude of the dominant harmonic in the voice signal (cepstral peak prominence, CPP), the ratio of low-to-high spectral energy, and their respective standard deviations (SD). Data were also examined for relationships between acoustic measures and perceptual ratings of overall severity of voice quality. Results showed that perceived overall severity and the acoustic measures of the CPP and its SD (CPPsd) computed from sentence productions were significantly reduced at 2 weeks post-thyroidectomy for 20 patients (29% of the sample) who had self-reported post-operative voice change. For this same group of patients, the CPP and CPPsd computed from sentence productions improved significantly from 2 weeks post-thyroidectomy to 6 months post-surgery. CPP and CPPsd also correlated well with perceived overall severity (r = -0.68 and -0.79, respectively). Measures of CPP from sustained vowel productions were not as effective as those from sentence productions in reflecting voice deterioration in the post-thyroidectomy patients at the 2-week post-surgery time period, correlated more weakly with perceived overall severity, and were not as effective in discriminating negative voice outcome (NegVO) from normal voice outcome (NormVO) patients as compared to the results from the sentence-level stimuli. Results indicate that spectral/cepstral analysis methods can be used with continuous speech samples to provide important objective data to document the effects of dysphonia in a post-thyroidectomy patient sample. When used in
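
    The CSID itself is a proprietary multi-parameter index, but its core ingredient, the cepstral peak prominence, can be sketched directly. An illustrative single-frame version, with the pitch search range and regression span as assumptions rather than a validated clinical implementation:

    ```python
    import numpy as np

    def cpp_db(frame, sr, f0_range=(60.0, 300.0)):
        """Cepstral peak prominence of one windowed voice frame (illustrative).

        The real cepstrum is the inverse FFT of the log magnitude spectrum;
        CPP is the height of its peak within the expected pitch range above a
        straight line regressed over the cepstrum."""
        log_mag = 20.0 * np.log10(
            np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) + 1e-12)
        cep = np.fft.irfft(log_mag)
        quef = np.arange(len(cep)) / sr  # quefrency in seconds
        fit = (quef > 0.001) & (quef < 0.5 * len(frame) / sr)  # skip the origin
        slope, intercept = np.polyfit(quef[fit], cep[fit], 1)
        search = (quef >= 1.0 / f0_range[1]) & (quef <= 1.0 / f0_range[0])
        peak = np.argmax(np.where(search, cep, -np.inf))
        return cep[peak] - (slope * quef[peak] + intercept)
    ```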

  7. Speech transformation system (spectrum and/or excitation) without pitch extraction

    NASA Astrophysics Data System (ADS)

    Seneff, S.

    1980-07-01

    A speech analysis-synthesis system was developed which is capable of independent manipulation of the fundamental frequency and spectral envelope of a speech waveform. The system deconvolved the original speech with the spectral envelope estimate to obtain a model for the excitation; explicit pitch extraction was not required and, as a consequence, the transformed speech was more natural sounding than would be the case if the excitation were modeled as a sequence of pulses. It is shown that the system has applications in the areas of voice modification, baseband-excited vocoders, time-scale modification, and frequency compression as an aid to the partially deaf.

  8. Influence of Telecommunication Modality, Internet Transmission Quality, and Accessories on Speech Perception in Cochlear Implant Users

    PubMed Central

    Koller, Roger; Guignard, Jérémie; Caversaccio, Marco; Kompis, Martin; Senn, Pascal

    2017-01-01

    Background Telecommunication is limited or even impossible for more than one-third of all cochlear implant (CI) users. Objective We sought therefore to study the impact of voice quality on speech perception with voice over Internet protocol (VoIP) under real and adverse network conditions. Methods Telephone speech perception was assessed in 19 CI users (15-69 years, average 42 years), using the German HSM (Hochmair-Schulz-Moser) sentence test comparing Skype and conventional telephone (public switched telephone networks, PSTN) transmission using a personal computer (PC) and a digital enhanced cordless telecommunications (DECT) telephone dual device. Five different Internet transmission quality modes and four accessories (PC speakers, headphones, 3.5 mm jack audio cable, and induction loop) were compared. As a secondary outcome, the subjective perceived voice quality was assessed using the mean opinion score (MOS). Results Telephone speech perception was significantly better (median 91.6%, P<.001) with Skype compared with PSTN (median 42.5%) under optimal conditions. Skype calls under adverse network conditions (data packet loss > 15%) were not superior to conventional telephony. In addition, there were no significant differences between the tested accessories (P>.05) using a PC. Coupling a Skype DECT phone device with an audio cable to the CI, however, resulted in higher speech perception (median 65%) and subjective MOS scores (3.2) than using PSTN (median 7.5%, P<.001). Conclusions Skype calls significantly improve speech perception for CI users compared with conventional telephony under real network conditions. Listening accessories do not further improve the listening experience. Current Skype DECT telephone devices do not fully offer technical advantages in voice quality. PMID:28438727
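
    The adverse-network condition (packet loss above 15%) can be simulated for listening experiments in a few lines; this sketch zeroes out randomly chosen fixed-size packets, a deliberate simplification since real VoIP clients add jitter buffering and loss concealment.

    ```python
    import numpy as np

    def drop_packets(signal, sr, loss=0.15, packet_ms=20.0, seed=0):
        """Zero out randomly chosen packets to mimic VoIP transmission loss."""
        rng = np.random.default_rng(seed)
        hop = int(sr * packet_ms / 1000.0)
        out = signal.copy()
        for start in range(0, len(out), hop):
            if rng.random() < loss:  # each packet lost independently
                out[start:start + hop] = 0.0
        return out
    ```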

  9. The Prevalence of Speech and Language Disorders in French-Speaking Preschool Children From Yaoundé (Cameroon).

    PubMed

    Tchoungui Oyono, Lilly; Pascoe, Michelle; Singh, Shajila

    2018-05-17

    The purpose of this study was to determine the prevalence of speech and language disorders in French-speaking preschool-age children in Yaoundé, the capital city of Cameroon. A total of 460 participants aged 3-5 years were recruited from the 7 communes of Yaoundé using a 2-stage cluster sampling method. Speech and language assessment was undertaken using a standardized speech and language test, the Evaluation du Langage Oral (Khomsi, 2001), which was purposefully renormed on the sample. A predetermined cutoff of 2 SDs below the normative mean was applied to identify articulation, expressive language, and receptive language disorders. Fluency and voice disorders were identified using clinical judgment by a speech-language pathologist. Overall prevalence was calculated as follows: speech disorders, 14.7%; language disorders, 4.3%; and speech and language disorders, 17.1%. In terms of disorders, prevalence findings were as follows: articulation disorders, 3.6%; expressive language disorders, 1.3%; receptive language disorders, 3%; fluency disorders, 8.4%; and voice disorders, 3.6%. Prevalence figures are higher than those reported for other countries and emphasize the urgent need to develop speech and language services for the Cameroonian population.
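
    The 2-SD criterion translates directly into a cutoff on the renormed test score. A toy sketch of the prevalence computation; the score array and norms are placeholders, not the study's data.

    ```python
    import numpy as np

    def prevalence_percent(scores, mean, sd, cutoff_sd=2.0):
        """Percentage scoring more than cutoff_sd standard deviations below the mean."""
        scores = np.asarray(scores, dtype=float)
        return 100.0 * np.mean(scores < mean - cutoff_sd * sd)

    # Placeholder usage with renormed test scores (hypothetical array `elo_scores`):
    # prevalence_percent(elo_scores, elo_scores.mean(), elo_scores.std(ddof=1))
    ```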

  10. Speech vs. singing: infants choose happier sounds

    PubMed Central

    Corbeil, Marieve; Trehub, Sandra E.; Peretz, Isabelle

    2013-01-01

    Infants prefer speech to non-vocal sounds and to non-human vocalizations, and they prefer happy-sounding speech to neutral speech. They also exhibit an interest in singing, but there is little knowledge of their relative interest in speech and singing. The present study explored infants' attention to unfamiliar audio samples of speech and singing. In Experiment 1, infants 4–13 months of age were exposed to happy-sounding infant-directed speech vs. hummed lullabies by the same woman. They listened significantly longer to the speech, which had considerably greater acoustic variability and expressiveness, than to the lullabies. In Experiment 2, infants of comparable age who heard the lyrics of a Turkish children's song spoken vs. sung in a joyful/happy manner did not exhibit differential listening. Infants in Experiment 3 heard the happily sung lyrics of the Turkish children's song vs. a version that was spoken in an adult-directed or affectively neutral manner. They listened significantly longer to the sung version. Overall, happy voice quality rather than vocal mode (speech or singing) was the principal contributor to infant attention, regardless of age. PMID:23805119

  11. Hoarseness in School-Aged Children and Effectiveness of Voice Therapy in International Classification of Functioning Framework.

    PubMed

    Akın Şenkal, Özgül; Özer, Cem

    2015-09-01

    The hoarseness in school-aged children disrupts the educational process because it affects the social progress, communication skills, and self-esteem of children. Besides otorhinolaryngological examination, the first treatment option when hoarseness occurs is voice therapy. The aim of the study was to determine, by parental interview, the factors that increase hoarseness in school-aged children and to identify suitable voice therapy for them within the frame of the International Classification of Functioning (ICF). Retrospective analysis of data gathered from patient files. A total of 75 children (56 boys and 19 girls) aged 7-14 years (mean 10.86 ± 2.51) were examined retrospectively. A detailed history was taken from the parents of the children involved in this study. Information about the children's vocal habits was gathered within the frame of the ICF, and voice therapy was then started, with appointments scheduled by an experienced speech-language pathologist. Comparing measures before and after voice therapy across the applied therapy methods, statistically significant differences were found in maximum phonation time and the s/z ratio. A moderately significant relationship was found between the number of physiological voice therapy sessions and the s/z ratio. According to ICF labels, most voice complaints matched "body functions" and "activity and limitations." Appropriate voice therapy methods for hoarseness in school-aged children must be chosen and applied by speech-language therapists. A detailed history taken from the family during examination, within the frame of the ICF, positively affects the choice and application of the voice therapy method. The child's family is very important for successful management.
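
    Maximum phonation time and the s/z ratio are duration measures that can be estimated from recordings of sustained /a/, /s/ and /z/. A hedged sketch, with the level threshold and frame size as illustrative assumptions:

    ```python
    import numpy as np

    def phonation_time(signal, sr, thresh_db=-40.0, frame_ms=25.0):
        """Seconds during which the recording exceeds an assumed level threshold."""
        hop = int(sr * frame_ms / 1000.0)
        n = len(signal) // hop
        frames = np.reshape(signal[:n * hop], (n, hop))
        level = 20.0 * np.log10(np.sqrt(np.mean(frames ** 2, axis=1)) + 1e-12)
        return float(np.sum(level > thresh_db)) * frame_ms / 1000.0

    def s_z_ratio(s_recording, z_recording, sr):
        """s/z ratio: sustained /s/ duration over sustained /z/ duration.
        Values well above 1 are commonly read as a sign of glottal inefficiency."""
        return phonation_time(s_recording, sr) / phonation_time(z_recording, sr)
    ```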

  12. Evaluation of participants' perception and taste thresholds with a zirconia palatal plate.

    PubMed

    Wada, Takeshi; Takano, Tomofumi; Tasaka, Akinori; Ueda, Takayuki; Sakurai, Kaoru

    2016-10-01

    Zirconia and cobalt-chromium can withstand a similar degree of loading. Therefore, using a zirconia base for removable dentures could allow the thickness of the palatal area to be reduced similarly to metal base dentures. We hypothesized that a zirconia palatal plate for removable dentures provides a high level of participants' perception without influencing taste thresholds. The purpose of this study was to evaluate the participants' perception and taste thresholds of a zirconia palatal plate. Palatal plates fabricated using acrylic resin, zirconia, and cobalt-chromium alloy were inserted into healthy individuals. Taste thresholds were investigated using the whole-mouth gustatory test, and participants' perception was evaluated using the 100-mm visual analog scale to assess the ease of pronunciation, ease of swallowing, sensation of temperature, metallic taste, sensation of foreign body, subjective sensation of weight, adhesiveness of chewing gum, and general satisfaction. For the taste thresholds, no significant differences were noted in sweet, salty, sour, bitter, or umami tastes among participants wearing no plate, or the resin, zirconia, and metal plates. Speech was easier and foreign-body sensation was lower with the zirconia plate than with the resin plate. Evaluation of the adhesiveness of chewing gum showed that chewing gum does not readily adhere to the zirconia plate in comparison with the metal plate. The comprehensive participants' perception of the zirconia plate was evaluated as being superior to the resin plate. A zirconia palatal plate provides a high level of participants' perception without influencing taste thresholds.

  13. World Voice Day in news: analysis of reports on the Voice Campaign in Brazil.

    PubMed

    Dornelas, Rodrigo; Giannini, Susana Pimentel Pinto; Ferreira, Léslie Piccolotto

    2015-01-01

    To analyze the television reports on World Voice Day broadcast by Globo® TV. We researched television reports broadcast by the Globo® Network in regional television news programs from March 15 to April 20, 2013. For the data analysis, the Document Analysis technique was used. The analyzed variables were the following: location, broadcasting period, duration, interviewed professional, mention of multiprofessional work, orientation to the population, and the interview approach (health promotion or disease prevention). Through statistical analysis, the interview approach was considered the outcome and associated with the other variables. In the regions where the network has regional news programs, the majority carried reports about the Voice Campaign. Among these, all five regions of Brazil were represented, in the morning/afternoon periods, with a mean report duration of 5.3 minutes. Speech-language pathologists were present in most of the interviews, as was emphasis on the importance of multiprofessional work. Regarding the content presented, the interviewees focused on diseases caused by habits that impair the voice, with guidance to the public about what negatively affects vocal well-being. In most cases, the interviews did not share the same approach (promoting vocal well-being versus preventing voice disorders), and interprofessional practice is still rarely presented as a possible work strategy.

  14. Dysphagia, Speech, Voice, and Trismus following Radiotherapy and/or Chemotherapy in Patients with Head and Neck Carcinoma: Review of the Literature

    PubMed Central

    Koetsenruijter, K. W. J.; Swan, K.; Bogaardt, H.

    2016-01-01

    Introduction. Patients with head and neck cancer suffer from various impairments due to the primary illness, as well as secondary consequences of the oncological treatment. This systematic review describes the effects of radiotherapy and/or chemotherapy on the functions of the upper aerodigestive tract in patients with head and neck cancer. Methods. A systematic literature search was performed by two independent reviewers using the electronic databases PubMed and Embase. All dates up to May 2016 were included. Results. Of the 947 abstracts, sixty articles met the inclusion criteria and described one or more aspects of the sequelae of radiotherapy and/or chemotherapy. Forty studies described swallowing-related problems, 24 described voice-related problems, seven described trismus, and 25 studies described general quality of life. Only 14 articles reported that speech pathologists conducted the interventions, of which only six articles described in detail what the interventions involved. Conclusion. In general, voice quality improved following intervention, whereas quality of life, dysphagia, and oral intake deteriorated during and after treatment. However, as a consequence of the diversity in treatment protocols and patient characteristics, the conclusions of most studies cannot be easily generalised. Further research on the effects of oncological interventions on the upper aerodigestive tract is needed. PMID:27722170

  15. The effects of gated speech on the fluency of speakers who stutter

    PubMed Central

    Howell, Peter

    2007-01-01

    It is known that the speech of people who stutter improves when the speaker’s own vocalization is changed while the participant is speaking. One explanation of these effects is the disruptive rhythm hypothesis (DRH). DRH maintains that the manipulated sound only needs to disturb timing to affect speech control. The experiment investigated whether speech that was gated on and off (interrupted) affected the speech control of speakers who stutter. Eight children who stutter read a passage when they heard their voice normally and when the speech was gated. Fluency was enhanced (fewer errors were made and time to read a set passage was reduced) when speech was interrupted in this way. The results support the DRH. PMID:17726328

  16. Assessment of breathing patterns and respiratory muscle recruitment during singing and speech in quadriplegia.

    PubMed

    Tamplin, Jeanette; Brazzale, Danny J; Pretto, Jeffrey J; Ruehland, Warren R; Buttifant, Mary; Brown, Douglas J; Berlowitz, David J

    2011-02-01

    To explore how respiratory impairment after cervical spinal cord injury affects vocal function, and to explore muscle recruitment strategies used during vocal tasks after quadriplegia. It was hypothesized that to achieve the increased respiratory support required for singing and loud speech, people with quadriplegia use different patterns of muscle recruitment and control strategies compared with control subjects without spinal cord injury. Matched, parallel-group design. Large university-affiliated public hospital. Consenting participants with motor-complete C5-7 quadriplegia (n=6) and able-bodied age-matched controls (n=6) were assessed on physiologic and voice measures during vocal tasks. Not applicable. Standard respiratory function testing, surface electromyographic activity from accessory respiratory muscles, sound pressure levels during vocal tasks, the Voice Handicap Index, and the Perceptual Voice Profile. The group with quadriplegia had a reduced lung capacity (vital capacity, 71% vs 102% of predicted; P=.028), more perceived voice problems (Voice Handicap Index score, 22.5 vs 6.5; P=.046), and greater recruitment of accessory respiratory muscles during both loud and soft volumes (P=.028) than the able-bodied controls. The group with quadriplegia also demonstrated higher accessory muscle activation in changing from soft to loud speech (P=.028). People with quadriplegia have impaired vocal ability and use different muscle recruitment strategies during speech than the able-bodied. These findings will enable us to target specific measurements of respiratory physiology for assessing functional improvements in response to formal therapeutic singing training.

  17. [The role of sex in voice restoration and emotional functioning after laryngectomy].

    PubMed

    Keszte, J; Wollbrück, D; Meyer, A; Fuchs, M; Meister, E; Pabst, F; Oeken, J; Schock, J; Wulke, C; Singer, S

    2012-04-01

    Data on psychosocial factors in laryngectomized women are rare. All means of alaryngeal voice production sound male due to low fundamental frequency and roughness, which makes postlaryngectomy voice rehabilitation especially challenging for women. The aim of this study was to investigate whether women use alaryngeal speech less often and are therefore more emotionally distressed. In a cross-sectional multi-centred study, 12 female and 138 male laryngectomees were interviewed. To identify risk factors for infrequent use of alaryngeal speech and for impaired emotional functioning, logistic regression was used and odds ratios were adjusted for age, time since laryngectomy, physical functioning, social activity and feelings of stigmatization. Esophageal speech was used by 83% of the female and 57% of the male patients, prosthetic speech by 17% of the female and 20% of the male patients, and electrolaryngeal speech by 17% of the female and 29% of the male patients. Laryngectomees were at higher risk of emotional distress when feeling physically bad (OR=2.48; p=0.02) or having feelings of stigmatization (OR=3.94; p≤0.00). In addition, women tended to be more socially active than men (83% vs. 54%; p=0.05). Sex influenced neither the use of alaryngeal speech nor emotional functioning. Since there is evidence for different psychosocial adjustment in laryngectomized men and women, further investigation with larger samples is needed on this issue.

  18. Functional Overlap between Regions Involved in Speech Perception and in Monitoring One's Own Voice during Speech Production

    ERIC Educational Resources Information Center

    Zheng, Zane Z.; Munhall, Kevin G.; Johnsrude, Ingrid S.

    2010-01-01

    The fluency and the reliability of speech production suggest a mechanism that links motor commands and sensory feedback. Here, we examined the neural organization supporting such links by using fMRI to identify regions in which activity during speech production is modulated according to whether auditory feedback matches the predicted outcome or…

  19. Speech recovery device

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Frankle, Christen M.

    2000-10-19

    There is provided an apparatus and method for assisting speech recovery in people with inability to speak due to aphasia, apraxia or another condition with similar effect. A hollow, rigid, thin-walled tube with semi-circular or semi-elliptical cut out shapes at each open end is positioned such that one end mates with the throat/voice box area of the neck of the assistor and the other end mates with the throat/voice box area of the assisted. The speaking person (assistor) makes sounds that produce standing wave vibrations at the same frequency in the vocal cords of the assisted person. Driving the assisted person's vocal cords with the assisted person being able to hear the correct tone enables the assisted person to speak by simply amplifying the vibration of membranes in their throat.

  20. Voice recognition through phonetic features with Punjabi utterances

    NASA Astrophysics Data System (ADS)

    Kaur, Jasdeep; Juglan, K. C.; Sharma, Vishal; Upadhyay, R. K.

    2017-07-01

    This paper deals with perception and disorders of speech in view of the Punjabi language. Given the importance of voice identification, various parameters of speaker identification were studied. The speech material was recorded with a tape recorder in both normal and disguised modes of utterance. From the recorded material, utterances free from noise were selected for auditory and acoustic spectrographic analysis. The comparison of normal and disguised speech of seven subjects is reported. The fundamental frequency (F0) at similar places, plosive duration at certain phonemes, and the amplitude ratio (A1:A2) were compared in normal and disguised speech. The formant frequencies of normal and disguised speech remain almost identical only when compared at positions of the same vowel quality and quantity; if a vowel is more closed or more open in the disguised utterance, its formant frequencies shift relative to the normal utterance. The amplitude ratio (A1:A2) was found to be speaker dependent and remains unchanged in disguised utterances, although this value may shift if cross-sectioning is not done at the same location.
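    As an illustration of the kind of acoustic analysis described above, formant frequencies for a vowel segment are commonly estimated from linear-prediction (LPC) pole angles. The following minimal Python sketch (the model order and frequency limits are illustrative assumptions, not the authors' protocol) shows the standard autocorrelation-method computation:

        import numpy as np
        from scipy.linalg import solve_toeplitz

        def lpc_formants(frame, sr, order=12):
            """Estimate formant frequencies (Hz) of one windowed vowel frame
            via the autocorrelation method of linear predictive coding."""
            frame = frame * np.hamming(len(frame))
            ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
            # Yule-Walker equations R a = r for the predictor coefficients
            a = solve_toeplitz(ac[:order], ac[1:order + 1])
            # Roots of A(z) = 1 - a1*z^-1 - ... - ap*z^-p
            roots = np.roots(np.concatenate(([1.0], -a)))
            roots = roots[np.imag(roots) > 0]   # one root per conjugate pair
            freqs = np.angle(roots) * sr / (2 * np.pi)
            return np.sort(freqs[(freqs > 90) & (freqs < sr / 2 - 50)])

    The first two or three values returned approximate F1, F2, and F3, which can then be compared across normal and disguised utterances at matched vowel positions.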

  1. Electrophysiological and hemodynamic mismatch responses in rats listening to human speech syllables.

    PubMed

    Mahmoudzadeh, Mahdi; Dehaene-Lambertz, Ghislaine; Wallois, Fabrice

    2017-01-01

    Speech is a complex auditory stimulus which is processed on several time scales. Whereas consonant discrimination requires resolving rapid acoustic events, voice perception relies on slower cues. Humans, even at preterm ages, are particularly efficient at encoding temporal cues. To compare the capacities of preterm infants with those observed in other mammals, we tested anesthetized adult rats using exactly the same paradigm as that used in preterm neonates. We simultaneously recorded neural responses (using ECoG) and hemodynamic responses (using fNIRS) to series of human speech syllables and investigated the brain response to a change of consonant (ba vs. ga) and to a change of voice (male vs. female). Both methods revealed concordant results, although ECoG measures were more sensitive than fNIRS. Responses to syllables were bilateral, but with marked right-hemispheric lateralization. Responses to voice changes were observed with both methods, while only ECoG was sensitive to consonant changes. These results suggest that rats processed the speech envelope more effectively than fine temporal cues, in contrast with human preterm neonates, in whom the opposite effects were observed. Cross-species comparisons constitute a very valuable tool to define the singularities of the human brain and the species-specific biases that may help human infants learn their native language.

  2. Research in speech communication.

    PubMed Central

    Flanagan, J

    1995-01-01

    Advances in digital speech processing are now supporting application and deployment of a variety of speech technologies for human/machine communication. In fact, new businesses are rapidly forming about these technologies. But these capabilities are of little use unless society can afford them. Happily, explosive advances in microelectronics over the past two decades have assured affordable access to this sophistication as well as to the underlying computing technology. The research challenges in speech processing remain in the traditionally identified areas of recognition, synthesis, and coding. These three areas have typically been addressed individually, often with significant isolation among the efforts. But they are all facets of the same fundamental issue--how to represent and quantify the information in the speech signal. This implies deeper understanding of the physics of speech production, the constraints that the conventions of language impose, and the mechanism for information processing in the auditory system. In ongoing research, therefore, we seek more accurate models of speech generation, better computational formulations of language, and realistic perceptual guides for speech processing--along with ways to coalesce the fundamental issues of recognition, synthesis, and coding. Successful solution will yield the long-sought dictation machine, high-quality synthesis from text, and the ultimate in low bit-rate transmission of speech. It will also open the door to language-translating telephony, where the synthetic foreign translation can be in the voice of the originating talker. PMID:7479806

  3. Attentional modulation of informational masking on early cortical representations of speech signals.

    PubMed

    Zhang, Changxin; Arnott, Stephen R; Rabaglia, Cristina; Avivi-Reich, Meital; Qi, James; Wu, Xihong; Li, Liang; Schneider, Bruce A

    2016-01-01

    To recognize speech in a noisy auditory scene, listeners need to perceptually segregate the target talker's voice from other competing sounds (stream segregation). A number of studies have suggested that the attentional demands placed on listeners increase as the acoustic properties and informational content of the competing sounds become more similar to those of the target voice. Hence we would expect attentional demands to be considerably greater when speech is masked by speech than when it is masked by steady-state noise. To investigate the role of attentional mechanisms in the unmasking of speech sounds, event-related potentials (ERPs) were recorded to a syllable masked by noise or competing speech under both active (the participant was asked to respond when the syllable was presented) and passive (no response was required) listening conditions. The results showed that the long-latency auditory response to a syllable (/bi/), presented at different signal-to-masker ratios (SMRs), was similar in both passive and active listening conditions when the masker was a steady-state noise. In contrast, a switch from the passive listening condition to the active one, when the masker was two-talker speech, significantly enhanced the ERPs to the syllable. These results support the hypothesis that the need to engage attentional mechanisms in aid of scene analysis increases as the similarity (both acoustic and informational) between the target speech and the competing background sounds increases. Copyright © 2015 Elsevier B.V. All rights reserved.

  4. Andreas Vesalius' 500th Anniversary: Initial Integral Understanding of Voice Production.

    PubMed

    Brinkman, Romy J; Hage, J Joris

    2017-01-01

    Voice production relies on the integrated functioning of a three-part system: respiration, phonation and resonance, and articulation. To commemorate the 500th anniversary of the great anatomist Andreas Vesalius (1515-1564), we report on his understanding of this integral system. The text of Vesalius' masterpiece De Humani Corporis Fabrica Libri Septem and an eyewitness report of the public dissection of three corpses by Vesalius in Bologna, Italy, in 1540, were searched for references to the voice-producing anatomical structures and their function. We clustered the traced, separate parts for the first time. We found that Vesalius recognized the importance for voice production of many details of the respiratory system, the voice box, and various structures of resonance and articulation. He stressed that voice production was a cerebral function and extensively recorded the innervation of the voice-producing organs by the cranial nerves. Vesalius was the first to publicly record the concept of voice production as an integrated and cerebrally directed function of respiration, phonation and resonance, and articulation. In doing so nearly 500 years ago, he laid a firm basis for the understanding of the physiology of voice production and speech and its management as we know it today. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.

  5. Can blind persons accurately assess body size from the voice?

    PubMed

    Pisanski, Katarzyna; Oleszkiewicz, Anna; Sorokowska, Agnieszka

    2016-04-01

    Vocal tract resonances provide reliable information about a speaker's body size that human listeners use for biosocial judgements as well as speech recognition. Although humans can accurately assess men's relative body size from the voice alone, how this ability is acquired remains unknown. In this study, we test the prediction that accurate voice-based size estimation is possible without prior audiovisual experience linking low frequencies to large bodies. Ninety-one healthy congenitally or early blind, late blind and sighted adults (aged 20-65) participated in the study. On the basis of vowel sounds alone, participants assessed the relative body sizes of male pairs of varying heights. Accuracy of voice-based body size assessments significantly exceeded chance and did not differ among participants who were sighted, or congenitally blind or who had lost their sight later in life. Accuracy increased significantly with relative differences in physical height between men, suggesting that both blind and sighted participants used reliable vocal cues to size (i.e. vocal tract resonances). Our findings demonstrate that prior visual experience is not necessary for accurate body size estimation. This capacity, integral to both nonverbal communication and speech perception, may be present at birth or may generalize from broader cross-modal correspondences. © 2016 The Author(s).

  6. Can blind persons accurately assess body size from the voice?

    PubMed Central

    Oleszkiewicz, Anna; Sorokowska, Agnieszka

    2016-01-01

    Vocal tract resonances provide reliable information about a speaker's body size that human listeners use for biosocial judgements as well as speech recognition. Although humans can accurately assess men's relative body size from the voice alone, how this ability is acquired remains unknown. In this study, we test the prediction that accurate voice-based size estimation is possible without prior audiovisual experience linking low frequencies to large bodies. Ninety-one healthy congenitally or early blind, late blind and sighted adults (aged 20–65) participated in the study. On the basis of vowel sounds alone, participants assessed the relative body sizes of male pairs of varying heights. Accuracy of voice-based body size assessments significantly exceeded chance and did not differ among participants who were sighted, or congenitally blind or who had lost their sight later in life. Accuracy increased significantly with relative differences in physical height between men, suggesting that both blind and sighted participants used reliable vocal cues to size (i.e. vocal tract resonances). Our findings demonstrate that prior visual experience is not necessary for accurate body size estimation. This capacity, integral to both nonverbal communication and speech perception, may be present at birth or may generalize from broader cross-modal correspondences. PMID:27095264

  7. Error-dependent modulation of speech-induced auditory suppression for pitch-shifted voice feedback.

    PubMed

    Behroozmand, Roozbeh; Larson, Charles R

    2011-06-06

    The motor-driven predictions about expected sensory feedback (efference copies) have been proposed to play an important role in recognition of sensory consequences of self-produced motor actions. In the auditory system, this effect was suggested to result in suppression of sensory neural responses to self-produced voices that are predicted by the efference copies during vocal production in comparison with passive listening to the playback of the identical self-vocalizations. In the present study, event-related potentials (ERPs) were recorded in response to upward pitch shift stimuli (PSS) with five different magnitudes (0, +50, +100, +200 and +400 cents) at voice onset during active vocal production and passive listening to the playback. Results indicated that the suppression of the N1 component during vocal production was largest for unaltered voice feedback (PSS: 0 cents), became smaller as the magnitude of PSS increased to 200 cents, and was almost completely eliminated in response to 400 cents stimuli. Findings of the present study suggest that the brain utilizes the motor predictions (efference copies) to determine the source of incoming stimuli and maximally suppresses the auditory responses to unaltered feedback of self-vocalizations. The reduction of suppression for 50, 100 and 200 cents and its elimination for 400 cents pitch-shifted voice auditory feedback support the idea that motor-driven suppression of voice feedback leads to distinctly different sensory neural processing of self vs. non-self vocalizations. This characteristic may enable the audio-vocal system to more effectively detect and correct for unexpected errors in the feedback of self-produced voice pitch compared with externally-generated sounds.
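    The pitch-shift magnitudes above are given in cents, the standard logarithmic unit for frequency ratios: a shift of c cents multiplies frequency by 2^(c/1200). A quick Python check of the ratios used in this paradigm:

        def cents_to_ratio(cents: float) -> float:
            # A shift of `cents` corresponds to this multiplicative factor.
            return 2.0 ** (cents / 1200.0)

        for c in (0, 50, 100, 200, 400):
            print(f"+{c:>3} cents -> x{cents_to_ratio(c):.3f}")
        # +400 cents (a major third) takes a 200 Hz voice to about 252 Hz.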

  8. Error-dependent modulation of speech-induced auditory suppression for pitch-shifted voice feedback

    PubMed Central

    2011-01-01

    Background The motor-driven predictions about expected sensory feedback (efference copies) have been proposed to play an important role in recognition of sensory consequences of self-produced motor actions. In the auditory system, this effect was suggested to result in suppression of sensory neural responses to self-produced voices that are predicted by the efference copies during vocal production in comparison with passive listening to the playback of the identical self-vocalizations. In the present study, event-related potentials (ERPs) were recorded in response to upward pitch shift stimuli (PSS) with five different magnitudes (0, +50, +100, +200 and +400 cents) at voice onset during active vocal production and passive listening to the playback. Results Results indicated that the suppression of the N1 component during vocal production was largest for unaltered voice feedback (PSS: 0 cents), became smaller as the magnitude of PSS increased to 200 cents, and was almost completely eliminated in response to 400 cents stimuli. Conclusions Findings of the present study suggest that the brain utilizes the motor predictions (efference copies) to determine the source of incoming stimuli and maximally suppresses the auditory responses to unaltered feedback of self-vocalizations. The reduction of suppression for 50, 100 and 200 cents and its elimination for 400 cents pitch-shifted voice auditory feedback support the idea that motor-driven suppression of voice feedback leads to distinctly different sensory neural processing of self vs. non-self vocalizations. This characteristic may enable the audio-vocal system to more effectively detect and correct for unexpected errors in the feedback of self-produced voice pitch compared with externally-generated sounds. PMID:21645406

  9. Speech disorders in Israeli Arab children.

    PubMed

    Jaber, L; Nahmani, A; Shohat, M

    1997-10-01

    The aim of this work was to study the frequency of speech disorders in Israeli Arab children and its association with parental consanguinity. A questionnaire was sent to the parents of 1,495 Arab children attending kindergarten and the first two grades of the seven primary schools in the town of Taibe. Eighty-six percent (1,282 parents) responded. The answers to the questionnaire revealed that 25% of the children reportedly had a speech and language disorder. Of the children identified by their parents as having a speech disorder, 44 were selected randomly for examination by a speech specialist. The disorders noted in this subgroup included errors in articulation (48.0%), poor language (18%), poor voice quality (15.9%), stuttering (13.6%), and other problems (4.5%). Rates of affected children from consanguineous and non-consanguineous marriages were 31% and 22.4%, respectively (p < 0.01). We conclude that speech disorders are an important problem among Israeli Arab schoolchildren. More comprehensive programs are needed to facilitate diagnosis and treatment.

  10. Acoustic analysis of speech variables during depression and after improvement.

    PubMed

    Nilsonne, A

    1987-09-01

    Speech recordings were made of 16 depressed patients during depression and after clinical improvement. The recordings were analyzed using a computer program which extracts acoustic parameters from the fundamental frequency contour of the voice. The percent pause time, the standard deviation of the voice fundamental frequency distribution, the standard deviation of the rate of change of the voice fundamental frequency, and the average speed of voice change were found to correlate with the clinical state of the patient. The mean fundamental frequency, the total reading time, and the average rate of change of the voice fundamental frequency did not differ between the depressed and the improved group. The acoustic measures were more strongly correlated with the clinical state of the patient, as measured by global depression scores, than with single depressive symptoms such as retardation or agitation.
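    Measures of this kind are straightforward to compute once an F0 contour has been extracted. A minimal Python sketch (assuming a contour sampled at a fixed hop, with pause/unvoiced frames marked as NaN; an illustration, not the paper's program):

        import numpy as np

        def contour_measures(f0, hop_s):
            """Summary statistics of an F0 contour sampled every hop_s seconds.
            Pause/unvoiced frames are expected to be NaN."""
            voiced = ~np.isnan(f0)
            d = np.diff(f0) / hop_s          # F0 rate of change in Hz/s
            d = d[~np.isnan(d)]              # drops diffs spanning a pause
            return {
                "percent_pause": 100.0 * (1.0 - voiced.mean()),
                "f0_sd_hz": float(np.nanstd(f0)),
                # SD of the rate of change, and average speed of voice change
                "f0_rate_sd": float(np.std(d)) if d.size else 0.0,
                "mean_speed_hz_s": float(np.mean(np.abs(d))) if d.size else 0.0,
            }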

  11. Smartphone Application for the Analysis of Prosodic Features in Running Speech with a Focus on Bipolar Disorders: System Performance Evaluation and Case Study.

    PubMed

    Guidi, Andrea; Salvi, Sergio; Ottaviano, Manuel; Gentili, Claudio; Bertschy, Gilles; de Rossi, Danilo; Scilingo, Enzo Pasquale; Vanello, Nicola

    2015-11-06

    Bipolar disorder is one of the most common mood disorders characterized by large and invalidating mood swings. Several projects focus on the development of decision support systems that monitor and advise patients, as well as clinicians. Voice monitoring and speech signal analysis can be exploited to reach this goal. In this study, an Android application was designed for analyzing running speech using a smartphone device. The application can record audio samples and estimate speech fundamental frequency, F0, and its changes. F0-related features are estimated locally on the smartphone, with some advantages with respect to remote processing approaches in terms of privacy protection and reduced upload costs. The raw features can be sent to a central server and further processed. The quality of the audio recordings, algorithm reliability and performance of the overall system were evaluated in terms of voiced segment detection and features estimation. The results demonstrate that mean F0 from each voiced segment can be reliably estimated, thus describing prosodic features across the speech sample. Instead, features related to F0 variability within each voiced segment performed poorly. A case study performed on a bipolar patient is presented.
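    The on-device F0 estimation described here can be approximated with a classic short-time autocorrelation pitch tracker. A minimal Python sketch of one analysis frame (the search range and voicing threshold are illustrative assumptions, not the app's algorithm):

        import numpy as np

        def frame_f0(frame, sr, fmin=70.0, fmax=400.0, thresh=0.3):
            """Return F0 (Hz) for one frame, or None if it looks unvoiced.
            The frame should span at least two periods of fmin."""
            frame = frame - frame.mean()
            ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
            if ac[0] <= 0:
                return None                     # silent frame
            ac = ac / ac[0]                     # normalize lag-0 to 1
            lo, hi = int(sr / fmax), int(sr / fmin)
            lag = lo + int(np.argmax(ac[lo:hi]))
            # A weak autocorrelation peak suggests an unvoiced frame
            return sr / lag if ac[lag] >= thresh else None

    Running this over consecutive frames yields the voiced-segment detection and per-segment mean F0 that the system evaluation focuses on.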

  12. Smartphone Application for the Analysis of Prosodic Features in Running Speech with a Focus on Bipolar Disorders: System Performance Evaluation and Case Study

    PubMed Central

    Guidi, Andrea; Salvi, Sergio; Ottaviano, Manuel; Gentili, Claudio; Bertschy, Gilles; de Rossi, Danilo; Scilingo, Enzo Pasquale; Vanello, Nicola

    2015-01-01

    Bipolar disorder is one of the most common mood disorders characterized by large and invalidating mood swings. Several projects focus on the development of decision support systems that monitor and advise patients, as well as clinicians. Voice monitoring and speech signal analysis can be exploited to reach this goal. In this study, an Android application was designed for analyzing running speech using a smartphone device. The application can record audio samples and estimate speech fundamental frequency, F0, and its changes. F0-related features are estimated locally on the smartphone, with some advantages with respect to remote processing approaches in terms of privacy protection and reduced upload costs. The raw features can be sent to a central server and further processed. The quality of the audio recordings, algorithm reliability and performance of the overall system were evaluated in terms of voiced segment detection and features estimation. The results demonstrate that mean F0 from each voiced segment can be reliably estimated, thus describing prosodic features across the speech sample. Instead, features related to F0 variability within each voiced segment performed poorly. A case study performed on a bipolar patient is presented. PMID:26561811

  13. Sex and the singer: Gender categorization aspects of singing voice

    NASA Astrophysics Data System (ADS)

    Ternström, Sten

    2003-04-01

    The singing voice exhibits many systematic differences by gender and age. The physiological differences between the voice organs of males, females, and children are well known and give rise to several acoustic differences, including acoustic power, pitch range, and spectral distribution. Vocal artists often strive to widen their range of expression, and it is not uncommon for males to sing in a femalelike register, as with countertenors and in some pop/rock genres. The opposite, however, is quite rare. While ambiguous or contradictory gender in speech is usually a social disadvantage, in singing it can be a desired effect. The physical differences in singing voice production between males and females are reviewed in detail. Some interesting borderline cases are examined from an acoustic standpoint.

  14. Lower Vocal Tract Morphologic Adjustments Are Relevant for Voice Timbre in Singing.

    PubMed

    Mainka, Alexander; Poznyakovskiy, Anton; Platzek, Ivan; Fleischer, Mario; Sundberg, Johan; Mürbe, Dirk

    2015-01-01

    The vocal tract shape is crucial to voice production. Its lower part seems particularly relevant for voice timbre. This study analyzes the detailed morphology of parts of the epilaryngeal tube and the hypopharynx for the sustained German vowels /a/, /e/, /i/, /o/, and /u/ by thirteen male singer subjects who were at the beginning of their academic singing studies. Analysis was based on two different phonatory conditions: a natural, speech-like phonation and a singing phonation, like in classical singing. 3D models of the vocal tract were derived from magnetic resonance imaging and compared with long-term average spectrum analysis of audio recordings from the same subjects. Comparison of singing to the speech-like phonation, which served as reference, showed significant adjustments of the lower vocal tract: an average lowering of the larynx by 8 mm and an increase of the hypopharyngeal cross-sectional area (+21.9%) and volume (+16.8%). Changes in the analyzed epilaryngeal portion of the vocal tract were not significant. Consequently, lower larynx-to-hypopharynx area and volume ratios were found in singing compared to the speech-like phonation. All evaluated measures of the lower vocal tract varied significantly with vowel quality. Acoustically, an increase of high frequency energy in singing correlated with a wider hypopharyngeal area. The findings offer an explanation how classical male singers might succeed in producing a voice timbre with increased high frequency energy, creating a singer's formant cluster.

  15. Lower Vocal Tract Morphologic Adjustments Are Relevant for Voice Timbre in Singing

    PubMed Central

    Mainka, Alexander; Poznyakovskiy, Anton; Platzek, Ivan; Fleischer, Mario; Sundberg, Johan; Mürbe, Dirk

    2015-01-01

    The vocal tract shape is crucial to voice production. Its lower part seems particularly relevant for voice timbre. This study analyzes the detailed morphology of parts of the epilaryngeal tube and the hypopharynx for the sustained German vowels /a/, /e/, /i/, /o/, and /u/ by thirteen male singer subjects who were at the beginning of their academic singing studies. Analysis was based on two different phonatory conditions: a natural, speech-like phonation and a singing phonation, like in classical singing. 3D models of the vocal tract were derived from magnetic resonance imaging and compared with long-term average spectrum analysis of audio recordings from the same subjects. Comparison of singing to the speech-like phonation, which served as reference, showed significant adjustments of the lower vocal tract: an average lowering of the larynx by 8 mm and an increase of the hypopharyngeal cross-sectional area (+21.9%) and volume (+16.8%). Changes in the analyzed epilaryngeal portion of the vocal tract were not significant. Consequently, lower larynx-to-hypopharynx area and volume ratios were found in singing compared to the speech-like phonation. All evaluated measures of the lower vocal tract varied significantly with vowel quality. Acoustically, an increase of high frequency energy in singing correlated with a wider hypopharyngeal area. The findings offer an explanation how classical male singers might succeed in producing a voice timbre with increased high frequency energy, creating a singer's formant cluster. PMID:26186691

  16. A taste for ATP: neurotransmission in taste buds

    PubMed Central

    Kinnamon, Sue C.; Finger, Thomas E.

    2013-01-01

    Not only is ATP a ubiquitous source of energy but it is also used widely as an intercellular signal. For example, keratinocytes release ATP in response to numerous external stimuli including pressure, heat, and chemical insult. The released ATP activates purinergic receptors on nerve fibers to generate nociceptive signals. The importance of an ATP signal in epithelial-to-neuronal signaling is nowhere more evident than in the taste system. The receptor cells of taste buds release ATP in response to appropriate stimulation by tastants and the released ATP then activates P2X2 and P2X3 receptors on the taste nerves. Genetic ablation of the relevant P2X receptors leaves an animal without the ability to taste any primary taste quality. Of interest is that release of ATP by taste receptor cells occurs in a non-vesicular fashion, apparently via gated membrane channels. Further, in keeping with the crucial role of ATP as a neurotransmitter in this system, a subset of taste cells expresses a specific ectoATPase, NTPDase2, necessary to clear extracellular ATP which otherwise will desensitize the P2X receptors on the taste nerves. The unique utilization of ATP as a key neurotransmitter in the taste system may reflect the epithelial rather than neuronal origins of the receptor cells. PMID:24385952

  17. The stability of locus equation slopes across stop consonant voicing/aspiration

    NASA Astrophysics Data System (ADS)

    Sussman, Harvey M.; Modarresi, Golnaz

    2004-05-01

    The consistency of locus equation slopes as phonetic descriptors of stop place in CV sequences across voiced and voiceless aspirated stops was explored in the speech of five male speakers of American English and two male speakers of Persian. Using traditional locus equation measurement sites for F2 onsets, voiceless labial and coronal stops had significantly lower locus equation slopes relative to their voiced counterparts, whereas velars failed to show voicing differences. When locus equations were derived using F2 onsets for voiced stops that were measured closer to the stop release burst, comparable to the protocol for measuring voiceless aspirated stops, no significant effects of voicing/aspiration on locus equation slopes were observed. This methodological factor, rather than an underlying phonetic-based explanation, provides a reasonable account for the observed flatter locus equation slopes of voiceless labial and coronal stops relative to voiced cognates reported in previous studies [Molis et al., J. Acoust. Soc. Am. 95, 2925 (1994); O. Engstrand and B. Lindblom, PHONUM 4, 101-104]. [Work supported by NIH.]
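    Concretely, a locus equation is the least-squares line relating F2 at the CV onset to F2 at the vowel midpoint across vowel contexts, and its slope is the place-of-articulation descriptor at issue. A minimal Python sketch with made-up illustrative frequencies:

        import numpy as np

        # F2 (Hz) at vowel midpoint and at CV onset for one stop consonant
        # across several vowel contexts (illustrative values only).
        f2_vowel = np.array([2300.0, 1900.0, 1500.0, 1100.0, 900.0])
        f2_onset = np.array([1900.0, 1700.0, 1450.0, 1250.0, 1150.0])

        # Locus equation: F2_onset = slope * F2_vowel + intercept
        slope, intercept = np.polyfit(f2_vowel, f2_onset, 1)
        print(f"slope = {slope:.2f}, intercept = {intercept:.0f} Hz")
        # Slopes near 1 indicate strong CV coarticulation; flatter slopes
        # indicate the onset is less influenced by the following vowel.

    The methodological point of the study is that where the F2 onset is measured (at voicing onset vs. near the release burst) changes these slopes.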

  18. Speech recognition systems on the Cell Broadband Engine

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liu, Y; Jones, H; Vaidya, S

    In this paper we describe our design, implementation, and first results of a prototype connected-phoneme-based speech recognition system on the Cell Broadband Engine™ (Cell/B.E.). Automatic speech recognition decodes speech samples into plain text (other representations are possible) and must process samples at real-time rates. Fortunately, the computational tasks involved in this pipeline are highly data-parallel and can receive significant hardware acceleration from vector-streaming architectures such as the Cell/B.E. Identifying and exploiting these parallelism opportunities is challenging, but also critical to improving system performance. We observed, from our initial performance timings, that a single Cell/B.E. processor can recognize speech from thousands of simultaneous voice channels in real time--a channel density that is orders-of-magnitude greater than the capacity of existing software speech recognizers based on CPUs (central processing units). This result emphasizes the potential for Cell/B.E.-based speech recognition and will likely lead to the future development of production speech systems using Cell/B.E. clusters.
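    At the core of such an HMM-based decoder is the Viterbi search, which is what makes the workload so data-parallel: every frame updates a score per state from the previous frame's scores. A toy Python sketch of the recursion (all matrices illustrative; a production decoder works over far larger state graphs):

        import numpy as np

        def viterbi(log_A, log_B, log_pi):
            """Most likely state path through an HMM.
            log_A:  (N, N) log transition probabilities
            log_B:  (T, N) per-frame log emission scores
            log_pi: (N,) log initial state probabilities"""
            T, N = log_B.shape
            delta = log_pi + log_B[0]            # best score per state
            back = np.zeros((T, N), dtype=int)   # backpointers
            for t in range(1, T):
                scores = delta[:, None] + log_A  # scores[i, j]: i -> j
                back[t] = np.argmax(scores, axis=0)
                delta = np.max(scores, axis=0) + log_B[t]
            path = [int(np.argmax(delta))]
            for t in range(T - 1, 0, -1):
                path.append(int(back[t, path[-1]]))
            return path[::-1]

    The per-frame max/argmax over state pairs vectorizes naturally, which is the kind of structure the Cell/B.E.'s SIMD units accelerate.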

  19. Relationship Between Voice and Motor Disabilities of Parkinson's Disease.

    PubMed

    Majdinasab, Fatemeh; Karkheiran, Siamak; Soltani, Majid; Moradi, Negin; Shahidi, Gholamali

    2016-11-01

    To evaluate the voice of Iranian patients with Parkinson's disease (PD) and to find relationships between motor disabilities and acoustic voice parameters as speech motor components. We evaluated 27 Farsi-speaking PD patients and 21 age- and sex-matched healthy persons as controls. Motor performance was assessed with the Unified Parkinson's Disease Rating Scale part III and the Hoehn and Yahr rating scale in the "on" state. Acoustic voice evaluation, including fundamental frequency (f0), standard deviation of f0, minimum of f0, maximum of f0, shimmer, jitter, and harmonics-to-noise ratio, was done with the Praat software on /a/ prolongation. No difference was seen between the voices of the patients and the controls. f0 and its variation correlated significantly with the duration of the disease, but not with the Unified Parkinson's Disease Rating Scale part III. Only a limited relationship was observed between voice and motor disabilities. Tremor is a main feature of PD that affects the motor and phonation systems. Females had an older age at onset, longer disease duration, and more severe motor disabilities (not statistically significant), but phonation disorders were more frequent in males and showed a stronger relationship with the severity of motor disabilities. Voice is affected by PD earlier than many other motor components and is more sensitive to disease progression. Tremor is the feature of PD with the greatest impact on voice. PD affects the voice of male patients more than that of female patients. Copyright © 2016 The Voice Foundation. Published by Elsevier Inc. All rights reserved.

  20. The Effect of Background Traffic Packet Size to VoIP Speech Quality

    NASA Astrophysics Data System (ADS)

    Triyason, Tuul; Kanthamanon, Prasert; Warasup, Kittipong; Yamsaengsung, Siam; Supattatham, Montri

    VoIP is gaining acceptance in the corporate world, especially in small and medium-sized businesses that want to save costs to gain an advantage over their competitors. Good voice quality is one of the challenging tasks in a deployment plan, because VoIP voice quality is affected by packet loss and jitter. In this paper, we study the effect of background traffic packet size on voice quality. The background traffic was generated by the Bricks software and the speech quality was assessed by MOS. The results show an interesting relationship between voice quality and the number and size of TCP packets: for the same amount of data, smaller packets degrade voice quality more than larger packets.
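    MOS-style quality under packet loss is often approximated with the ITU-T G.107 E-model, which maps a transmission rating R to an estimated MOS. A simplified Python sketch (the default rating of 93.2 and the G.711 loss-robustness value Bpl ≈ 4.3 are commonly cited figures; all other impairments are ignored here for illustration):

        def mos_from_r(r: float) -> float:
            # ITU-T G.107 mapping from transmission rating R to a MOS estimate
            if r < 0:
                return 1.0
            if r > 100:
                return 4.5
            return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6

        def r_with_loss(ppl: float, ie: float = 0.0, bpl: float = 4.3) -> float:
            # Effective equipment impairment for a packet-loss percentage ppl
            ie_eff = ie + (95.0 - ie) * ppl / (ppl + bpl)
            return 93.2 - ie_eff

        for loss in (0.0, 1.0, 3.0, 5.0):
            print(f"{loss:.0f}% loss -> MOS ~ {mos_from_r(r_with_loss(loss)):.2f}")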

  1. Effects of vocal training on singing and speaking voice characteristics in vocally healthy adults and children based on choral and nonchoral data.

    PubMed

    Siupsinskiene, Nora; Lycke, Hugo

    2011-07-01

    This prospective cross-sectional study examines the effects of voice training on vocal capabilities in vocally healthy, age- and gender-differentiated groups, measured by the voice range profile (VRP) and speech range profile (SRP). Frequency and intensity measurements of the VRP and SRP using standard singing and speaking voice protocols were derived from 161 trained choir singers (21 males, 59 females, and 81 prepubescent children) and from 188 nonsingers (38 males, 89 females, and 61 children). When compared with nonsingers, both genders of trained adult and child singers exhibited increased mean pitch range, highest frequency, and VRP area in the high frequencies (P<0.05). Female singers and child singers also showed significantly increased mean maximum voice intensity, intensity range, and total VRP area. Logistic regression analysis showed that VRP pitch range, highest frequency, maximum voice intensity, and maximum-minimum intensity range, together with the SRP slope of the speaking curve, were the key predictors of voice training. Age-, gender-, and voice-training-differentiated norms for VRP and SRP parameters are presented. A significant positive effect of voice training on vocal capabilities, mostly of the singing voice, was confirmed. The presented norms for trained singers, with key parameters differentiated by gender and age, are suggested for the clinical practice of otolaryngologists and speech-language pathologists. Copyright © 2011 The Voice Foundation. Published by Mosby, Inc. All rights reserved.

  2. Speech–Language Pathology Evaluation and Management of Hyperkinetic Disorders Affecting Speech and Swallowing Function

    PubMed Central

    Barkmeier-Kraemer, Julie M.; Clark, Heather M.

    2017-01-01

    Background Hyperkinetic dysarthria is characterized by abnormal involuntary movements affecting respiratory, phonatory, and articulatory structures impacting speech and deglutition. Speech–language pathologists (SLPs) play an important role in the evaluation and management of dysarthria and dysphagia. This review describes the standard clinical evaluation and treatment approaches by SLPs for addressing impaired speech and deglutition in specific hyperkinetic dysarthria populations. Methods A literature review was conducted using the data sources of PubMed, Cochrane Library, and Google Scholar. Search terms included 1) hyperkinetic dysarthria, essential voice tremor, voice tremor, vocal tremor, spasmodic dysphonia, spastic dysphonia, oromandibular dystonia, Meige syndrome, orofacial, cervical dystonia, dystonia, dyskinesia, chorea, Huntington’s Disease, myoclonus; and evaluation/treatment terms: 2) Speech–Language Pathology, Speech Pathology, Evaluation, Assessment, Dysphagia, Swallowing, Treatment, Management, and diagnosis. Results The standard SLP clinical speech and swallowing evaluation of chorea/Huntington’s disease, myoclonus, focal and segmental dystonia, and essential vocal tremor typically includes 1) case history; 2) examination of the tone, symmetry, and sensorimotor function of the speech structures during non-speech, speech and swallowing relevant activities (i.e., cranial nerve assessment); 3) evaluation of speech characteristics; and 4) patient self-report of the impact of their disorder on activities of daily living. SLP management of individuals with hyperkinetic dysarthria includes behavioral and compensatory strategies for addressing compromised speech and intelligibility. Swallowing disorders are managed based on individual symptoms and the underlying pathophysiology determined during evaluation. Discussion SLPs play an important role in contributing to the differential diagnosis and management of impaired speech and deglutition

  3. Experience with speech sounds is not necessary for cue trading by budgerigars (Melopsittacus undulatus).

    PubMed

    Flaherty, Mary; Dent, Micheal L; Sawusch, James R

    2017-01-01

    The influence of experience with human speech sounds on speech perception in budgerigars, vocal mimics whose speech exposure can be tightly controlled in a laboratory setting, was measured. Budgerigars were divided into groups that differed in auditory exposure and then tested on a cue-trading identification paradigm with synthetic speech. Phonetic cue trading is a perceptual phenomenon observed when changes on one cue dimension are offset by changes in another cue dimension while still maintaining the same phonetic percept. The current study examined whether budgerigars would trade the cues of voice onset time (VOT) and the first formant onset frequency when identifying syllable initial stop consonants and if this would be influenced by exposure to speech sounds. There were a total of four different exposure groups: No speech exposure (completely isolated), Passive speech exposure (regular exposure to human speech), and two Speech-trained groups. After the exposure period, all budgerigars were tested for phonetic cue trading using operant conditioning procedures. Birds were trained to peck keys in response to different synthetic speech sounds that began with "d" or "t" and varied in VOT and frequency of the first formant at voicing onset. Once training performance criteria were met, budgerigars were presented with the entire intermediate series, including ambiguous sounds. Responses on these trials were used to determine which speech cues were used, if a trading relation between VOT and the onset frequency of the first formant was present, and whether speech exposure had an influence on perception. Cue trading was found in all birds and these results were largely similar to those of a group of humans. Results indicated that prior speech experience was not a requirement for cue trading by budgerigars. The results are consistent with theories that explain phonetic cue trading in terms of a rich auditory encoding of the speech signal.

  4. Experience with speech sounds is not necessary for cue trading by budgerigars (Melopsittacus undulatus)

    PubMed Central

    Flaherty, Mary; Dent, Micheal L.; Sawusch, James R.

    2017-01-01

    The influence of experience with human speech sounds on speech perception in budgerigars, vocal mimics whose speech exposure can be tightly controlled in a laboratory setting, was measured. Budgerigars were divided into groups that differed in auditory exposure and then tested on a cue-trading identification paradigm with synthetic speech. Phonetic cue trading is a perceptual phenomenon observed when changes on one cue dimension are offset by changes in another cue dimension while still maintaining the same phonetic percept. The current study examined whether budgerigars would trade the cues of voice onset time (VOT) and the first formant onset frequency when identifying syllable initial stop consonants and if this would be influenced by exposure to speech sounds. There were a total of four different exposure groups: No speech exposure (completely isolated), Passive speech exposure (regular exposure to human speech), and two Speech-trained groups. After the exposure period, all budgerigars were tested for phonetic cue trading using operant conditioning procedures. Birds were trained to peck keys in response to different synthetic speech sounds that began with “d” or “t” and varied in VOT and frequency of the first formant at voicing onset. Once training performance criteria were met, budgerigars were presented with the entire intermediate series, including ambiguous sounds. Responses on these trials were used to determine which speech cues were used, if a trading relation between VOT and the onset frequency of the first formant was present, and whether speech exposure had an influence on perception. Cue trading was found in all birds and these results were largely similar to those of a group of humans. Results indicated that prior speech experience was not a requirement for cue trading by budgerigars. The results are consistent with theories that explain phonetic cue trading in terms of a rich auditory encoding of the speech signal. PMID:28562597

  5. Central voice production and pathophysiology of spasmodic dysphonia.

    PubMed

    Mor, Niv; Simonyan, Kristina; Blitzer, Andrew

    2018-01-01

    Our ability to speak is complex, and the role of the central nervous system in controlling speech production is often overlooked in the field of otolaryngology. In this brief review, we present an integrated overview of speech production with a focus on the role of the central nervous system. The role of central control of voice production is then further discussed in relation to the potential pathophysiology of spasmodic dysphonia (SD). Peer-reviewed articles on central laryngeal control and SD were identified from a PubMed search. Selected articles were augmented with designated relevant publications. Publications that discussed central and peripheral nervous system control of voice production and the central pathophysiology of laryngeal dystonia were chosen. Our ability to speak is regulated by specialized complex mechanisms coordinated by high-level cortical signaling, brainstem reflexes, peripheral nerves, muscles, and mucosal actions. Recent studies suggest that SD results from a primary central disturbance associated with dysfunction at our highest levels of central voice control. The efficacy of botulinum toxin in treating SD may not be limited solely to its local effect on laryngeal muscles and may also modulate the disorder at the level of the central nervous system. Future therapeutic options that target the central nervous system may help modulate the underlying disorder in SD and allow clinicians to better understand the principal pathophysiology. NA. Laryngoscope, 128:177-183, 2018. © 2017 The American Laryngological, Rhinological and Otological Society, Inc.

  6. Influence of Telecommunication Modality, Internet Transmission Quality, and Accessories on Speech Perception in Cochlear Implant Users.

    PubMed

    Mantokoudis, Georgios; Koller, Roger; Guignard, Jérémie; Caversaccio, Marco; Kompis, Martin; Senn, Pascal

    2017-04-24

    Telecommunication is limited or even impossible for more than one-third of all cochlear implant (CI) users. We sought therefore to study the impact of voice quality on speech perception with voice over Internet protocol (VoIP) under real and adverse network conditions. Telephone speech perception was assessed in 19 CI users (15-69 years, average 42 years), using the German HSM (Hochmair-Schulz-Moser) sentence test to compare Skype and conventional telephone (public switched telephone network, PSTN) transmission using a personal computer (PC) and a digital enhanced cordless telecommunications (DECT) telephone dual device. Five different Internet transmission quality modes and four accessories (PC speakers, headphones, 3.5 mm jack audio cable, and induction loop) were compared. As a secondary outcome, the subjectively perceived voice quality was assessed using the mean opinion score (MOS). Telephone speech perception was significantly better with Skype (median 91.6%, P<.001) than with PSTN (median 42.5%) under optimal conditions. Skype calls under adverse network conditions (data packet loss > 15%) were not superior to conventional telephony. In addition, there were no significant differences between the tested accessories (P>.05) using a PC. Coupling a Skype DECT phone device with an audio cable to the CI, however, resulted in higher speech perception (median 65%) and subjective MOS scores (3.2) than using PSTN (median 7.5%, P<.001). Skype calls significantly improve speech perception for CI users compared with conventional telephony under real network conditions. Listening accessories do not further improve the listening experience. Current Skype DECT telephone devices do not fully offer technical advantages in voice quality. ©Georgios Mantokoudis, Roger Koller, Jérémie Guignard, Marco Caversaccio, Martin Kompis, Pascal Senn. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 24.04.2017.

  7. Phase effects in masking by harmonic complexes: speech recognition.

    PubMed

    Deroche, Mickael L D; Culling, John F; Chatterjee, Monita

    2013-12-01

    Harmonic complexes that generate highly modulated temporal envelopes on the basilar membrane (BM) mask a tone less effectively than complexes that generate relatively flat temporal envelopes, because the non-linear active gain of the BM selectively amplifies a low-level tone in the dips of a modulated masker envelope. The present study examines a similar effect in speech recognition. Speech reception thresholds (SRTs) were measured for a voice masked by harmonic complexes with partials in sine phase (SP) or in random phase (RP). The masker's fundamental frequency (F0) was 50, 100 or 200 Hz. SRTs were considerably lower for SP than for RP maskers at 50-Hz F0, but the two converged at 100-Hz F0, while at 200-Hz F0, SRTs were a little higher for SP than RP maskers. The results were similar whether the target voice was male or female and whether the masker's spectral profile was flat or speech-shaped. Although listening in the masker dips has been shown to play a large role for artificial stimuli such as Schroeder-phase complexes at high levels, it contributes weakly to speech recognition in the presence of harmonic maskers with different crest factors at more moderate sound levels (65 dB SPL). Copyright © 2013 Elsevier B.V. All rights reserved.

  8. The effect of singing training on voice quality for people with quadriplegia.

    PubMed

    Tamplin, Jeanette; Baker, Felicity A; Buttifant, Mary; Berlowitz, David J

    2014-01-01

    Despite anecdotal reports of voice impairment in quadriplegia, the exact nature of these impairments is not well described in the literature. This article details objective and subjective voice assessments for people with quadriplegia at baseline and after a respiratory-targeted singing intervention. Randomized controlled trial. Twenty-four participants with quadriplegia were randomly assigned to a 12-week program of either a singing intervention or active music therapy control. Recordings of singing and speech were made at baseline, 6 weeks, 12 weeks, and 6 months postintervention. These deidentified recordings were used to measure sound pressure levels and assess voice quality using the Multidimensional Voice Profile and the Perceptual Voice Profile. Baseline voice quality data indicated deviation from normality in the areas of breathiness, strain, and roughness. A greater percentage of intervention participants moved toward more normal voice quality in terms of jitter, shimmer, and noise-to-harmonic ratio; however, the improvements failed to achieve statistical significance. Subjective and objective assessments of voice quality indicate that quadriplegia may have a detrimental effect on voice quality; in particular, causing a perception of roughness and breathiness in the voice. The results of this study suggest that singing training may have a role in ameliorating these voice impairments. Copyright © 2014 The Voice Foundation. Published by Mosby, Inc. All rights reserved.

  9. Speech Acquisition and Automatic Speech Recognition for Integrated Spacesuit Audio Systems

    NASA Technical Reports Server (NTRS)

    Huang, Yiteng; Chen, Jingdong; Chen, Shaoyan

    2010-01-01

    A voice-command human-machine interface system has been developed for spacesuit extravehicular activity (EVA) missions. A multichannel acoustic signal processing method has been created for distant speech acquisition in noisy and reverberant environments. This technology reduces noise by exploiting differences in the statistical nature of signal (i.e., speech) and noise in the spatial and temporal domains. As a result, automatic speech recognition (ASR) accuracy can be improved to the level at which crewmembers would find the speech interface useful. The developed speech human/machine interface will improve both crewmember usability and operational efficiency. It offers a fast rate of data/text entry in a small, lightweight package, and it frees the hands and eyes of a suited crewmember. The system components and steps include beamforming/multichannel noise reduction, single-channel noise reduction, speech feature extraction, feature transformation and normalization, feature compression, model adaptation, ASR HMM (Hidden Markov Model) training, and ASR decoding. A state-of-the-art phoneme recognizer can obtain an accuracy rate of 65 percent when the training and testing data are free of noise. When it is used in spacesuits, the rate drops to about 33 percent. With the developed microphone array speech-processing technologies, the performance is improved and the phoneme recognition accuracy rate rises to 44 percent. The recognizer can be further improved by combining the microphone array and HMM model adaptation techniques and using speech samples collected from inside spacesuits. In addition, arithmetic complexity models for the major HMM-based ASR components were developed. They can help real-time ASR system designers select proper tasks when facing constraints in computational resources.
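    The front of that pipeline, beamforming, can be illustrated with its simplest variant: delay-and-sum, which time-aligns the microphones toward the talker so speech adds coherently while diffuse noise partially cancels. A minimal Python sketch (the per-mic steering delays are assumed known from array geometry; this is not NASA's algorithm):

        import numpy as np

        def delay_and_sum(channels, sr, delays_s):
            """channels: (n_mics, n_samples) array of recordings.
            delays_s: steering delay (seconds) to apply to each mic."""
            n_mics, n = channels.shape
            out = np.zeros(n)
            for ch, d in zip(channels, delays_s):
                shift = int(round(d * sr))   # integer-sample steering delay
                # np.roll wraps around; zero-padding is cleaner in practice
                out += np.roll(ch, shift)
            return out / n_mics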

  10. Perceptual Adaptation of Voice Gender Discrimination with Spectrally Shifted Vowels

    PubMed Central

    Li, Tianhao; Fu, Qian-Jie

    2013-01-01

    Purpose To determine whether perceptual adaptation improves voice gender discrimination of spectrally shifted vowels and, if so, which acoustic cues contribute to the improvement. Method Voice gender discrimination was measured for 10 normal-hearing subjects, during 5 days of adaptation to spectrally shifted vowels, produced by processing the speech of 5 male and 5 female talkers with 16-channel sine-wave vocoders. The subjects were randomly divided into 2 groups; one subjected to 50-Hz, and the other to 200-Hz, temporal envelope cutoff frequencies. No preview or feedback was provided. Results There was significant adaptation in voice gender discrimination with the 200-Hz cutoff frequency, but significant improvement was observed only for 3 female talkers with F0 > 180 Hz and 3 male talkers with F0 < 170 Hz. There was no significant adaptation with the 50-Hz cutoff frequency. Conclusions Temporal envelope cues are important for voice gender discrimination under spectral shift conditions with perceptual adaptation, but spectral shift may limit the exclusive use of spectral information and/or the use of formant structure on voice gender discrimination. The results have implications for cochlear implant users and for understanding voice gender discrimination. PMID:21173392
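    The 16-channel sine-wave vocoding used as the stimulus manipulation here can be sketched compactly: split the speech into bands, extract each band's temporal envelope (low-passed at the 50-Hz or 200-Hz cutoff under test), and re-impose the envelopes on sine carriers at the band centers. A minimal Python illustration (filter orders, band edges, and the log spacing are assumptions, not the study's exact processor):

        import numpy as np
        from scipy.signal import butter, filtfilt, hilbert

        def sine_vocoder(x, sr, n_ch=16, lo=100.0, hi=7000.0, env_cut=200.0):
            """Sine-wave vocode signal x sampled at sr (requires hi < sr/2)."""
            edges = np.geomspace(lo, hi, n_ch + 1)   # log-spaced band edges
            b_env, a_env = butter(4, env_cut / (sr / 2), btype="low")
            t = np.arange(len(x)) / sr
            out = np.zeros(len(x))
            for f1, f2 in zip(edges[:-1], edges[1:]):
                b, a = butter(4, [f1 / (sr / 2), f2 / (sr / 2)], btype="band")
                band = filtfilt(b, a, x)
                env = filtfilt(b_env, a_env, np.abs(hilbert(band)))
                carrier = np.sin(2 * np.pi * np.sqrt(f1 * f2) * t)
                out += np.clip(env, 0.0, None) * carrier
            return out / np.max(np.abs(out))         # normalize peak level

    Lowering env_cut from 200 Hz to 50 Hz removes the envelope periodicity cues with which the study observed adaptation.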

  11. Perceptual adaptation of voice gender discrimination with spectrally shifted vowels.

    PubMed

    Li, Tianhao; Fu, Qian-Jie

    2011-08-01

    To determine whether perceptual adaptation improves voice gender discrimination of spectrally shifted vowels and, if so, which acoustic cues contribute to the improvement. Voice gender discrimination was measured for 10 normal-hearing subjects, during 5 days of adaptation to spectrally shifted vowels, produced by processing the speech of 5 male and 5 female talkers with 16-channel sine-wave vocoders. The subjects were randomly divided into 2 groups; one subjected to 50-Hz, and the other to 200-Hz, temporal envelope cutoff frequencies. No preview or feedback was provided. There was significant adaptation in voice gender discrimination with the 200-Hz cutoff frequency, but significant improvement was observed only for 3 female talkers with F(0) > 180 Hz and 3 male talkers with F(0) < 170 Hz. There was no significant adaptation with the 50-Hz cutoff frequency. Temporal envelope cues are important for voice gender discrimination under spectral shift conditions with perceptual adaptation, but spectral shift may limit the exclusive use of spectral information and/or the use of formant structure on voice gender discrimination. The results have implications for cochlear implant users and for understanding voice gender discrimination.

  12. Voice Onset Time for Female Trained and Untrained Singers during Speech and Singing

    ERIC Educational Resources Information Center

    McCrea, Christopher R.; Morris, Richard J.

    2007-01-01

    The purpose of this study was to examine the voice onset times of female trained and untrained singers during spoken and sung tasks. Thirty females were digitally recorded speaking and singing short phrases containing the English stop consonants /p/ and /b/ in the word-initial position. Voice onset time was measured for each phoneme and…

  13. Unmasking the effects of masking on performance: The potential of multiple-voice masking in the office environment.

    PubMed

    Keus van de Poll, Marijke; Carlsson, Johannes; Marsh, John E; Ljung, Robert; Odelius, Johan; Schlittmeier, Sabine J; Sundin, Gunilla; Sörqvist, Patrik

    2015-08-01

    Broadband noise is often used as a masking sound to combat the negative consequences of background speech on performance in open-plan offices. As office workers generally dislike broadband noise, it is important to find alternatives that are more appreciated while being at least as effective. The purpose of experiment 1 was to compare broadband noise with two alternatives (multiple voices and water waves) in the context of a serial short-term memory task. A single voice impaired memory in comparison with silence, but when the single voice was masked with multiple voices, performance was on a level with silence. Experiment 2 explored the benefits of multiple-voice masking in more detail (by comparing one voice, three voices, five voices, and seven voices) in the context of word-processed writing (arguably a more office-relevant task). Performance (i.e., writing fluency) increased linearly from worst performance in the one-voice condition to best performance in the seven-voice condition. Psychological mechanisms underpinning these effects are discussed.

  14. The effects of gated speech on the fluency of speakers who stutter.

    PubMed

    Howell, Peter

    2007-01-01

    It is known that the speech of people who stutter improves when the speaker's own vocalization is changed while the participant is speaking. One explanation of these effects is the disruptive rhythm hypothesis (DRH). The DRH maintains that the manipulated sound only needs to disturb timing to affect speech control. The experiment investigated whether speech that was gated on and off (interrupted) affected the speech control of speakers who stutter. Eight children who stutter read a passage when they heard their voice normally and when the speech was gated. Fluency was enhanced (fewer errors were made and time to read a set passage was reduced) when speech was interrupted in this way. The results support the DRH. Copyright 2007 S. Karger AG, Basel.

  15. The impact of vocal rehabilitation on quality of life and voice handicap in patients with total laryngectomy.

    PubMed

    Ţiple, Cristina; Drugan, Tudor; Dinescu, Florina Veronica; Mureşan, Rodica; Chirilă, Magdalena; Cosgarea, Marcel

    2016-01-01

    Health-related quality of life (HRQL) and the voice handicap index (VHI) of laryngectomees seem to be relevant to voice rehabilitation. The aim of this study is to assess the impact of voice rehabilitation on the HRQL and VHI of laryngectomees. A retrospective study was done at the Ear, Nose, and Throat Department of the Emergency County Hospital. Sixty-five laryngectomees were included in this study, of whom 62 underwent voice rehabilitation. Voice handicap and QOL were assessed using the QOL questionnaires developed by the European Organisation for Research and Treatment of Cancer (EORTC); variables used were functional scales (physical, role, cognitive, emotional, and social), symptom scales (fatigue, pain, and nausea and vomiting), the global QOL scale (pain, swallowing, senses, speech, social eating, social contact, and sexuality), and the functional, physical, and emotional aspects of the voice handicap (one-way ANOVA test). The mean age of the patients was 59.22 (standard deviation = 9.00) years. A total of 26 (40%) patients had moderate VHI (between 31 and 60) and 39 (60%) patients had severe VHI (higher than 61). Results of the HRQL questionnaires showed that patients who underwent speech therapy obtained better scores on most scales (P = 0.000). Patients with esophageal voice had high scores on the functional scales compared with patients using other voice rehabilitation methods or none (P = 0.07), and the VHI score for transesophageal prosthesis users improved after an adjustment period. The global health status and VHI scores showed a statistically significant correlation between speaker groups. The EORTC and VHI questionnaires offer more information regarding life after laryngectomy.

  16. Objective speech quality assessment and the RPE-LTP coding algorithm in different noise and language conditions.

    PubMed

    Hansen, J H; Nandkumar, S

    1995-01-01

    The formulation of reliable signal processing algorithms for speech coding and synthesis requires the selection of an a priori criterion of performance. Though coding efficiency (bits/second) or computational requirements can be used, a final performance measure must always include speech quality. In this paper, three objective speech quality measures are considered with respect to quality assessment for American English, noisy American English, and noise-free versions of seven languages. The purpose is to determine whether objective quality measures can be used to quantify changes in quality for a given voice coding method, with a known subjective performance level, as background noise or language conditions are changed. The speech coding algorithm chosen is regular-pulse excitation with long-term prediction (RPE-LTP), which has been adopted as the standard voice compression algorithm for the European Digital Mobile Radio system. Three areas are considered for objective quality assessment: (i) vocoder performance for American English in a noise-free environment, (ii) speech quality variation for three additive background noise sources, and (iii) noise-free performance for seven languages: English, Japanese, Finnish, German, Hindi, Spanish, and French. It is suggested that although existing objective quality measures will never replace subjective testing, they can be a useful means of assessing changes in performance, identifying areas for improvement in algorithm design, and augmenting subjective quality tests for voice coding/compression algorithms in noise-free, noisy, and/or non-English applications.
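
    The abstract does not name the three measures, but a classic member of this family is segmental SNR, which averages frame-level SNRs between the reference and the coded signal. The sketch below is a generic implementation, not the paper's; the 20 ms frames and the [-10, 35] dB clamping range are conventional choices.

        import numpy as np

        def segmental_snr(ref, deg, fs, frame_ms=20, lo=-10.0, hi=35.0):
            """Segmental SNR (dB) between a reference and a degraded signal.

            Frame SNRs are clamped to [lo, hi] dB, a common convention that
            keeps silent frames from dominating the average.
            """
            n = int(fs * frame_ms / 1000)
            snrs = []
            for i in range(0, min(len(ref), len(deg)) - n + 1, n):
                r, d = ref[i:i + n], deg[i:i + n]
                err = np.sum((r - d) ** 2)
                snr = hi if err == 0 else 10 * np.log10(np.sum(r ** 2) / err + 1e-12)
                snrs.append(np.clip(snr, lo, hi))
            return float(np.mean(snrs))

        # Demo: a clean tone versus the same tone with additive noise.
        fs = 8000
        t = np.arange(fs) / fs
        ref = np.sin(2 * np.pi * 200 * t)
        deg = ref + 0.05 * np.random.default_rng(1).standard_normal(len(ref))
        print(f"segmental SNR: {segmental_snr(ref, deg, fs):.1f} dB")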

  17. Duration, Pitch, and Loudness in Kunqu Opera Stage Speech.

    PubMed

    Han, Qichao; Sundberg, Johan

    2017-03-01

    Kunqu is a special type of opera within the Chinese tradition with 600 years of history. In it, stage speech is used for the spoken dialogue. It is performed in the Ming Dynasty's Mandarin language and is a much more dominant part of the play than singing. Stage speech deviates considerably from normal conversational speech with respect to duration, loudness, and pitch. This paper compares these properties in stage speech and conversational speech. A famous, highly experienced female singer's performed stage speech and her reading of the same lyrics in a conversational speech mode were analyzed. Clear differences were found. As compared with conversational speech, stage speech had longer word and sentence durations, and word duration was less variable. Average sound level was 16 dB higher. Mean fundamental frequency was also considerably higher and more varied. Within sentences, both loudness and fundamental frequency tended to vary according to a low-high-low pattern. Some of the findings fail to support current opinions regarding the characteristics of stage speech, and in this sense the study demonstrates the relevance of objective measurements in descriptions of vocal styles. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
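
    The reported contrasts (mean F0 and average sound level) are straightforward to measure. The sketch below uses a crude frame-wise autocorrelation F0 estimator and an RMS level in dB; it illustrates the kind of measurement involved rather than the authors' analysis pipeline, and the two synthetic tones merely stand in for the two speaking modes.

        import numpy as np

        def mean_f0_autocorr(x, fs, fmin=75, fmax=500, frame_ms=40):
            """Crude mean-F0 estimate from frame-wise autocorrelation peaks."""
            n = int(fs * frame_ms / 1000)
            lag_lo, lag_hi = int(fs / fmax), int(fs / fmin)
            f0s = []
            for i in range(0, len(x) - n + 1, n):
                frame = x[i:i + n] - np.mean(x[i:i + n])
                ac = np.correlate(frame, frame, mode="full")[n - 1:]
                if ac[0] <= 0:
                    continue  # skip silent frames
                lag = lag_lo + np.argmax(ac[lag_lo:lag_hi])
                f0s.append(fs / lag)
            return float(np.mean(f0s))

        def level_db(x):
            """RMS level in dB re an arbitrary reference."""
            return 20 * np.log10(np.sqrt(np.mean(x ** 2)))

        fs = 16000
        t = np.arange(fs) / fs
        conversational = 0.1 * np.sin(2 * np.pi * 200 * t)  # stand-ins for the
        stage = 0.6 * np.sin(2 * np.pi * 300 * t)           # two speaking modes
        print(f"mean F0: {mean_f0_autocorr(conversational, fs):.0f} Hz -> "
              f"{mean_f0_autocorr(stage, fs):.0f} Hz")
        print(f"level difference: {level_db(stage) - level_db(conversational):.1f} dB")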

  18. Bilingual Computerized Speech Recognition Screening for Depression Symptoms

    ERIC Educational Resources Information Center

    Gonzalez, Gerardo; Carter, Colby; Blanes, Erika

    2007-01-01

    The Voice-Interactive Depression Assessment System (VIDAS) is a computerized speech recognition application for screening depression based on the Center for Epidemiological Studies--Depression scale in English and Spanish. Study 1 included 50 English and 47 Spanish speakers. Study 2 involved 108 English and 109 Spanish speakers. Participants…

  19. Perception of initial obstruent voicing is influenced by gestural organization

    PubMed Central

    Best, Catherine T.; Hallé, Pierre A.

    2009-01-01

    Cross-language differences in phonetic settings for phonological contrasts of stop voicing have posed a challenge for attempts to relate specific phonological features to specific phonetic details. We probe the phonetic-phonological relationship for voicing contrasts more broadly, analyzing in particular their relevance to nonnative speech perception, from two theoretical perspectives: feature geometry and articulatory phonology. Because these perspectives differ in assumptions about temporal/phasing relationships among features/gestures within syllable onsets, we undertook a cross-language investigation on perception of obstruent (stop, fricative) voicing contrasts in three nonnative onsets that use a common set of features/gestures but with differing time-coupling. Listeners of English and French, which differ in their phonetic settings for word-initial stop voicing distinctions, were tested on perception of three onset types, all nonnative to both English and French, that differ in how initial obstruent voicing is coordinated with a lateral feature/gesture and additional obstruent features/gestures. The targets, listed from least complex to most complex onsets, were: a lateral fricative voicing distinction (Zulu /ɬ/-/ɮ/), a laterally-released affricate voicing distinction (Tlingit /tɬ/-/dɮ/), and a coronal stop voicing distinction in stop+/l/ clusters (Hebrew /tl/-/dl/). English and French listeners' performance reflected the differences in their native languages' stop voicing distinctions, compatible with prior perceptual studies on singleton consonant onsets. However, both groups' abilities to perceive voicing as a separable parameter also varied systematically with the structure of the target onsets, supporting the notion that the gestural organization of syllable onsets systematically affects perception of initial voicing distinctions. PMID:20228878

  20. Tracking Voice Change after Thyroidectomy: Application of Spectral/Cepstral Analyses

    ERIC Educational Resources Information Center

    Awan, Shaheen N.; Helou, Leah B.; Stojadinovic, Alexander; Solomon, Nancy Pearl

    2011-01-01

    This study evaluates the utility of perioperative spectral and cepstral acoustic analyses to monitor voice change after thyroidectomy. Perceptual and acoustic analyses were conducted on speech samples (sustained vowel /ɑ/ and CAPE-V sentences) provided by 70 participants (36 women and 34 men) at four study time points: prior to thyroid…
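
    The abstract is cut off here, but the title's mention of cepstral analysis invites a concrete example. One widely used cepstral measure of voice quality is cepstral peak prominence (CPP); the sketch below is a generic textbook-style implementation, assumed for illustration and not taken from this study.

        import numpy as np

        def cepstral_peak_prominence(x, fs, fmin=60, fmax=330):
            """Cepstral peak prominence (CPP) of a windowed signal, in dB.

            The real cepstrum is the inverse FFT of the log-magnitude
            spectrum. CPP is the height of the largest cepstral peak in the
            quefrency range of plausible F0, measured above a straight line
            fitted to the cepstrum over that range; clearer harmonic
            structure (a less dysphonic voice) yields a larger CPP.
            """
            n = len(x)
            log_spec = 20 * np.log10(np.abs(np.fft.fft(x * np.hanning(n))) + 1e-12)
            cep = np.real(np.fft.ifft(log_spec))
            q = np.arange(n) / fs                        # quefrency axis (seconds)
            i_lo, i_hi = int(fs / fmax), int(fs / fmin)  # search 1/fmax .. 1/fmin s
            coef = np.polyfit(q[i_lo:i_hi], cep[i_lo:i_hi], 1)
            detrended = cep[i_lo:i_hi] - np.polyval(coef, q[i_lo:i_hi])
            return float(np.max(detrended))

        # Demo: a harmonic signal with little noise versus a "breathier" one.
        fs = 16000
        t = np.arange(2 * fs) / fs
        rng = np.random.default_rng(2)
        clear = np.sin(2 * np.pi * 180 * t) + 0.1 * rng.standard_normal(len(t))
        breathy = np.sin(2 * np.pi * 180 * t) + 1.0 * rng.standard_normal(len(t))
        print(f"CPP clear:   {cepstral_peak_prominence(clear, fs):.1f} dB")
        print(f"CPP breathy: {cepstral_peak_prominence(breathy, fs):.1f} dB")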