Sample records for taste voice speech

  1. Voice and Speech after Laryngectomy

    ERIC Educational Resources Information Center

    Stajner-Katusic, Smiljka; Horga, Damir; Musura, Maja; Globlek, Dubravka

    2006-01-01

    The aim of the investigation is to compare voice and speech quality in alaryngeal patients using esophageal speech (ESOP, eight subjects), an electroacoustical speech aid (EACA, six subjects), and a tracheoesophageal voice prosthesis (TEVP, three subjects). The subjects were recorded reading a short story in a sound-proof booth, and the speech samples…

  2. Start/End Delays of Voiced and Unvoiced Speech Signals

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Herrnstein, A

    Recent experiments using low power EM-radar-like sensors (e.g., GEMs) have demonstrated a new method for measuring vocal fold activity and the onset times of voiced speech, as vocal fold contact begins to take place. Similarly, the end time of a voiced speech segment can be measured. Second, it appears that in most normal uses of American English speech, unvoiced-speech segments directly precede or directly follow voiced-speech segments. For many applications, it is useful to know typical duration times of these unvoiced speech segments. A corpus of spoken TIMIT words, phrases, and sentences, assembled earlier and recorded from 16 male speakers using simultaneously measured acoustic and EM-sensor glottal signals, was used for this study. By inspecting the onset (or end) of unvoiced speech using the acoustic signal, and the onset (or end) of voiced speech using the EM-sensor signal, the average duration times for unvoiced segments preceding onset of vocalization were found to be 300 ms, and for following segments, 500 ms. An unvoiced speech period is then defined in time, first by using the onset of the EM-sensed glottal signal as the onset-time marker for the voiced speech segment and the end marker for the unvoiced segment. Then, by subtracting 300 ms from the onset time mark of voicing, the unvoiced speech segment start time is found. Similarly, the times for a following unvoiced speech segment can be found. While data of this nature have proven to be useful for work in our laboratory, a great deal of additional work remains to validate such data for use with general populations of users. These procedures have been useful for applying optimal processing algorithms over time segments of unvoiced, voiced, and non-speech acoustic signals. For example, these data appear to be of use in speaker validation, in vocoding, and in denoising algorithms.
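
    The timing rule above is easy to state in code. Below is a minimal sketch (ours, not the paper's; the single-segment interface and names are illustrative) that derives the adjacent unvoiced spans from one EM-sensed voiced segment using the reported 300 ms and 500 ms averages.

    ```python
    # Sketch: unvoiced-segment boundaries from EM-sensed voicing marks,
    # using the average durations reported above (assumed interface).
    PRE_UNVOICED_S = 0.300    # avg unvoiced duration preceding voicing onset
    POST_UNVOICED_S = 0.500   # avg unvoiced duration following voicing end

    def unvoiced_spans(voiced_onset_s: float, voiced_end_s: float):
        """Return (start, end) spans of the unvoiced segments that precede
        and follow one voiced segment marked by the EM glottal sensor."""
        preceding = (max(0.0, voiced_onset_s - PRE_UNVOICED_S), voiced_onset_s)
        following = (voiced_end_s, voiced_end_s + POST_UNVOICED_S)
        return preceding, following

    print(unvoiced_spans(1.20, 1.85))  # -> ((0.9, 1.2), (1.85, 2.35)), up to float rounding
    ```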

  3. System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech

    DOEpatents

    Burnett, Greg C.; Holzrichter, John F.; Ng, Lawrence C.

    2002-01-01

    Low power EM waves are used to detect motions of vocal tract tissues of the human speech system before, during, and after voiced speech. A voiced excitation function is derived. The excitation function provides speech production information to enhance speech characterization and to enable noise removal from human speech.

  4. Speech enhancement on smartphone voice recording

    NASA Astrophysics Data System (ADS)

    Tris Atmaja, Bagus; Nur Farid, Mifta; Arifianto, Dhany

    2016-11-01

    Speech enhancement is a challenging task in audio signal processing: enhancing the quality of a targeted speech signal while suppressing other noise. Speech enhancement algorithms have grown rapidly, from spectral subtraction, Wiener filtering, and the spectral amplitude MMSE estimator to Non-negative Matrix Factorization (NMF). The smartphone, as a revolutionary device, is now used in all aspects of life, including journalism, both personally and professionally. Although many smartphones have two microphones (main and rear), only the main microphone is widely used for voice recording, which is why the NMF algorithm is widely used for this kind of speech enhancement. This paper evaluates speech enhancement of smartphone voice recordings using the algorithms mentioned previously. We also extend the NMF algorithm to Kullback-Leibler NMF with supervised separation. The last algorithm shows improved results compared with the others, as evaluated by spectrogram inspection and PESQ scores.
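
    The supervised separation step can be sketched compactly. The following is a minimal KL-NMF implementation with multiplicative updates (our illustration, not the paper's code; ranks, iteration counts, and the final masking/iSTFT step are assumptions): train a speech dictionary on clean speech magnitudes and a noise dictionary on noise, then hold both fixed while fitting activations on the mixture.

    ```python
    import numpy as np

    def kl_nmf(V, W=None, k=20, n_iter=200, seed=0):
        """KL-divergence NMF via multiplicative updates; if W is given it stays fixed.
        V: nonnegative magnitude spectrogram, shape (freq_bins, frames)."""
        rng = np.random.default_rng(seed)
        eps = 1e-10
        fixed_W = W is not None
        if not fixed_W:
            W = rng.random((V.shape[0], k)) + eps
        H = rng.random((W.shape[1], V.shape[1])) + eps
        for _ in range(n_iter):
            H *= (W.T @ (V / (W @ H + eps))) / (W.T.sum(axis=1, keepdims=True) + eps)
            if not fixed_W:
                W *= ((V / (W @ H + eps)) @ H.T) / (H.sum(axis=1)[None, :] + eps)
        return W, H

    # Training: V_speech, V_noise are magnitude spectrograms of training material.
    # W_s, _ = kl_nmf(V_speech, k=40, seed=1)
    # W_n, _ = kl_nmf(V_noise, k=20, seed=2)
    # Separation: fix both dictionaries, fit activations on the mixture V_mix.
    # _, H = kl_nmf(V_mix, W=np.hstack([W_s, W_n]))
    # speech_mag = W_s @ H[:W_s.shape[1], :]   # then mask + inverse STFT
    ```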

  5. Emotionally conditioning the target-speech voice enhances recognition of the target speech under "cocktail-party" listening conditions.

    PubMed

    Lu, Lingxi; Bao, Xiaohan; Chen, Jing; Qu, Tianshu; Wu, Xihong; Li, Liang

    2018-05-01

    Under a noisy "cocktail-party" listening condition with multiple people talking, listeners can use various perceptual/cognitive unmasking cues to improve recognition of the target speech against informational speech-on-speech masking. One potential unmasking cue is the emotion expressed in a speech voice, by means of certain acoustical features. However, it was unclear whether emotional conditioning of a target-speech voice that has none of the typical acoustical features of emotion (i.e., an emotionally neutral voice) can be used by listeners to enhance target-speech recognition under speech-on-speech masking conditions. In this study we examined the recognition of target speech against a two-talker speech masker both before and after the emotionally neutral target voice was paired with a loud female screaming sound that has a marked negative emotional valence. The results showed that recognition of the target speech (especially the first keyword in a target sentence) was significantly improved by emotionally conditioning the target speaker's voice. Moreover, the emotional unmasking effect was independent of the unmasking effect of the perceived spatial separation between the target speech and the masker. Also, electrodermal (skin-conductance) responses became stronger after emotional learning when the target speech and masker were perceptually co-located, suggesting increased listening effort when the target speech was informationally masked. These results indicate that emotionally conditioning the target speaker's voice does not change the acoustical parameters of the target-speech stimuli, but the emotionally conditioned vocal features can be used as cues for unmasking target speech.

  6. Acoustic Analysis of the Voiced-Voiceless Distinction in Dutch Tracheoesophageal Speech

    ERIC Educational Resources Information Center

    Jongmans, Petra; Wempe, Ton G.; van Tinteren, Harm; Hilgers, Frans J. M.; Pols, Louis C. W.; van As-Brooks, Corina J.

    2010-01-01

    Purpose: Confusions between voiced and voiceless plosives and voiced and voiceless fricatives are common in Dutch tracheoesophageal (TE) speech. This study investigates (a) which acoustic measures are found to convey a correct voicing contrast in TE speech and (b) whether different measures are found in TE speech than in normal laryngeal (NL)…

  7. System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech

    DOEpatents

    Burnett, Greg C [Livermore, CA; Holzrichter, John F [Berkeley, CA; Ng, Lawrence C [Danville, CA

    2006-08-08

    The present invention is a system and method for characterizing human (or animate) speech voiced excitation functions and acoustic signals, for removing unwanted acoustic noise which often occurs when a speaker uses a microphone in common environments, and for synthesizing personalized or modified human (or other animate) speech upon command from a controller. A low power EM sensor is used to detect the motions of windpipe tissues in the glottal region of the human speech system before, during, and after voiced speech is produced by a user. From these tissue motion measurements, a voiced excitation function can be derived. Further, the excitation function provides speech production information to enhance noise removal from human speech and it enables accurate transfer functions of speech to be obtained. Previously stored excitation and transfer functions can be used for synthesizing personalized or modified human speech. Configurations of EM sensor and acoustic microphone systems are described to enhance noise cancellation and to enable multiple articulator measurements.

  8. System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech

    DOEpatents

    Burnett, Greg C.; Holzrichter, John F.; Ng, Lawrence C.

    2004-03-23

    The present invention is a system and method for characterizing human (or animate) speech voiced excitation functions and acoustic signals, for removing unwanted acoustic noise which often occurs when a speaker uses a microphone in common environments, and for synthesizing personalized or modified human (or other animate) speech upon command from a controller. A low power EM sensor is used to detect the motions of windpipe tissues in the glottal region of the human speech system before, during, and after voiced speech is produced by a user. From these tissue motion measurements, a voiced excitation function can be derived. Further, the excitation function provides speech production information to enhance noise removal from human speech and it enables accurate transfer functions of speech to be obtained. Previously stored excitation and transfer functions can be used for synthesizing personalized or modified human speech. Configurations of EM sensor and acoustic microphone systems are described to enhance noise cancellation and to enable multiple articulator measurements.

  9. System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech

    DOEpatents

    Burnett, Greg C.; Holzrichter, John F.; Ng, Lawrence C.

    2006-02-14

    The present invention is a system and method for characterizing human (or animate) speech voiced excitation functions and acoustic signals, for removing unwanted acoustic noise which often occurs when a speaker uses a microphone in common environments, and for synthesizing personalized or modified human (or other animate) speech upon command from a controller. A low power EM sensor is used to detect the motions of windpipe tissues in the glottal region of the human speech system before, during, and after voiced speech is produced by a user. From these tissue motion measurements, a voiced excitation function can be derived. Further, the excitation function provides speech production information to enhance noise removal from human speech and it enables accurate transfer functions of speech to be obtained. Previously stored excitation and transfer functions can be used for synthesizing personalized or modified human speech. Configurations of EM sensor and acoustic microphone systems are described to enhance noise cancellation and to enable multiple articulator measurements.

  10. Intra-oral pressure-based voicing control of electrolaryngeal speech with intra-oral vibrator.

    PubMed

    Takahashi, Hirokazu; Nakao, Masayuki; Kikuchi, Yataro; Kaga, Kimitaka

    2008-07-01

    In normal speech, coordinated activities of the intrinsic laryngeal muscles suspend the glottal sound during utterance of voiceless consonants, automatically realizing voicing control. In electrolaryngeal speech, however, the lack of voicing control is one cause of unclear voice, with voiceless consonants tending to be misheard as the corresponding voiced consonants. In the present work, we developed an intra-oral vibrator with an intra-oral pressure sensor that detected utterance of voiceless phonemes during intra-oral electrolaryngeal speech, and demonstrated that intra-oral pressure-based voicing control could improve the intelligibility of the speech. The test voices were obtained from one electrolaryngeal speaker and one normal speaker. We first investigated, using speech analysis software, how voice onset time (VOT) and the first-formant (F1) transition of the test consonant-vowel syllables contributed to voiceless/voiced contrasts, and developed an adequate voicing control strategy. We then compared the intelligibility of consonant-vowel syllables in intra-oral electrolaryngeal speech with and without online voicing control. The increase of intra-oral pressure, typically with a peak ranging from 10 to 50 gf/cm², could reliably identify utterance of voiceless consonants. The speech analysis and intelligibility test then demonstrated that a short VOT caused misidentification of the voiced consonants due to a clear F1 transition. Finally, taking these results together, the online voicing control, which suspended the prosthetic tone while the intra-oral pressure exceeded 2.5 gf/cm² and during the 35 milliseconds that followed, proved effective in improving the voiceless/voiced contrast.
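
    The reported rule reduces to a threshold plus a hold timer. A minimal sketch, assuming a sampled pressure signal and millisecond timestamps (the interface is ours, not the paper's):

    ```python
    class VoicingController:
        """Suspend the electrolarynx tone while intra-oral pressure exceeds the
        threshold, and for a hold period afterwards, per the rule reported above."""
        THRESHOLD_GF_CM2 = 2.5   # pressure threshold from the abstract
        HOLD_MS = 35.0           # post-burst suspension from the abstract

        def __init__(self):
            self._hold_until_ms = float("-inf")

        def tone_on(self, pressure_gf_cm2: float, t_ms: float) -> bool:
            if pressure_gf_cm2 > self.THRESHOLD_GF_CM2:
                self._hold_until_ms = t_ms + self.HOLD_MS
            return t_ms >= self._hold_until_ms

    ctrl = VoicingController()
    # Sampled every millisecond: tone stays off during a burst and for 35 ms after.
    print([ctrl.tone_on(p, t) for t, p in enumerate([0.1, 3.0, 0.2, 0.2])])
    # -> [True, False, False, False]
    ```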

  11. Objective voice and speech analysis of persons with chronic hoarseness by prosodic analysis of speech samples.

    PubMed

    Haderlein, Tino; Döllinger, Michael; Matoušek, Václav; Nöth, Elmar

    2016-10-01

    Automatic voice assessment is often performed using sustained vowels. In contrast, speech analysis of read-out texts can be applied to voice and speech assessment. Automatic speech recognition and prosodic analysis were used to find regression formulae between automatic and perceptual assessment of four voice and four speech criteria. The regression was trained with 21 men and 62 women (average age 49.2 years) and tested with another set of 24 men and 49 women (48.3 years), all suffering from chronic hoarseness. They read the text 'Der Nordwind und die Sonne' ('The North Wind and the Sun'). Five voice and speech therapists evaluated the data on 5-point Likert scales. Ten prosodic and recognition accuracy measures (features) were identified which describe all the examined criteria. Inter-rater correlation within the expert group was between r = 0.63 for the criterion 'match of breath and sense units' and r = 0.87 for the overall voice quality. Human-machine correlation was between r = 0.40 for the match of breath and sense units and r = 0.82 for intelligibility. The perceptual ratings of different criteria were highly correlated with each other. Likewise, the feature sets modeling the criteria were very similar. The automatic method is suitable for assessing chronic hoarseness in general and for subgroups of functional and organic dysphonia. In its current version, it is almost as reliable as a randomly picked rater from a group of voice and speech therapists.
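
    At its core, the human-machine comparison is a regression from automatic features to mean listener ratings, scored by Pearson correlation. A self-contained sketch with placeholder data (the study used 10 features from 83 training and 73 test speakers; the values below are synthetic):

    ```python
    import numpy as np
    from scipy.stats import pearsonr
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    X_train = rng.random((83, 10))      # 10 automatic features, 83 training speakers
    y_train = rng.uniform(1, 5, 83)     # mean perceptual rating (5-point Likert scale)
    X_test, y_test = rng.random((73, 10)), rng.uniform(1, 5, 73)

    model = LinearRegression().fit(X_train, y_train)
    r, p = pearsonr(model.predict(X_test), y_test)
    print(f"human-machine correlation: r = {r:.2f} (p = {p:.3f})")
    ```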

  12. Intentional Voice Command Detection for Trigger-Free Speech Interface

    NASA Astrophysics Data System (ADS)

    Obuchi, Yasunari; Sumiyoshi, Takashi

    In this paper we introduce a new framework of audio processing, which is essential to achieve a trigger-free speech interface for home appliances. If the speech interface works continually in real environments, it must extract occasional voice commands and reject everything else. It is extremely important to reduce the number of false alarms because the number of irrelevant inputs is much larger than the number of voice commands even for heavy users of appliances. The framework, called Intentional Voice Command Detection, is based on voice activity detection, but enhanced by various speech/audio processing techniques such as emotion recognition. The effectiveness of the proposed framework is evaluated using a newly-collected large-scale corpus. The advantages of combining various features were tested and confirmed, and the simple LDA-based classifier demonstrated acceptable performance. The effectiveness of various methods of user adaptation is also discussed.
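
    As a sketch of the classification stage only (feature extraction omitted; the feature count and data are assumptions), a linear discriminant separates intentional commands from everything else:

    ```python
    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    rng = np.random.default_rng(0)
    X = rng.random((500, 6))     # per-segment features: VAD score, prosody, emotion, ...
    y = rng.integers(0, 2, 500)  # 1 = intentional voice command, 0 = everything else

    lda = LinearDiscriminantAnalysis().fit(X, y)
    accept = lda.predict(rng.random((1, 6)))[0] == 1  # decision for a new audio segment
    ```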

  13. Double Fourier analysis for Emotion Identification in Voiced Speech

    NASA Astrophysics Data System (ADS)

    Sierra-Sosa, D.; Bastidas, M.; Ortiz P., D.; Quintero, O. L.

    2016-04-01

    We propose a novel analysis alternative, based on two Fourier transforms, for emotion recognition from speech. Fourier analysis allows different signals to be displayed and synthesized in terms of power spectral density distributions. A spectrogram of the voice signal is obtained by performing a short-time Fourier transform with Gaussian windows; this spectrogram portrays frequency-related features, such as vocal tract resonances and quasi-periodic excitations during voiced sounds. Emotions induce such characteristics in speech, which become apparent in the spectrogram's time-frequency distribution. The time-frequency representation from the spectrogram is then treated as an image and processed through a two-dimensional Fourier transform in order to perform spatial Fourier analysis on it. Finally, features related to emotions in voiced speech are extracted and presented.
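
    The two-transform pipeline can be sketched as follows (window length, overlap, and the synthetic input are assumptions, not the paper's settings):

    ```python
    import numpy as np
    from scipy.signal import stft

    fs = 16000
    x = np.random.default_rng(0).standard_normal(fs)  # stand-in for one second of speech

    # 1) STFT with a Gaussian window -> spectrogram (time-frequency image)
    f, t, Z = stft(x, fs=fs, window=("gaussian", 64), nperseg=512, noverlap=384)
    S = np.abs(Z)

    # 2) 2-D Fourier transform of the spectrogram, treated as an image
    F2 = np.fft.fftshift(np.fft.fft2(S))
    features = np.abs(F2)       # emotion-related descriptors are drawn from here
    print(S.shape, features.shape)
    ```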

  14. Telephony-based voice pathology assessment using automated speech analysis.

    PubMed

    Moran, Rosalyn J; Reilly, Richard B; de Chazal, Philip; Lacy, Peter D

    2006-03-01

    A system for remotely detecting vocal fold pathologies using telephone-quality speech is presented. The system uses a linear classifier, processing measurements of pitch perturbation, amplitude perturbation, and harmonic-to-noise ratio derived from digitized speech recordings. Voice recordings from the Disordered Voice Database Model 4337 system were used to develop and validate the system. Results show that while a sustained phonation, recorded in a controlled environment, can be classified as normal or pathologic with an accuracy of 89.1%, telephone-quality speech can be classified as normal or pathologic with an accuracy of 74.2% using the same scheme. Amplitude perturbation features prove most robust for telephone-quality speech. The pathologic recordings were then subcategorized into four groups, comprising normal, neuromuscular pathologic, physical pathologic, and mixed (neuromuscular with physical) pathologic. A separate classifier was developed for classifying the normal group from each pathologic subcategory. Results show that neuromuscular disorders could be detected remotely with an accuracy of 87%, physical abnormalities with an accuracy of 78%, and mixed-pathology voice with an accuracy of 61%. This study highlights the real possibility of remote detection and diagnosis of voice pathology.
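
    The normal-versus-pathologic stage reduces to a linear decision over three perturbation features. A sketch with synthetic placeholder data (linear discriminant analysis stands in for the unspecified linear classifier):

    ```python
    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    rng = np.random.default_rng(0)
    jitter = rng.uniform(0.1, 3.0, 200)    # pitch perturbation (%)
    shimmer = rng.uniform(1.0, 10.0, 200)  # amplitude perturbation (%)
    hnr = rng.normal(18, 6, 200)           # harmonic-to-noise ratio (dB)
    X = np.column_stack([jitter, shimmer, hnr])
    y = rng.integers(0, 2, 200)            # 1 = pathologic, 0 = normal

    clf = LinearDiscriminantAnalysis().fit(X, y)
    print(clf.predict([[0.8, 4.2, 12.0]])) # screen one telephone-quality recording
    ```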

  15. Speech perception in individuals with auditory dys-synchrony: effect of lengthening of voice onset time and burst duration of speech segments.

    PubMed

    Kumar, U A; Jayaram, M

    2013-07-01

    The purpose of this study was to evaluate the effect of lengthening the voice onset time and burst duration of selected speech stimuli on perception by individuals with auditory dys-synchrony. This is the second of a series of articles reporting the effect of signal-enhancing strategies on speech perception by such individuals. Two experiments were conducted: (1) assessment of the just-noticeable difference for voice onset time and burst duration of speech sounds; and (2) assessment of speech identification scores when speech sounds were modified by lengthening the voice onset time and the burst duration in units of one just-noticeable difference, both in isolation and in combination with each other plus transition-duration modification. Lengthening of voice onset time as well as burst duration improved perception of voicing. However, the effect of voice onset time modification was greater than that of burst duration modification. Although combined lengthening of voice onset time, burst duration, and transition duration resulted in improved speech perception, the improvement was less than that due to lengthening of transition duration alone. These results suggest that innovative speech processing strategies that enhance temporal cues may benefit individuals with auditory dys-synchrony.

  16. Assessment of voice and speech symptoms in early Parkinson's disease by the Robertson dysarthria profile.

    PubMed

    Defazio, Giovanni; Guerrieri, Marta; Liuzzi, Daniele; Gigante, Angelo Fabio; di Nicola, Vincenzo

    2016-03-01

    Changes in voice and speech are thought to involve 75-90% of people with PD, but the impact of PD progression on voice/speech parameters is not well defined. In this study, we assessed voice/speech symptoms in 48 parkinsonian patients staging <3 on the modified Hoehn and Yahr scale and 37 healthy subjects, using the Robertson dysarthria profile (a clinical-perceptual method exploring all components potentially involved in speech difficulties), the Voice Handicap Index (a validated measure of the impact of voice symptoms on quality of life), and the speech evaluation item contained in the Unified Parkinson's Disease Rating Scale part III (UPDRS-III). Accuracy and metric properties of the Robertson dysarthria profile were also measured. On the Robertson dysarthria profile, all parkinsonian patients yielded lower scores than healthy control subjects. By contrast, the Voice Handicap Index and the speech evaluation item of the UPDRS-III detected speech/voice disturbances in only 10% and 75% of PD patients, respectively. The validation procedure in Parkinson's disease patients showed that the Robertson dysarthria profile has acceptable reliability, satisfactory internal consistency and scaling assumptions, lack of floor and ceiling effects, and partial correlations with the UPDRS-III and Voice Handicap Index. We concluded that speech/voice disturbances are widely identified by the Robertson dysarthria profile in early parkinsonian patients, even when the disturbances do not carry a significant level of disability. The Robertson dysarthria profile may be a valuable tool to detect speech/voice disturbances in Parkinson's disease.

  17. Exploring expressivity and emotion with artificial voice and speech technologies.

    PubMed

    Pauletto, Sandra; Balentine, Bruce; Pidcock, Chris; Jones, Kevin; Bottaci, Leonardo; Aretoulaki, Maria; Wells, Jez; Mundy, Darren P; Balentine, James

    2013-10-01

    Emotion in audio-voice signals, as synthesized by text-to-speech (TTS) technologies, was investigated to formulate a theory of expression for user interface design. Emotional parameters were specified with markup tags, and the resulting audio was further modulated with post-processing techniques. Software was then developed to link a selected TTS synthesizer with an automatic speech recognition (ASR) engine, producing a chatbot that could speak and listen. Using these two artificial voice subsystems, investigators explored both artistic and psychological implications of artificial speech emotion. Goals of the investigation were interdisciplinary, with interest in musical composition, augmentative and alternative communication (AAC), commercial voice announcement applications, human-computer interaction (HCI), and artificial intelligence (AI). The work-in-progress points towards an emerging interdisciplinary ontology for artificial voices. As one study output, HCI tools are proposed for future collaboration.

  18. Comparing Measures of Voice Quality From Sustained Phonation and Continuous Speech.

    PubMed

    Gerratt, Bruce R; Kreiman, Jody; Garellek, Marc

    2016-10-01

    The question of what type of utterance (a sustained vowel or continuous speech) is best for voice quality analysis has been extensively studied, but with equivocal results. This study examines whether previously reported differences derive from the articulatory and prosodic factors occurring in continuous speech versus sustained phonation. Speakers with voice disorders sustained vowels and read sentences. Vowel samples were excerpted from the steadiest portion of each vowel in the sentences. In addition to sustained and excerpted vowels, a third set of stimuli was created by shortening sustained vowel productions to match the duration of vowels excerpted from continuous speech. Acoustic measures were made on the stimuli, and listeners judged the severity of vocal quality deviation. Sustained vowels and those extracted from continuous speech contain essentially the same acoustic and perceptual information about vocal quality deviation. Perceived and/or measured differences between continuous speech and sustained vowels derive largely from voice source variability across segmental and prosodic contexts, and not from variations in vocal fold vibration in the quasi-steady portion of the vowels. Approaches to voice quality assessment using continuous speech samples average across utterances and may not adequately quantify the variability they are intended to assess.

  19. Perception of the Voicing Distinction in Speech Produced during Simultaneous Communication

    ERIC Educational Resources Information Center

    MacKenzie, Douglas J.; Schiavetti, Nicholas; Whitehead, Robert L.; Metz, Dale Evan

    2006-01-01

    This study investigated the perception of voice onset time (VOT) in speech produced during simultaneous communication (SC). Four normally hearing, experienced sign language users were recorded under SC and speech alone (SA) conditions speaking stimulus words with voiced and voiceless initial consonants embedded in a sentence. Twelve…

  20. A voice-input voice-output communication aid for people with severe speech impairment.

    PubMed

    Hawley, Mark S; Cunningham, Stuart P; Green, Phil D; Enderby, Pam; Palmer, Rebecca; Sehgal, Siddharth; O'Neill, Peter

    2013-01-01

    A new form of augmentative and alternative communication (AAC) device for people with severe speech impairment, the voice-input voice-output communication aid (VIVOCA), is described. The VIVOCA recognizes the disordered speech of the user and builds messages, which are converted into synthetic speech. System development was carried out employing user-centered design and development methods, which identified and refined key requirements for the device. A novel methodology for building small-vocabulary, speaker-dependent automatic speech recognizers with reduced amounts of training data was applied. Experiments showed that this method is successful in generating good recognition performance (mean accuracy 96%) on highly disordered speech, even when recognition perplexity is increased. The selected message-building technique traded off various factors, including speed of message construction and range of available message outputs. The VIVOCA was evaluated in a field trial by individuals with moderate to severe dysarthria, which confirmed that they can make use of the device to produce intelligible speech output from disordered speech input. The trial highlighted some issues that limit the performance and usability of the device in real usage situations, with mean recognition accuracy of 67% in these circumstances. These limitations will be addressed in future work.

  21. Speech masking and cancelling and voice obscuration

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Holzrichter, John F.

    A non-acoustic sensor is used to measure a user's speech, and an obscuring acoustic signal is then broadcast, diminishing the intensity of the user's vocal acoustic output and/or distorting the voice sounds so that they are unintelligible to persons nearby. The non-acoustic sensor is positioned proximate to, or contacting, the user's neck or head skin tissue to sense speech production information.

  22. Speech therapy and voice recognition instrument

    NASA Technical Reports Server (NTRS)

    Cohen, J.; Babcock, M. L.

    1972-01-01

    Characteristics of an electronic circuit for examining variations in vocal excitation for diagnostic purposes, and in speech recognition for determining voice patterns and pitch changes, are described. Operation of the circuit is discussed, and a circuit diagram is provided.

  23. System And Method For Characterizing Voiced Excitations Of Speech And Acoustic Signals, Removing Acoustic Noise From Speech, And Synthesizing Speech

    DOEpatents

    Burnett, Greg C.; Holzrichter, John F.; Ng, Lawrence C.

    2006-04-25

    The present invention is a system and method for characterizing human (or animate) speech voiced excitation functions and acoustic signals, for removing unwanted acoustic noise which often occurs when a speaker uses a microphone in common environments, and for synthesizing personalized or modified human (or other animate) speech upon command from a controller. A low power EM sensor is used to detect the motions of windpipe tissues in the glottal region of the human speech system before, during, and after voiced speech is produced by a user. From these tissue motion measurements, a voiced excitation function can be derived. Further, the excitation function provides speech production information to enhance noise removal from human speech and it enables accurate transfer functions of speech to be obtained. Previously stored excitation and transfer functions can be used for synthesizing personalized or modified human speech. Configurations of EM sensor and acoustic microphone systems are described to enhance noise cancellation and to enable multiple articulator measurements.

  24. Speech and Voice Response to a Levodopa Challenge in Late-Stage Parkinson's Disease.

    PubMed

    Fabbri, Margherita; Guimarães, Isabel; Cardoso, Rita; Coelho, Miguel; Guedes, Leonor Correia; Rosa, Mario M; Godinho, Catarina; Abreu, Daisy; Gonçalves, Nilza; Antonini, Angelo; Ferreira, Joaquim J

    2017-01-01

    Parkinson's disease (PD) patients are affected by hypokinetic dysarthria, characterized by hypophonia and dysprosody, which worsens with disease progression. Levodopa's (l-dopa) effect on quality of speech is inconclusive, and no data are currently available for late-stage PD (LSPD). The aim was to assess the modifications of speech and voice in LSPD following an acute l-dopa challenge. LSPD patients [Schwab and England score <50/Hoehn and Yahr stage >3 (MED ON)] performed several vocal tasks before and after an acute l-dopa challenge. The following were assessed: respiratory support for speech, voice quality, stability and variability, speech rate, and motor performance (MDS-UPDRS-III). All voice samples were recorded and analyzed by a speech and language therapist blinded to the patients' therapeutic condition using Praat 5.1 software. 24/27 (14 men) LSPD patients succeeded in performing the voice tasks. Median age and disease duration of patients were 79 [IQR: 71.5-81.7] and 14.5 [IQR: 11-15.7] years, respectively. In MED OFF, respiratory breath support and pitch break time of LSPD patients were worse than non-parkinsonian normative values. A correlation was found between disease duration and voice quality (R = 0.51; p = 0.013) and speech rate (R = -0.55; p = 0.008). l-Dopa significantly improved the MDS-UPDRS-III score (20%), with no effect on speech as assessed by clinical rating scales and automated analysis. Speech is severely affected in LSPD. Although l-dopa had some effect on motor performance, including axial signs, speech and voice did not improve. The applicability and efficacy of non-pharmacological treatment for speech impairment should be considered for speech disorder management in PD.

  25. Fluid-acoustic interactions and their impact on pathological voiced speech

    NASA Astrophysics Data System (ADS)

    Erath, Byron D.; Zanartu, Matias; Peterson, Sean D.; Plesniak, Michael W.

    2011-11-01

    Voiced speech is produced by vibration of the vocal fold structures. Vocal fold dynamics arise from aerodynamic pressure loadings, tissue properties, and acoustic modulation of the driving pressures. Recent speech science advancements have produced a physiologically realistic fluid flow solver (BLEAP) capable of prescribing asymmetric intraglottal flow attachment that can be easily assimilated into reduced-order models of speech. The BLEAP flow solver is extended to incorporate acoustic loading and sound propagation in the vocal tract by implementing a wave-reflection-analog approach for sound propagation based on the governing BLEAP equations. This enhanced physiological description of the physics of voiced speech is implemented in a two-mass model of speech. The impact of fluid-acoustic interactions on vocal fold dynamics is elucidated for both normal and pathological speech through linear and nonlinear analysis techniques. Supported by NSF Grant CBET-1036280.

  26. Influence of voice focus on tongue movement in speech.

    PubMed

    Bressmann, Tim; de Boer, Gillian; Marino, Viviane Cristina de Castro; Fabron, Eliana Maria Gradim; Berti, Larissa Cristina

    2017-01-01

    The present study evaluated global aspects of lingual movement during sentence production with backward and forward voice focus. Nine female participants read a sentence containing a variety of consonants in a normal condition and with backward and forward voice focus. Midsagittal tongue movement was recorded with ultrasound, and tongue height over time was measured at an anterior, a central, and a posterior measurement angle. The outcome measures were speech rate, cumulative distance travelled, and average movement speed of the tongue. There were no differences in speech rate between the conditions. The cumulative distance travelled by the tongue and the average speed indicated that the posterior tongue travelled a smaller cumulative distance, and at a slower speed, in the forward focus condition. The central tongue moved a larger cumulative distance, and at a higher speed, in the backward focus condition. The study offers first insights into how tongue movement is affected by different voice focus settings and illustrates the plasticity of tongue movement in speech.

  27. Voice Quality Modelling for Expressive Speech Synthesis

    PubMed Central

    Socoró, Joan Claudi

    2014-01-01

    This paper presents the perceptual experiments that were carried out in order to validate the methodology of transforming expressive speech styles using voice quality (VoQ) parameter modelling, along with the well-known prosodic parameters (F0, duration, and energy), from a neutral style into a number of expressive ones. The main goal was to validate the usefulness of VoQ in the enhancement of expressive synthetic speech in terms of speech quality and style identification. A harmonic plus noise model (HNM) was used to modify VoQ and prosodic parameters that were extracted from an expressive speech corpus. Perception test results indicated the improvement of the obtained expressive speech styles using VoQ modelling along with prosodic characteristics. PMID:24587738

  28. Influence of musical training on understanding voiced and whispered speech in noise.

    PubMed

    Ruggles, Dorea R; Freyman, Richard L; Oxenham, Andrew J

    2014-01-01

    This study tested the hypothesis that the previously reported advantage of musicians over non-musicians in understanding speech in noise arises from more efficient or robust coding of periodic voiced speech, particularly in fluctuating backgrounds. Speech intelligibility was measured in listeners with extensive musical training, and in those with very little musical training or experience, using normal (voiced) or whispered (unvoiced) grammatically correct nonsense sentences in noise that was spectrally shaped to match the long-term spectrum of the speech, and was either continuous or gated with a 16-Hz square wave. Performance was also measured in clinical speech-in-noise tests and in pitch discrimination. Musicians exhibited enhanced pitch discrimination, as expected. However, no systematic or statistically significant advantage for musicians over non-musicians was found in understanding either voiced or whispered sentences in either continuous or gated noise. Musicians also showed no statistically significant advantage in the clinical speech-in-noise tests. Overall, the results provide no evidence for a significant difference between young adult musicians and non-musicians in their ability to understand speech in noise.

  29. Audiovisual speech facilitates voice learning.

    PubMed

    Sheffert, Sonya M; Olson, Elizabeth

    2004-02-01

    In this research, we investigated the effects of voice and face information on the perceptual learning of talkers and on long-term memory for spoken words. In the first phase, listeners were trained over several days to identify voices from words presented auditorily or audiovisually. The training data showed that visual information about speakers enhanced voice learning, revealing cross-modal connections in talker processing akin to those observed in speech processing. In the second phase, the listeners completed an auditory or audiovisual word recognition memory test in which equal numbers of words were spoken by familiar and unfamiliar talkers. The data showed that words presented by familiar talkers were more likely to be retrieved from episodic memory, regardless of modality. Together, these findings provide new information about the representational code underlying familiar talker recognition and the role of stimulus familiarity in episodic word recognition.

  30. Relationship between quality of life instruments and phonatory function in tracheoesophageal speech with voice prosthesis.

    PubMed

    Miyoshi, Masayuki; Fukuhara, Takahiro; Kataoka, Hideyuki; Hagino, Hiroshi

    2016-04-01

    The use of tracheoesophageal speech with voice prosthesis (T-E speech) after total laryngectomy has increased recently as a method of vocalization following laryngeal cancer. Previous research has not investigated the relationship between quality of life (QOL) and phonatory function in those using T-E speech. This study aimed to demonstrate the relationship between phonatory function and both comprehensive health-related QOL and QOL related to speech in people using T-E speech. The subjects of the study were 20 male patients using T-E speech after total laryngectomy. At a visit to our clinic, the subjects underwent a phonatory function test and completed three questionnaires: the MOS 8-Item Short-Form Health Survey (SF-8), the Voice Handicap Index-10 (VHI-10), and the Voice-Related Quality of Life (V-RQOL) Measure. A significant correlation was observed between the physical component summary (PCS), a summary score of SF-8, and VHI-10. Additionally, a significant correlation was observed between the SF-8 mental component summary (MCS) and both VHI-10 and VRQOL. Significant correlations were also observed between voice intensity in the phonatory function test and both VHI-10 and V-RQOL. Finally, voice intensity was significantly correlated with the SF-8 PCS. QOL questionnaires and phonatory function tests showed that, in people using T-E speech after total laryngectomy, voice intensity was correlated with comprehensive QOL, including physical and mental health. This finding suggests that voice intensity can be used as a performance index for speech rehabilitation.

  31. Patient-reported symptom questionnaires in laryngeal cancer: voice, speech and swallowing.

    PubMed

    Rinkel, R N P M; Verdonck-de Leeuw, I M; van den Brakel, N; de Bree, R; Eerenstein, S E J; Aaronson, N; Leemans, C R

    2014-08-01

    The aims were to validate questionnaires on voice, speech, and swallowing among laryngeal cancer patients, to assess the need for and use of rehabilitation services, and to determine the association between voice, speech, and swallowing problems and quality of life and distress. Laryngeal cancer patients at least three months post-treatment completed the VHI (voice), SHI (speech), SWAL-QOL (swallowing), EORTC QLQ-C30, QLQ-HN35, HADS, and study-specific questions on rehabilitation. Eighty-eight patients and 110 healthy controls participated. Cut-off scores of 15, 6, and 14 were defined for the VHI, SHI, and SWAL-QOL (sensitivity > 90%; specificity > 80%). Based on these scores, 56% of the patients reported voice, 63% speech, and 54% swallowing problems. VHI, SHI, and SWAL-QOL scores were associated significantly with quality of life (EORTC QLQ-C30 global quality of life scale) (r = .43 (VHI and SHI) and r = .46 (SWAL-QOL)) and distress (r = .50 (VHI and SHI) and r = .58 (SWAL-QOL)). In retrospect, 32% of the patients indicated the need for rehabilitation at time of treatment, and 81% of these patients availed themselves of such services. Post-treatment, 8% of the patients expressed a need for rehabilitation, and 20% of these patients actually made use of such services. Psychometric characteristics of the VHI, SHI, and SWAL-QOL in laryngeal cancer patients are good. The prevalence of voice, speech, and swallowing problems is high, and clearly related to quality of life and distress. Although higher during than after treatment, the perceived need for and use of rehabilitation services is limited.

  32. Comparison of Voice Handicap Index Scores Between Female Students of Speech Therapy and Other Health Professions.

    PubMed

    Tafiadis, Dionysios; Chronopoulos, Spyridon K; Siafaka, Vassiliki; Drosos, Konstantinos; Kosma, Evangelia I; Toki, Eugenia I; Ziavra, Nausica

    2017-09-01

    Student groups (e.g., teachers, speech-language pathologists) are presumably at risk of developing a voice disorder due to misuse of their voice, which will affect their way of living. Multidisciplinary voice assessment of student populations is now widespread, along with the use of self-report questionnaires. This study compared Voice Handicap Index domain and item scores between female students of speech and language therapy and female students of other health professions in Greece. We also examined the probability of speech-language therapy students developing any vocal symptom. Two hundred female non-dysphonic students (aged 18-31) were recruited. Participants answered the Voice Evaluation Form and the Greek adaptation of the Voice Handicap Index. Significant differences were observed between the two groups on the Voice Handicap Index (total score, functional and physical domains), excluding the emotional domain. Furthermore, significant differences between subgroups were observed for specific Voice Handicap Index items. In conclusion, speech-language therapy students had higher Voice Handicap Index scores, which could serve as an early indicator useful for avoiding profession-related dysphonia at a later stage. The Voice Handicap Index could also serve, at first glance, as an assessment tool for recognizing potential voice disorder development in students. In turn, the results could be used for indirect therapy approaches, such as providing methods for maintaining vocal health in different student populations.

  33. Stop consonant voicing in young children's speech: Evidence from a cross-sectional study

    NASA Astrophysics Data System (ADS)

    Ganser, Emily

    There are intuitive reasons to believe that speech-sound acquisition and language acquisition should be related in development. Surprisingly, only recently has research begun to parse just how the two might be related. This study investigated possible correlations between speech-sound acquisition and language acquisition, as part of a large-scale, longitudinal study of the relationship between different types of phonological development and vocabulary growth in the preschool years. Productions of voiced and voiceless stop-initial words were recorded from 96 children aged 28-39 months. Voice onset time (VOT, in ms) was calculated for each token context. A mixed-model logistic regression was calculated which predicted whether the sound was intended to be voiced or voiceless based on its VOT. This model estimated the slope of the logistic function for each child, referred to as Robustness of Contrast (based on Holliday, Reidy, Beckman, and Edwards, 2015) and defined as the degree of categorical differentiation between the production of two speech sounds or classes of sounds, in this case voiced and voiceless stops. Results showed a wide range of slopes for individual children, suggesting that slope-derived Robustness of Contrast could be a viable means of measuring a child's acquisition of the voicing contrast. Robustness of Contrast was then compared to traditional measures of speech and language skills to investigate whether there was any correlation between the production of stop voicing and broader measures of speech and language development. The Robustness of Contrast measure was found to correlate with all individual measures of speech and language, suggesting that it might indeed be predictive of later language skills.
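
    Reduced to a single child (the study fit a mixed-effects model across all 96 children), the slope measure can be sketched as follows; the VOT distributions are synthetic placeholders:

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(1)
    vot_voiced = rng.normal(15, 8, 40)      # short-lag VOTs (ms), intended voiced stops
    vot_voiceless = rng.normal(70, 20, 40)  # long-lag VOTs, intended voiceless stops

    X = np.concatenate([vot_voiced, vot_voiceless]).reshape(-1, 1)
    y = np.concatenate([np.zeros(40), np.ones(40)])  # 0 = voiced target, 1 = voiceless

    clf = LogisticRegression().fit(X, y)
    robustness_of_contrast = clf.coef_[0][0]  # steeper slope = sharper voicing contrast
    print(f"slope (per ms of VOT): {robustness_of_contrast:.3f}")
    ```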

  34. The Human Voice in Speech and Singing

    NASA Astrophysics Data System (ADS)

    Lindblom, Björn; Sundberg, Johan

    This chapter describes various aspects of the human voice as a means of communication in speech and singing. From the point of view of function, vocal sounds can be regarded as the end result of a three stage process: (1) the compression of air in the respiratory system, which produces an exhalatory airstream, (2) the vibrating vocal folds' transformation of this air stream to an intermittent or pulsating air stream, which is a complex tone, referred to as the voice source, and (3) the filtering of this complex tone in the vocal tract resonator. The main function of the respiratory system is to generate an overpressure of air under the glottis, or a subglottal pressure. Section 16.1 describes different aspects of the respiratory system of significance to speech and singing, including lung volume ranges, subglottal pressures, and how this pressure is affected by the ever-varying recoil forces. The complex tone generated when the air stream from the lungs passes the vibrating vocal folds can be varied in at least three dimensions: fundamental frequency, amplitude and spectrum. Section 16.2 describes how these properties of the voice source are affected by the subglottal pressure, the length and stiffness of the vocal folds and how firmly the vocal folds are adducted. Section 16.3 gives an account of the vocal tract filter, how its form determines the frequencies of its resonances, and Sect. 16.4 gives an account for how these resonance frequencies or formants shape the vocal sounds by imposing spectrum peaks separated by spectrum valleys, and how the frequencies of these peaks determine vowel and voice qualities. The remaining sections of the chapter describe various aspects of the acoustic signals used for vocal communication in speech and singing. The syllable structure is discussed in Sect. 16.5, the closely related aspects of rhythmicity and timing in speech and singing is described in Sect. 16.6, and pitch and rhythm aspects in Sect. 16.7. The impressive control

  35. Assessment of voice, speech and communication changes associated with cervical spinal cord injury.

    PubMed

    Johansson, Kerstin; Seiger, Åke; Forsén, Malin; Holmgren Nilsson, Jeanette; Hartelius, Lena; Schalling, Ellika

    2018-02-24

    Respiratory muscle impairment following cervical spinal cord injury (CSCI) may lead to reduced voice function, although the individual variation is large. Voice problems in this population may not always receive attention since individuals with CSCI face other, more acute and life-threatening issues that need/receive attention. Currently there is no consensus on the tasks suitable to identify the specific voice impairments and functional voice changes experienced by individuals with CSCI. To examine which voice/speech tasks identify the specific voice and communication changes associated with CSCI, habitual and maximum speech performance of a group with CSCI was compared with that of a healthy control group (CG), and the findings were related to respiratory function and to self-reported voice problems. Respiratory, aerodynamic, acoustic and self-reported voice data from 19 individuals (nine women and 10 men, aged 23-59 years, heights = 153-192 cm) with CSCI (levels C3-C7) were compared with data from a CG consisting of 19 carefully matched non-injured people (nine women and 10 men, aged 19-59 years, heights = 152-187 cm). Despite considerable variability of performance, highly significant differences between the group with CSCI and the CG were found in maximum phonation time, maximum duration of breath phrases, maximum sound pressure level and maximum voice area in voice-range profiles (all p = .000). Subglottal pressure was lower and phonatory stability was reduced in some of the individuals with CSCI, but differences between the groups were not statistically significant. Six of 19 had voice handicap index (VHI) scores above 20 (the cut-off for voice disorder). Individuals with a vital capacity below 50% of the expected for an equivalent reference individual performed significantly worse than participants with more normal vital capacity. Completeness and level of injury seemed to impact vocal function in some individuals. A combination of maximum performance

  36. Should singing activities be included in speech and voice therapy for prepubertal children?

    PubMed

    Rinta, Tiija; Welch, Graham F

    2008-01-01

    Customarily, speaking and singing have tended to be regarded as two completely separate sets of behaviors in clinical and educational settings. The treatment of speech and voice disorders has focused on the client's speaking ability, as this is perceived to be the main vocal behavior of concern. However, according to a broader voice-science perspective, given that the same vocal structure is used for speaking and singing, it may be possible to include singing in speech and voice therapy. In this article, a theoretical framework is proposed that indicates possible benefits from the inclusion of singing in such therapeutic settings. Based on a literature review, it is demonstrated theoretically why singing activities can potentially be exploited in the treatment of prepubertal children suffering from speech and voice disorders. Based on this theoretical framework, implications for further empirical research and practice are suggested.

  37. Effects of cross-language voice training on speech perception: Whose familiar voices are more intelligible?

    PubMed Central

    Levi, Susannah V.; Winters, Stephen J.; Pisoni, David B.

    2011-01-01

    Previous research has shown that familiarity with a talker’s voice can improve linguistic processing (herein, “Familiar Talker Advantage”), but this benefit is constrained by the context in which the talker’s voice is familiar. The current study examined how familiarity affects intelligibility by manipulating the type of talker information available to listeners. One group of listeners learned to identify bilingual talkers’ voices from English words, where they learned language-specific talker information. A second group of listeners learned the same talkers from German words, and thus only learned language-independent talker information. After voice training, both groups of listeners completed a word recognition task with English words produced by both familiar and unfamiliar talkers. Results revealed that English-trained listeners perceived more phonemes correct for familiar than unfamiliar talkers, while German-trained listeners did not show improved intelligibility for familiar talkers. The absence of a processing advantage in speech intelligibility for the German-trained listeners demonstrates limitations on the Familiar Talker Advantage, which crucially depends on the language context in which the talkers’ voices were learned; knowledge of how a talker produces linguistically relevant contrasts in a particular language is necessary to increase speech intelligibility for words produced by familiar talkers. PMID:22225059

  38. Utility and accuracy of perceptual voice and speech distinctions in the diagnosis of Parkinson's disease, PSP and MSA-P.

    PubMed

    Miller, Nick; Nath, Uma; Noble, Emma; Burn, David

    2017-06-01

    To determine if perceptual speech measures distinguish people with Parkinson's disease (PD), multiple system atrophy with predominant parkinsonism (MSA-P) and progressive supranuclear palsy (PSP). Speech-language therapists blind to patient characteristics employed clinical rating scales to evaluate speech/voice in 24 people with clinically diagnosed PD, 17 with PSP and 9 with MSA-P, matched for disease duration (mean 4.9 years, standard deviation 2.2). No consistent intergroup differences appeared on specific speech/voice variables. People with PD were significantly less impaired on overall speech/voice severity. Analyses by severity suggested further investigation around laryngeal, resonance and fluency changes may characterize individual groups. MSA-P and PSP compared with PD were distinguished by severity of speech/voice deterioration, but individual speech/voice parameters failed to consistently differentiate groups.

  39. Tutorial and Guidelines on Measurement of Sound Pressure Level in Voice and Speech.

    PubMed

    Švec, Jan G; Granqvist, Svante

    2018-03-15

    Sound pressure level (SPL) measurement of voice and speech is often considered a trivial matter, but the measured levels are often reported incorrectly or incompletely, making them difficult to compare among various studies. This article aims at explaining the fundamental principles behind these measurements and providing guidelines to improve their accuracy and reproducibility. Basic information is put together from standards, technical, voice and speech literature, and practical experience of the authors and is explained for nontechnical readers. Variation of SPL with distance, sound level meters and their accuracy, frequency and time weightings, and background noise topics are reviewed. Several calibration procedures for SPL measurements are described for stand-mounted and head-mounted microphones. SPL of voice and speech should be reported together with the mouth-to-microphone distance so that the levels can be related to vocal power. Sound level measurement settings (i.e., frequency weighting and time weighting/averaging) should always be specified. Classified sound level meters should be used to assure measurement accuracy. Head-mounted microphones placed at the proximity of the mouth improve signal-to-noise ratio and can be taken advantage of for voice SPL measurements when calibrated. Background noise levels should be reported besides the sound levels of voice and speech.
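
    As a worked example of the distance dependence reviewed in the tutorial: under a free-field assumption (ignoring room reflections), SPL falls by 20*log10(d2/d1) dB between distances d1 and d2. The 5 cm head-mounted placement and 1 m reporting distance below are illustrative choices, not prescriptions from the article.

    ```python
    import math

    def spl_at_distance(spl_ref_db: float, d_ref_m: float, d_m: float) -> float:
        """Free-field conversion of SPL measured at d_ref_m to distance d_m."""
        return spl_ref_db - 20.0 * math.log10(d_m / d_ref_m)

    # 80 dB SPL at a 5 cm head-mounted microphone -> about 54 dB SPL at 1 m
    print(round(spl_at_distance(80.0, 0.05, 1.0), 1))   # 54.0
    ```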

  40. The Impact of Dysphonic Voices on Healthy Listeners: Listener Reaction Times, Speech Intelligibility, and Listener Comprehension.

    PubMed

    Evitts, Paul M; Starmer, Heather; Teets, Kristine; Montgomery, Christen; Calhoun, Lauren; Schulze, Allison; MacKenzie, Jenna; Adams, Lauren

    2016-11-01

    There is currently minimal information on the impact of dysphonia secondary to phonotrauma on listeners. Considering the high incidence of voice disorders among professional voice users, it is important to understand the impact of a dysphonic voice on their audiences. Ninety-one healthy listeners (39 men, 52 women; mean age = 23.62 years) were presented with speech stimuli from 5 healthy speakers and 5 speakers diagnosed with dysphonia secondary to phonotrauma. Dependent variables included processing speed (reaction time [RT] ratio), speech intelligibility, and listener comprehension. Voice quality ratings were also obtained for all speakers by 3 expert listeners. Statistical results showed significant differences in RT ratio and in the number of speech intelligibility errors between healthy and dysphonic voices. There was not a significant difference in listener comprehension errors. Multiple regression analyses showed that voice quality ratings from the Consensus Auditory-Perceptual Evaluation of Voice (Kempster, Gerratt, Verdolini Abbott, Barkmeier-Kraemer, & Hillman, 2009) were able to predict RT ratio and speech intelligibility but not listener comprehension. Results of the study suggest that although listeners require more time to process and make more intelligibility errors when presented with speech stimuli from speakers with dysphonia secondary to phonotrauma, listener comprehension may not be affected.

  2. "Who" is saying "what"? Brain-based decoding of human voice and speech.

    PubMed

    Formisano, Elia; De Martino, Federico; Bonte, Milene; Goebel, Rainer

    2008-11-07

    Can we decipher speech content ("what" is being said) and speaker identity ("who" is saying it) from observations of brain activity of a listener? Here, we combine functional magnetic resonance imaging with a data-mining algorithm and retrieve what and whom a person is listening to from the neural fingerprints that speech and voice signals elicit in the listener's auditory cortex. These cortical fingerprints are spatially distributed and insensitive to acoustic variations of the input so as to permit the brain-based recognition of learned speech from unknown speakers and of learned voices from previously unheard utterances. Our findings unravel the detailed cortical layout and computational properties of the neural populations at the basis of human speech recognition and speaker identification.

  3. 76 FR 66734 - National Institute on Deafness and Other Communication Disorders Draft 2012-2016 Strategic Plan

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-10-27

    The National Institute on Deafness and Other Communication Disorders supports research and research training in the normal and disordered processes of hearing, balance, smell, taste, voice, speech, and language. The draft 2012-2016 Strategic Plan organizes this research into three program areas: hearing and balance; smell and taste; and voice, speech, and language.

  4. 17 Ways to Say Yes: Toward Nuanced Tone of Voice in AAC and Speech Technology

    PubMed Central

    Pullin, Graham; Hennig, Shannon

    2015-01-01

    People with complex communication needs who use speech-generating devices have very little expressive control over their tone of voice. However, despite its importance in human interaction, the issue of tone of voice remains all but absent from AAC research and development. In this paper, we describe three interdisciplinary projects, past, present and future: the critical design collection Six Speaking Chairs has provoked deeper discussion and inspired a social model of tone of voice; the speculative concept Speech Hedge illustrates challenges and opportunities in designing more expressive user interfaces; and the pilot project Tonetable could enable participatory research and seed a research network around tone of voice. We speculate that more radical interactions might expand the frontiers of AAC and disrupt speech technology as a whole. PMID:25965913

  5. Changes in Voice Onset Time and Motor Speech Skills in Children following Motor Speech Therapy: Evidence from /pa/ productions

    PubMed Central

    Yu, Vickie Y.; Kadis, Darren S.; Oh, Anna; Goshulak, Debra; Namasivayam, Aravind; Pukonen, Margit; Kroll, Robert; De Nil, Luc F.; Pang, Elizabeth W.

    2016-01-01

    This study evaluated changes in motor speech control and inter-gestural coordination for children with speech sound disorders (SSD) subsequent to PROMPT (Prompts for Restructuring Oral Muscular Phonetic Targets) intervention. We measured the distribution patterns of voice onset time (VOT) for a voiceless stop (/p/) to examine the changes in inter-gestural coordination. Two standardized tests were used (VMPAC, GFTA-2) to assess the changes in motor speech skills and articulation. Data showed positive changes in patterns of VOT with a lower pattern of variability. All children showed significantly higher scores for VMPAC, but only some children showed higher scores for GFTA-2. Results suggest that the proprioceptive feedback provided through PROMPT had a positive influence on motor speech control and inter-gestural coordination in voicing behavior. This set of VOT data for children with SSD adds to our understanding of the speech characteristics underlying motor speech control. Directions for future studies are discussed. PMID:24446799

  6. Effects of an Extended Version of the Lee Silverman Voice Treatment on Voice and Speech in Parkinson's Disease

    ERIC Educational Resources Information Center

    Spielman, Jennifer; Ramig, Lorraine O.; Mahler, Leslie; Halpern, Angela; Gavin, William J.

    2007-01-01

    Purpose: The present study examined vocal SPL, voice handicap, and speech characteristics in Parkinson's disease (PD) following an extended version of the Lee Silverman Voice Treatment (LSVT), to help determine whether current treatment dosages can be altered without compromising clinical outcomes. Method: Twelve participants with idiopathic PD…

  7. Noise Robust Speech Recognition Applied to Voice-Driven Wheelchair

    NASA Astrophysics Data System (ADS)

    Sasou, Akira; Kojima, Hiroaki

    2009-12-01

    Conventional voice-driven wheelchairs usually employ headset microphones that are capable of achieving sufficient recognition accuracy, even in the presence of surrounding noise. However, such interfaces require users to wear sensors such as a headset microphone, which can be an impediment, especially for the hand disabled. It is also well known that speech recognition accuracy degrades drastically when the microphone is placed far from the user. In this paper, we develop a noise robust speech recognition system for a voice-driven wheelchair that achieves almost the same recognition accuracy as a headset microphone without requiring the user to wear any sensors. We verified the effectiveness of our system in experiments in different environments.

  8. A National Test of Taste and Smell

    MedlinePlus

    Feature article in the series "Taste, Smell, Hearing, Language, Voice, Balance": At Last: A National Test of Taste and Smell.

  9. Speech transport for packet telephony and voice over IP

    NASA Astrophysics Data System (ADS)

    Baker, Maurice R.

    1999-11-01

    Recent advances in packet switching, internetworking, and digital signal processing technologies have converged to allow realizable practical implementations of packet telephony systems. This paper provides a tutorial on transmission engineering for packet telephony covering the topics of speech coding/decoding, speech packetization, packet data network transport, and impairments which may negatively impact end-to-end system quality. Particular emphasis is placed upon Voice over Internet Protocol given the current popularity and ubiquity of IP transport.
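
    To make the transport arithmetic concrete, here is a short Python sketch of the per-stream bandwidth calculation; the figures assume the standard G.711 codec (64 kbit/s) with 20 ms packets and 40 bytes of IPv4/UDP/RTP headers, and ignore link-layer framing.

        def voip_bandwidth_kbps(codec_rate_kbps=64.0, frame_ms=20.0, header_bytes=40):
            """IP-layer bandwidth for one voice stream.

            header_bytes = 20 (IPv4) + 8 (UDP) + 12 (RTP); link-layer
            framing (e.g., Ethernet) would add more overhead.
            """
            payload_bytes = codec_rate_kbps * 1000 / 8 * (frame_ms / 1000)  # per packet
            packets_per_second = 1000 / frame_ms
            total_bits_per_second = (payload_bytes + header_bytes) * 8 * packets_per_second
            return total_bits_per_second / 1000

        # G.711 at 20 ms: 160-byte payload + 40-byte headers -> 80 kbit/s.
        print(voip_bandwidth_kbps())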

  10. Silent reading of direct versus indirect speech activates voice-selective areas in the auditory cortex.

    PubMed

    Yao, Bo; Belin, Pascal; Scheepers, Christoph

    2011-10-01

    In human communication, direct speech (e.g., Mary said: "I'm hungry") is perceived to be more vivid than indirect speech (e.g., Mary said [that] she was hungry). However, for silent reading, the representational consequences of this distinction are still unclear. Although many of us share the intuition of an "inner voice," particularly during silent reading of direct speech statements in text, there has been little direct empirical confirmation of this experience so far. Combining fMRI with eye tracking in human volunteers, we show that silent reading of direct versus indirect speech engenders differential brain activation in voice-selective areas of the auditory cortex. This suggests that readers are indeed more likely to engage in perceptual simulations (or spontaneous imagery) of the reported speaker's voice when reading direct speech as opposed to meaning-equivalent indirect speech statements as part of a more vivid representation of the former. Our results may be interpreted in line with embodied cognition and form a starting point for more sophisticated interdisciplinary research on the nature of auditory mental simulation during reading.

  11. The stop voicing contrast in French: From citation speech to sentential speech

    NASA Astrophysics Data System (ADS)

    Abdelli-Beruh, Nassima; Demaio, Eileen; Hisagi, Miwako

    2004-05-01

    This study explores the influence of speaking style on the salience of the acoustic correlates of the stop voicing distinction in French. Monolingual French speakers produced twenty-one CVC syllables in citation speech, in minimal pairs, and in sentence-length utterances (/pa/-/a/ context: /il a di pa CVC a lui/; /pas/-/s/ context: /il a di pas CVC sa~ lui/). Prominent stress was on the CVC. Voicing-related differences in percentage of closure voicing and in durations of aspiration, closure, and vowel were analyzed as a function of these three speaking styles. Results show that the salience of the acoustic-phonetic segments present when the syllables are uttered in isolation or in minimal pairs differs from when the syllables are spoken in a sentence. These results are in agreement with findings in English.

  12. Comparing Measures of Voice Quality from Sustained Phonation and Continuous Speech

    ERIC Educational Resources Information Center

    Gerratt, Bruce R.; Kreiman, Jody; Garellek, Marc

    2016-01-01

    Purpose: The question of what type of utterance--a sustained vowel or continuous speech--is best for voice quality analysis has been extensively studied but with equivocal results. This study examines whether previously reported differences derive from the articulatory and prosodic factors occurring in continuous speech versus sustained phonation.…

  13. Assessment of voice, speech, and related quality of life in advanced head and neck cancer patients 10-years+ after chemoradiotherapy.

    PubMed

    Kraaijenga, S A C; Oskam, I M; van Son, R J J H; Hamming-Vrieze, O; Hilgers, F J M; van den Brekel, M W M; van der Molen, L

    2016-04-01

    Assessment of long-term objective and subjective voice, speech, articulation, and quality of life in patients with head and neck cancer (HNC) treated with concurrent chemoradiotherapy (CRT) for advanced, stage IV disease. Twenty-two disease-free survivors, treated with cisplatin-based CRT for inoperable HNC (1999-2004), were evaluated at 10 years post-treatment. A standard Dutch text was recorded. Perceptual analysis of voice, speech, and articulation was conducted by two expert listeners (SLPs). An experimental expert system based on automatic speech recognition was also used. Patients' perception of voice and speech and related quality of life was assessed with the Voice Handicap Index (VHI) and Speech Handicap Index (SHI) questionnaires. At a median follow-up of 11 years, perceptual evaluation showed abnormal scores in up to 64% of cases, depending on the outcome parameter analyzed. Automatic assessment of voice and speech parameters correlated moderately to strongly with perceptual outcome scores. Patient-reported problems with voice (VHI>15) and speech (SHI>6) in daily life were present in 68% and 77% of patients, respectively. Patients treated with IMRT showed significantly less impairment compared to those treated with conventional radiotherapy. More than 10 years after organ-preservation treatment, voice and speech problems are common in this patient cohort, as assessed with perceptual evaluation, automatic speech recognition, and validated structured questionnaires. There were fewer complaints in patients treated with IMRT than with conventional radiotherapy. Copyright © 2016 Elsevier Ltd. All rights reserved.

  14. Detecting Abnormal Word Utterances in Children With Autism Spectrum Disorders: Machine-Learning-Based Voice Analysis Versus Speech Therapists.

    PubMed

    Nakai, Yasushi; Takiguchi, Tetsuya; Matsui, Gakuyo; Yamaoka, Noriko; Takada, Satoshi

    2017-10-01

    Abnormal prosody is often evident in the voice intonations of individuals with autism spectrum disorders. We compared a machine-learning-based voice analysis with human hearing judgments made by 10 speech therapists for classifying children with autism spectrum disorders (n = 30) and typical development (n = 51). Using stimuli limited to single-word utterances, machine-learning-based voice analysis was superior to speech therapist judgments. There was a significantly higher true-positive than false-negative rate for machine-learning-based voice analysis but not for speech therapists. Results are discussed in terms of some artificiality of clinician judgments based on single-word utterances, and the objectivity machine-learning-based voice analysis adds to judging abnormal prosody.

  15. Singing in groups for Parkinson's disease (SING-PD): a pilot study of group singing therapy for PD-related voice/speech disorders.

    PubMed

    Shih, Ludy C; Piel, Jordan; Warren, Amanda; Kraics, Lauren; Silver, Althea; Vanderhorst, Veronique; Simon, David K; Tarsy, Daniel

    2012-06-01

    Parkinson's disease-related speech and voice impairment has a significant impact on quality-of-life measures. LSVT® LOUD voice and speech therapy (Lee Silverman Voice Treatment) has demonstrated scientific efficacy and clinical effectiveness, but musically based voice and speech therapy has been underexplored as a potentially useful method of rehabilitation. We undertook a pilot, open-label study of a group-based singing intervention, consisting of twelve 90-min weekly sessions led by a voice and speech therapist/singing instructor. The primary outcome measure of vocal loudness, as measured by sound pressure level (SPL) at 50 cm during connected speech, was not significantly different one week after the intervention or at 13 weeks after the intervention. A number of secondary measures reflecting pitch range, phonation time and maximum loudness were also unchanged. Voice-related quality of life (VRQOL) and voice handicap index (VHI) were also unchanged. This study suggests that a group singing therapy intervention at this intensity and frequency does not result in significant improvement in objective and subject-rated measures of voice and speech impairment. Copyright © 2012 Elsevier Ltd. All rights reserved.

  16. Predicting Voice Disorder Status From Smoothed Measures of Cepstral Peak Prominence Using Praat and Analysis of Dysphonia in Speech and Voice (ADSV).

    PubMed

    Sauder, Cara; Bretl, Michelle; Eadie, Tanya

    2017-09-01

    The purposes of this study were to (1) determine and compare the diagnostic accuracy of a single acoustic measure, smoothed cepstral peak prominence (CPPS), for predicting voice disorder status from connected speech samples using two software systems, Analysis of Dysphonia in Speech and Voice (ADSV) and Praat; and (2) determine the relationship between measures of CPPS generated by these programs. This is a retrospective cross-sectional study. Measures of CPPS were obtained from connected speech recordings of 100 subjects with voice disorders and 70 nondysphonic subjects without vocal complaints using the commercially available ADSV and the freely downloadable Praat software programs. Logistic regression and receiver operating characteristic (ROC) analyses were used to evaluate and compare the diagnostic accuracy of CPPS measures. Relationships between CPPS measures from the programs were determined. Results showed acceptable overall accuracy rates (75% accuracy, ADSV; 82% accuracy, Praat) and areas under the ROC curves (area under the curve [AUC] = 0.81, ADSV; AUC = 0.91, Praat) for predicting voice disorder status, with slight differences in sensitivity and specificity. CPPS measures derived from Praat were uniquely predictive of disorder status above and beyond CPPS measures from ADSV (χ²(1) = 40.71, P < 0.001). CPPS measures from both programs were significantly and highly correlated (r = 0.88, P < 0.001). A single acoustic measure of CPPS was highly predictive of voice disorder status using either program. Clinicians may consider using CPPS to complement clinical voice evaluation and screening protocols. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
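
    The analysis pipeline reported here, a single-predictor logistic regression evaluated with ROC curves, is easy to reproduce in outline. The Python sketch below uses scikit-learn on synthetic CPPS values; the group sizes mirror the study, but the numbers themselves are simulated for illustration only, not the study's data.

        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import roc_auc_score, accuracy_score

        # Synthetic data: one smoothed cepstral peak prominence value (dB)
        # per speaker; y is 1 for disordered, 0 for nondysphonic.
        rng = np.random.default_rng(0)
        cpps = np.r_[rng.normal(4.0, 1.0, 100), rng.normal(6.5, 1.0, 70)]
        y = np.r_[np.ones(100, dtype=int), np.zeros(70, dtype=int)]

        X = cpps.reshape(-1, 1)
        model = LogisticRegression().fit(X, y)

        print("accuracy:", accuracy_score(y, model.predict(X)))
        print("AUC:", roc_auc_score(y, model.predict_proba(X)[:, 1]))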

  17. Brain 'talks over' boring quotes: top-down activation of voice-selective areas while listening to monotonous direct speech quotations.

    PubMed

    Yao, Bo; Belin, Pascal; Scheepers, Christoph

    2012-04-15

    In human communication, direct speech (e.g., Mary said, "I'm hungry") is perceived as more vivid than indirect speech (e.g., Mary said that she was hungry). This vividness distinction has previously been found to underlie silent reading of quotations: Using functional magnetic resonance imaging (fMRI), we found that direct speech elicited higher brain activity in the temporal voice areas (TVA) of the auditory cortex than indirect speech, consistent with an "inner voice" experience in reading direct speech. Here we show that listening to monotonously spoken direct versus indirect speech quotations also engenders differential TVA activity. This suggests that individuals engage in top-down simulations or imagery of enriched supra-segmental acoustic representations while listening to monotonous direct speech. The findings shed new light on the acoustic nature of the "inner voice" in understanding direct speech. Copyright © 2012 Elsevier Inc. All rights reserved.

  18. [Acoustic voice analysis using the Praat program: comparative study with the Dr. Speech program].

    PubMed

    Núñez Batalla, Faustino; González Márquez, Rocío; Peláez González, M Belén; González Laborda, Irene; Fernández Fernández, María; Morato Galán, Marta

    2014-01-01

    The European Laryngological Society (ELS) basic protocol for functional assessment of voice pathology includes 5 different approaches: perception, videostroboscopy, acoustics, aerodynamics and subjective rating by the patient. In this study we focused on acoustic voice analysis. The purpose of the present study was to correlate the results obtained by the commercial software Dr. Speech and the free software Praat in 2 fields: 1. Narrow-band spectrogram (the presence of noise according to Yanagihara, and the presence of subharmonics) (semi-quantitative). 2. Voice acoustic parameters (jitter, shimmer, harmonics-to-noise ratio, fundamental frequency) (quantitative). We studied a total of 99 voice samples from individuals with Reinke's oedema diagnosed using videostroboscopy. One independent observer used Dr. Speech 3.0 and a second one used the Praat program (Phonetic Sciences, University of Amsterdam). The spectrographic analysis consisted of obtaining a narrow-band spectrogram from the previous digitalised voice samples by the 2 independent observers. They then determined the presence of noise in the spectrogram, using the Yanagihara grades, as well as the presence of subharmonics. As a final result, the acoustic parameters of jitter, shimmer, harmonics-to-noise ratio and fundamental frequency were obtained from the 2 acoustic analysis programs. The results indicated that the sound spectrogram and the numerical values obtained for shimmer and jitter were similar for both computer programs, even though types 1, 2 and 3 voice samples were analysed. The Praat and Dr. Speech programs provide similar results in the acoustic analysis of pathological voices. Copyright © 2013 Elsevier España, S.L. All rights reserved.
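
    Acoustic measures of this kind can be scripted. Below is a minimal sketch using the praat-parselmouth Python interface to Praat; the file name is hypothetical and the analysis parameters are common illustrative defaults, not necessarily the settings used in this study.

        import parselmouth
        from parselmouth.praat import call

        snd = parselmouth.Sound("voice_sample.wav")   # hypothetical file

        pitch = call(snd, "To Pitch", 0.0, 75, 500)   # pitch floor/ceiling in Hz
        f0_mean = call(pitch, "Get mean", 0, 0, "Hertz")

        point_process = call(snd, "To PointProcess (periodic, cc)", 75, 500)
        jitter_local = call(point_process, "Get jitter (local)",
                            0, 0, 0.0001, 0.02, 1.3)
        shimmer_local = call([snd, point_process], "Get shimmer (local)",
                             0, 0, 0.0001, 0.02, 1.3, 1.6)

        harmonicity = call(snd, "To Harmonicity (cc)", 0.01, 75, 0.1, 1.0)
        hnr = call(harmonicity, "Get mean", 0, 0)

        print(f0_mean, jitter_local, shimmer_local, hnr)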

  19. Accelerometer-based automatic voice onset detection in speech mapping with navigated repetitive transcranial magnetic stimulation.

    PubMed

    Vitikainen, Anne-Mari; Mäkelä, Elina; Lioumis, Pantelis; Jousmäki, Veikko; Mäkelä, Jyrki P

    2015-09-30

    The use of navigated repetitive transcranial magnetic stimulation (rTMS) in mapping of speech-related brain areas has recently been shown to be useful in the preoperative workup of epilepsy and tumor patients. However, substantial inter- and intraobserver variability and non-optimal replicability of the rTMS results have been reported, and a need for additional development of the methodology is recognized. In TMS motor cortex mappings the evoked responses can be quantitatively monitored by electromyographic recordings; however, no such easily available setup exists for speech mappings. We present an accelerometer-based setup for detection of vocalization-related larynx vibrations, combined with an automatic routine for voice onset detection, for rTMS speech mapping using a naming task. The results produced by the automatic routine were compared with manually reviewed video recordings. The new method was applied in routine navigated rTMS speech mapping for 12 consecutive patients during preoperative workup for epilepsy or tumor surgery. The automatic routine correctly detected 96% of the voice onsets, resulting in 96% sensitivity and 71% specificity. The majority (63%) of the misdetections were related to visible throat movements, extra voices before the response, or delayed naming of the previous stimuli. The no-response errors were correctly detected in 88% of events. The proposed setup for automatic detection of voice onsets provides quantitative additional data for analysis of the rTMS-induced speech response modifications. The objectively defined speech response latencies increase the repeatability, reliability and stratification of the rTMS results. Copyright © 2015 Elsevier B.V. All rights reserved.
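
    The authors' automatic routine is not public, but the general idea of short-time-energy onset detection on an accelerometer trace can be sketched in a few lines of Python; the frame length, threshold ratio, and baseline estimate below are illustrative assumptions, not the paper's parameters.

        import numpy as np

        def detect_voice_onset(signal, fs, frame_ms=10.0, threshold_ratio=5.0):
            """Return the time (s) of the first frame whose short-time RMS
            exceeds threshold_ratio times the baseline RMS, or None.

            A toy stand-in for the paper's routine: real use would need
            band-pass filtering around the larynx-vibration frequencies
            and rejection of movement artefacts.
            """
            frame_len = int(fs * frame_ms / 1000)
            n_frames = len(signal) // frame_len
            frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
            rms = np.sqrt((frames ** 2).mean(axis=1))
            baseline = np.median(rms[:10])          # assumes a silent lead-in
            above = np.nonzero(rms > threshold_ratio * baseline)[0]
            return above[0] * frame_ms / 1000 if above.size else None

        # e.g., with a 2 kHz accelerometer trace `acc`:
        # onset = detect_voice_onset(acc, fs=2000)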

  20. Noise on, voicing off: Speech perception deficits in children with specific language impairment.

    PubMed

    Ziegler, Johannes C; Pech-Georgel, Catherine; George, Florence; Lorenzi, Christian

    2011-11-01

    Speech perception of four phonetic categories (voicing, place, manner, and nasality) was investigated in children with specific language impairment (SLI) (n=20) and age-matched controls (n=19) in quiet and various noise conditions using an AXB two-alternative forced-choice paradigm. Children with SLI exhibited robust speech perception deficits in silence, stationary noise, and amplitude-modulated noise. Comparable deficits were obtained for fast, intermediate, and slow modulation rates, and this speaks against the various temporal processing accounts of SLI. Children with SLI exhibited normal "masking release" effects (i.e., better performance in fluctuating noise than in stationary noise), again suggesting relatively spared spectral and temporal auditory resolution. In terms of phonetic categories, voicing was more affected than place, manner, or nasality. The specific nature of this voicing deficit is hard to explain with general processing impairments in attention or memory. Finally, speech perception in noise correlated with an oral language component but not with either a memory or IQ component, and it accounted for unique variance beyond IQ and low-level auditory perception. In sum, poor speech perception seems to be one of the primary deficits in children with SLI that might explain poor phonological development, impaired word production, and poor word comprehension. Copyright © 2011 Elsevier Inc. All rights reserved.

  1. [Comparative studies of the quality of the esophageal voice following laryngectomy: the insufflation test and reverse speech audiometry].

    PubMed

    Böhme, G; Clasen, B

    1989-09-01

    We carried out a transnasal insufflation test according to Blom and Singer on 27 laryngectomy patients, as well as a speech communication test using reverse speech audiometry, i.e., the post-laryngectomy telephone test according to Zenner and Pfrang. The combined evaluation of both tests provided basic information on the quality of the esophageal voice and the functionality of the speech organs. Both tests can be carried out quickly and easily and allow a differentiated statement on the suitability of an esophageal voice, electronic speech aids, and voice prostheses. Three groups could be identified from our results: 1. The insufflation test and the reverse speech test provided concordant good or very good results; the esophageal voice was well understood. 2. Complete failure in the insufflation and telephone tests calls for further examinations to exclude spasm, stricture, diverticula, and scarred membranous stenosis, as well as tumor recurrence in the region of the pharyngo-esophageal segment. 3. In the case of normal insufflation but considerably reduced speech communication in the telephone test, organic causes must be sought in the vocal tract, along with cranial nerve deficits and socially determined causes.

  2. Speech assessment of patients using three types of indwelling tracheo-oesophageal voice prostheses.

    PubMed

    Heaton, J M; Sanderson, D; Dunsmore, I R; Parker, A J

    1996-04-01

    A multidisciplinary prospective study compared speech acceptability between three types of indwelling tracheo-oesophageal voice prostheses. Twenty male laryngectomees took part over five years, using 42 prostheses. Speech was assessed on a discrete scale by trained and untrained personnel. The majority scored in the mid-range for each assessor. The kappa coefficient was used to test similarity between assessors, and for all pairings agreement was significant (p < 0.05). The speech and language therapist tended to give higher scores and the patient lower. A relationship was found between patients' ages categorized by decade and the surgeon's score alone. This relationship also held for the Groningen high resistance and Provox prostheses individually (p < 0.05). The untrained personnel assessed similarly to the professionals: all humans are voice listeners. The analysis suggests surgeons find tracheo-oesophageal speech in older patients better than in younger ones, or make more allowances for the elderly. There was a trend for Provox prostheses to produce the best scores.

  3. A comparison of recordings of sentences and spontaneous speech: perceptual and acoustic measures in preschool children's voices.

    PubMed

    McAllister, Anita; Brandt, Signe Kofoed

    2012-09-01

    A well-controlled recording in a studio is fundamental in most voice rehabilitation. However, this laboratory-like recording method has been questioned because voice use in a natural environment may be quite different. In children's natural environment, high background noise levels are common and are an important factor contributing to voice problems. The primary noise source in day-care centers is the children themselves. The aim of the present study was to compare perceptual evaluations of voice quality and acoustic measures from a controlled recording with recordings of spontaneous speech in children's natural environment in a day-care setting. Eleven 5-year-old children were recorded three times during a day at the day care. The controlled speech material consisted of repeated sentences. Matching sentences were selected from the spontaneous speech. All sentences were repeated three times. Recordings were randomized and analyzed acoustically and perceptually. Statistical analyses showed that fundamental frequency was significantly higher in spontaneous speech (P<0.01), as was hyperfunction (P<0.001). The only characteristic the controlled sentences shared with spontaneous speech was degree of hoarseness (Spearman's rho=0.564). When data for boys and girls were analyzed separately, a correlation was found for the parameter breathiness (rho=0.551) for boys, and for girls the correlation for hoarseness remained (rho=0.752). Regarding acoustic data, none of the measures correlated across recording conditions for the whole group. Copyright © 2012 The Voice Foundation. Published by Mosby, Inc. All rights reserved.

  4. Speech-Language Pathology production regarding voice in popular singing.

    PubMed

    Drumond, Lorena Badaró; Vieira, Naymme Barbosa; Oliveira, Domingos Sávio Ferreira de

    2011-12-01

    To present a literature review of the Brazilian scientific production in Speech-Language Pathology and Audiology regarding voice in popular singing in the last decade, with respect to number of publications, musical styles studied, focus of the research, and instruments used for data collection. Cross-sectional descriptive study carried out in two stages: a search in databases and publications encompassing the last decade of research in this area in Brazil, and a reading of the material obtained for subsequent categorization. The LILACS and SciELO databases, the CAPES database of dissertations and theses, the online version of Acta ORL, and the online version of OPUS were searched using the following terms: voice, professional voice, singing voice, dysphonia, voice disorders, voice training, music, dysodia. Articles published between the years 2000 and 2010 were selected. The studies found were classified and categorized after reading their abstracts and, when necessary, the whole study. Twenty studies within the proposed theme were selected, all of which were descriptive and involved several musical styles. Twelve studies focused on the evaluation of the popular singer's voice, and the most frequently used data collection instrument was auditory-perceptual evaluation. The results of the publications found corroborate the objectives proposed by the authors and the different methodologies. The number of studies published is still small compared with the diversity of musical genres and the uniqueness of the popular singer.

  5. Noise on, Voicing off: Speech Perception Deficits in Children with Specific Language Impairment

    ERIC Educational Resources Information Center

    Ziegler, Johannes C.; Pech-Georgel, Catherine; George, Florence; Lorenzi, Christian

    2011-01-01

    Speech perception of four phonetic categories (voicing, place, manner, and nasality) was investigated in children with specific language impairment (SLI) (n=20) and age-matched controls (n=19) in quiet and various noise conditions using an AXB two-alternative forced-choice paradigm. Children with SLI exhibited robust speech perception deficits in…

  6. The Prevalence of Stuttering, Voice, and Speech-Sound Disorders in Primary School Students in Australia

    ERIC Educational Resources Information Center

    McKinnon, David H.; McLeod, Sharynne; Reilly, Sheena

    2007-01-01

    Purpose: The aims of this study were threefold: to report teachers' estimates of the prevalence of speech disorders (specifically, stuttering, voice, and speech-sound disorders); to consider correspondence between the prevalence of speech disorders and gender, grade level, and socioeconomic status; and to describe the level of support provided to…

  7. The prevalence of stuttering, voice, and speech-sound disorders in primary school students in Australia.

    PubMed

    McKinnon, David H; McLeod, Sharynne; Reilly, Sheena

    2007-01-01

    The aims of this study were threefold: to report teachers' estimates of the prevalence of speech disorders (specifically, stuttering, voice, and speech-sound disorders); to consider correspondence between the prevalence of speech disorders and gender, grade level, and socioeconomic status; and to describe the level of support provided to schoolchildren with speech disorders. Students with speech disorders were identified from 10,425 students in Australia using a 4-stage process: training in the data collection process, teacher identification, confirmation by a speech-language pathologist, and consultation with district special needs advisors. The prevalence of students with speech disorders was estimated; specifically, 0.33% of students were identified as stuttering, 0.12% as having a voice disorder, and 1.06% as having a speech-sound disorder. There was a higher prevalence of speech disorders in males than in females. As grade level increased, the prevalence of speech disorders decreased. There was no significant difference in the pattern of prevalence across the three speech disorders and four socioeconomic groups; however, students who were identified with a speech disorder were more likely to be in the higher socioeconomic groups. Finally, there was a difference between the perceived and actual level of support that was provided to these students. These prevalence figures are lower than those using initial identification by speech-language pathologists and similar to those using parent report.

  8. Vocal effectiveness of speech-language pathology students: Before and after voice use during service delivery.

    PubMed

    Couch, Stephanie; Zieba, Dominique; Van der Linde, Jeannie; Van der Merwe, Anita

    2015-03-26

    As a professional voice user, it is imperative that a speech-language pathologist's (SLP) vocal effectiveness remain consistent throughout the day. Many factors may contribute to reduced vocal effectiveness, including prolonged voice use, vocally abusive behaviours, poor vocal hygiene and environmental factors. To determine the effect of service delivery on the perceptual and acoustic features of voice. A quasi-experimental, pre-test-post-test research design was used. Participants included third- and final-year speech-language pathology students at the University of Pretoria (South Africa). Voice parameters were evaluated in a pre-test measurement, after which the participants provided two consecutive hours of therapy. A post-test measurement was then completed. Data analysis consisted of an instrumental analysis in which the multidimensional voice programme (MDVP) and the voice range profile (VRP) were used to measure vocal parameters and then calculate the dysphonia severity index (DSI). The GRBASI scale was used to conduct a perceptual analysis of voice quality. Data were processed using descriptive statistics to determine change in each measured parameter after service delivery. A change of clinical significance was observed in the acoustic and perceptual parameters of voice. Guidelines for SLPs in order to maintain optimal vocal effectiveness were suggested.

  9. Vocal effectiveness of speech-language pathology students: Before and after voice use during service delivery

    PubMed Central

    Couch, Stephanie; Zieba, Dominique; van der Merwe, Anita

    2015-01-01

    Background As a professional voice user, it is imperative that a speech-language pathologist's (SLP) vocal effectiveness remain consistent throughout the day. Many factors may contribute to reduced vocal effectiveness, including prolonged voice use, vocally abusive behaviours, poor vocal hygiene and environmental factors. Objectives To determine the effect of service delivery on the perceptual and acoustic features of voice. Method A quasi-experimental, pre-test–post-test research design was used. Participants included third- and final-year speech-language pathology students at the University of Pretoria (South Africa). Voice parameters were evaluated in a pre-test measurement, after which the participants provided two consecutive hours of therapy. A post-test measurement was then completed. Data analysis consisted of an instrumental analysis in which the multidimensional voice programme (MDVP) and the voice range profile (VRP) were used to measure vocal parameters and then calculate the dysphonia severity index (DSI). The GRBASI scale was used to conduct a perceptual analysis of voice quality. Data were processed using descriptive statistics to determine change in each measured parameter after service delivery. Results A change of clinical significance was observed in the acoustic and perceptual parameters of voice. Conclusion Guidelines for SLPs in order to maintain optimal vocal effectiveness were suggested. PMID:26304213

  10. Thermal welding vs. cold knife tonsillectomy: a comparison of voice and speech.

    PubMed

    Celebi, Saban; Yelken, Kursat; Celik, Oner; Taskin, Umit; Topak, Murat

    2011-01-01

    To compare acoustic, aerodynamic and perceptual voice and speech parameters in thermal welding system tonsillectomy and cold knife tonsillectomy patients, in order to determine the impact of the surgical technique on voice and speech. Thirty tonsillectomy patients (22 children, 8 adults) participated in this study. The preferred technique was cold knife tonsillectomy in 15 patients and thermal welding system tonsillectomy in the remaining 15 patients. One week before and 1 month after surgery the following parameters were estimated: average fundamental frequency, jitter, shimmer, harmonic-to-noise ratio, and formant frequency analyses of sustained vowels. Perceptual speech analysis and aerodynamic measurements (maximum phonation time and s/z ratio) were also conducted. There was no significant difference in any of the parameters between the cold knife tonsillectomy and thermal welding system tonsillectomy groups (p>0.05). When the groups were compared with regard to preoperative and postoperative values, fundamental frequency was found to be significantly decreased after tonsillectomy in both groups (p<0.001). The first formant for the vowel /a/ in the cold knife tonsillectomy group and for the vowel /i/ in the thermal welding system tonsillectomy group, the second formant for the vowel /u/ in the thermal welding system tonsillectomy group, and the third formant for the vowel /u/ in the cold knife tonsillectomy group were found to be significantly decreased (p<0.05). The surgical technique, whether cold knife or thermal welding system, does not appear to affect voice and speech in tonsillectomy patients. Copyright © 2010 Elsevier Ireland Ltd. All rights reserved.

  11. The persuasiveness of synthetic speech versus human speech.

    PubMed

    Stern, S E; Mullennix, J W; Dyson, C; Wilson, S J

    1999-12-01

    Is computer-synthesized speech as persuasive as the human voice when presenting an argument? After completing an attitude pretest, 193 participants were randomly assigned to listen to a persuasive appeal under three conditions: a high-quality synthesized speech system (DECtalk Express), a low-quality synthesized speech system (Monologue), and a tape recording of a human voice. Following the appeal, participants completed a posttest attitude survey and a series of questionnaires designed to assess perceptions of speech qualities, perceptions of the speaker, and perceptions of the message. The human voice was generally perceived more favorably than the computer-synthesized voice, and the speaker was perceived more favorably when the voice was a human voice than when it was computer synthesized. There was, however, no evidence that computerized speech, as compared with the human voice, affected persuasion or perceptions of the message. Actual or potential applications of this research include issues that should be considered when designing synthetic speech systems.

  12. It doesn't matter what you say: FMRI correlates of voice learning and recognition independent of speech content.

    PubMed

    Zäske, Romi; Awwad Shiekh Hasan, Bashar; Belin, Pascal

    2017-09-01

    Listeners can recognize newly learned voices from previously unheard utterances, suggesting the acquisition of high-level speech-invariant voice representations during learning. Using functional magnetic resonance imaging (fMRI) we investigated the anatomical basis underlying the acquisition of voice representations for unfamiliar speakers independent of speech, and their subsequent recognition among novel voices. Specifically, listeners studied voices of unfamiliar speakers uttering short sentences and subsequently classified studied and novel voices as "old" or "new" in a recognition test. To investigate "pure" voice learning, i.e., independent of sentence meaning, we presented German sentence stimuli to non-German speaking listeners. To disentangle stimulus-invariant and stimulus-dependent learning, during the test phase we contrasted a "same sentence" condition in which listeners heard speakers repeating the sentences from the preceding study phase, with a "different sentence" condition. Voice recognition performance was above chance in both conditions although, as expected, performance was higher for same than for different sentences. During study phases activity in the left inferior frontal gyrus (IFG) was related to subsequent voice recognition performance and same versus different sentence condition, suggesting an involvement of the left IFG in the interactive processing of speaker and speech information during learning. Importantly, at test reduced activation for voices correctly classified as "old" compared to "new" emerged in a network of brain areas including temporal voice areas (TVAs) of the right posterior superior temporal gyrus (pSTG), as well as the right inferior/middle frontal gyrus (IFG/MFG), the right medial frontal gyrus, and the left caudate. This effect of voice novelty did not interact with sentence condition, suggesting a role of temporal voice-selective areas and extra-temporal areas in the explicit recognition of learned voice identity.

  13. Patient-reported voice and speech outcomes after whole-neck intensity modulated radiation therapy and chemotherapy for oropharyngeal cancer: prospective longitudinal study.

    PubMed

    Vainshtein, Jeffrey M; Griffith, Kent A; Feng, Felix Y; Vineberg, Karen A; Chepeha, Douglas B; Eisbruch, Avraham

    2014-08-01

    To describe voice and speech quality changes and their predictors in patients with locally advanced oropharyngeal cancer treated on prospective clinical studies of organ-preserving chemotherapy-intensity modulated radiation therapy (chemo-IMRT). Ninety-one patients with stage III/IV oropharyngeal cancer were treated on 2 consecutive prospective studies of definitive chemoradiation using whole-field IMRT from 2003 to 2011. Patient-reported voice and speech quality were longitudinally assessed from before treatment through 24 months using the Communication Domain of the Head and Neck Quality of Life (HNQOL-C) instrument and the Speech question of the University of Washington Quality of Life (UWQOL-S) instrument, respectively. Factors associated with patient-reported voice quality worsening from baseline and speech impairment were assessed. Voice quality decreased maximally at 1 month, with 68% and 41% of patients reporting worse HNQOL-C and UWQOL-S scores compared with before treatment, and improved thereafter, recovering to baseline by 12-18 months on average. In contrast, observer-rated larynx toxicity was rare (7% at 3 months; 5% at 6 months). Among patients with mean glottic larynx (GL) dose ≤20 Gy, >20-30 Gy, >30-40 Gy, >40-50 Gy, and >50 Gy, 10%, 32%, 25%, 30%, and 63%, respectively, reported worse voice quality at 12 months compared with before treatment (P=.011). Results for speech impairment were similar. Glottic larynx dose, N stage, neck dissection, oral cavity dose, and time since chemo-IMRT were univariately associated with either voice worsening or speech impairment. On multivariate analysis, mean GL dose remained independently predictive for both voice quality worsening (8.1%/Gy) and speech impairment (4.3%/Gy). Voice quality worsening and speech impairment after chemo-IMRT for locally advanced oropharyngeal cancer were frequently reported by patients, underrecognized by clinicians, and independently associated with GL dose. These findings support

  14. Patient-Reported Voice and Speech Outcomes After Whole-Neck Intensity Modulated Radiation Therapy and Chemotherapy for Oropharyngeal Cancer: Prospective Longitudinal Study

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Vainshtein, Jeffrey M.; Griffith, Kent A.; Feng, Felix Y.

    Purpose: To describe voice and speech quality changes and their predictors in patients with locally advanced oropharyngeal cancer treated on prospective clinical studies of organ-preserving chemotherapy–intensity modulated radiation therapy (chemo-IMRT). Methods and Materials: Ninety-one patients with stage III/IV oropharyngeal cancer were treated on 2 consecutive prospective studies of definitive chemoradiation using whole-field IMRT from 2003 to 2011. Patient-reported voice and speech quality were longitudinally assessed from before treatment through 24 months using the Communication Domain of the Head and Neck Quality of Life (HNQOL-C) instrument and the Speech question of the University of Washington Quality of Life (UWQOL-S) instrument, respectively. Factors associated with patient-reported voice quality worsening from baseline and speech impairment were assessed. Results: Voice quality decreased maximally at 1 month, with 68% and 41% of patients reporting worse HNQOL-C and UWQOL-S scores compared with before treatment, and improved thereafter, recovering to baseline by 12-18 months on average. In contrast, observer-rated larynx toxicity was rare (7% at 3 months; 5% at 6 months). Among patients with mean glottic larynx (GL) dose ≤20 Gy, >20-30 Gy, >30-40 Gy, >40-50 Gy, and >50 Gy, 10%, 32%, 25%, 30%, and 63%, respectively, reported worse voice quality at 12 months compared with before treatment (P=.011). Results for speech impairment were similar. Glottic larynx dose, N stage, neck dissection, oral cavity dose, and time since chemo-IMRT were univariately associated with either voice worsening or speech impairment. On multivariate analysis, mean GL dose remained independently predictive for both voice quality worsening (8.1%/Gy) and speech impairment (4.3%/Gy). Conclusions: Voice quality worsening and speech impairment after chemo-IMRT for locally advanced oropharyngeal cancer were frequently reported by patients, underrecognized by clinicians, and

  15. Voice and Fluency Changes as a Function of Speech Task and Deep Brain Stimulation

    ERIC Educational Resources Information Center

    Van Lancker Sidtis, Diana; Rogers, Tiffany; Godier, Violette; Tagliati, Michele; Sidtis, John J.

    2010-01-01

    Purpose: Speaking, which naturally occurs in different modes or "tasks" such as conversation and repetition, relies on intact basal ganglia nuclei. Recent studies suggest that voice and fluency parameters are differentially affected by speech task. In this study, the authors examine the effects of subcortical functionality on voice and fluency,…

  16. Provision of surgical voice restoration in England: questionnaire survey of speech and language therapists.

    PubMed

    Bradley, P J; Counter, P; Hurren, A; Cocks, H C

    2013-08-01

    To conduct a questionnaire survey of speech and language therapists providing and managing surgical voice restoration in England. National Health Service Trusts registering more than 10 new laryngeal cancer patients during any one year, from November 2009 to October 2010, were identified, and a list of speech and language therapists compiled. A questionnaire was developed, peer reviewed and revised. The final questionnaire was e-mailed with a covering letter to 82 units. Eighty-two questionnaires were distributed and 72 were returned and analysed, giving a response rate of 87.8 per cent. Forty-four per cent (38/59) of the units performed more than 10 laryngectomies per year. An in-hours surgical voice restoration service was provided by speech and language therapists in 45.8 per cent (33/72) and assisted by nurses in 34.7 per cent (25/72). An out of hours service was provided directly by ENT staff in 35.5 per cent (21/59). Eighty-eight per cent (63/72) of units reported less than 10 (emergency) out of hours calls per month. Surgical voice restoration service provision varies within and between cancer networks. There is a need for a national management and care protocol, an educational programme for out of hours service providers, and a review of current speech and language therapist staffing levels in England.

  17. Clinical Characteristics of Voice, Speech, and Swallowing Disorders in Oromandibular Dystonia

    ERIC Educational Resources Information Center

    Kreisler, Alexandre; Vepraet, Anne Caroline; Veit, Solène; Pennel-Ployart, Odile; Béhal, Hélène; Duhamel, Alain; Destée, Alain

    2016-01-01

    Purpose: To better define the clinical characteristics of idiopathic oromandibular dystonia, we studied voice, speech, and swallowing disorders and their impact on activities of daily living. Method: Fourteen consecutive patients with idiopathic oromandibular dystonia and 14 matched, healthy control subjects were included in the study. Results:…

  18. Pre- and posttreatment voice and speech outcomes in patients with advanced head and neck cancer treated with chemoradiotherapy: expert listeners' and patient's perception.

    PubMed

    van der Molen, Lisette; van Rossum, Maya A; Jacobi, Irene; van Son, Rob J J H; Smeele, Ludi E; Rasch, Coen R N; Hilgers, Frans J M

    2012-09-01

    Perceptual judgments and patients' perception of voice and speech after concurrent chemoradiotherapy (CCRT) for advanced head and neck cancer. Prospective clinical trial. A standard Dutch text and a diadochokinetic task were recorded. Expert listeners rated voice and speech quality (based on Grade, Roughness, Breathiness, Asthenia, and Strain) and articulation (overall, [p], [t], [k]), and comparative mean opinion scores of voice and speech were calculated at three assessment points. A structured study-specific questionnaire evaluated patients' perception pretreatment (N=55), at 10 weeks (N=49) and at 1 year posttreatment (N=37). At 10 weeks, perceptual voice quality was significantly affected. The parameters overall voice quality (mean, -0.24; P=0.008), strain (mean, -0.12; P=0.012), nasality (mean, -0.08; P=0.009), roughness (mean, -0.22; P=0.001), and pitch (mean, -0.03; P=0.041) improved over time but not beyond baseline levels, except for asthenia at 1 year posttreatment (voice is less asthenic than at baseline; mean, +0.20; P=0.03). Perceptual analyses of articulation showed no significant differences. Patients judge their voice quality as good (score, 18/20) at all assessment points, but at 1 year posttreatment, most of them (70%) judge their "voice not as it used to be." In the 1-year versus 10-week posttreatment comparison, the larynx-hypopharynx tumor group was rated as more strained, whereas nonlarynx tumor voices were judged less strained (mean, -0.33 and +0.07, respectively; P=0.031). Patients' perceived changes in voice and speech quality at 10 weeks post- versus pretreatment correlate weakly with expert judgments. Overall, perceptual CCRT effects on voice and speech seem to peak at 10 weeks posttreatment but level off at 1 year posttreatment. However, at that assessment point, most patients still perceive their voice as different from baseline. Copyright © 2012 The Voice Foundation. Published by Mosby, Inc. All rights reserved.

  19. [Voice assessment and demographic data of applicants for a school of speech therapists].

    PubMed

    Reiter, R; Brosch, S

    2008-05-01

    Demographic data, subjective and objective voice analysis, as well as self-assessment of voice quality from applicants to a school for speech therapists were investigated. Demographic data from 116 applicants were collected and their voice quality assessed by three independent judges. Objective evaluation comprised maximum phonation time, average fundamental frequency, dynamic range, and percent jitter and shimmer by means of the Goettingen Hoarseness Diagram. Self-assessment of voice quality was done with the Voice Handicap Index questionnaire. Of the twenty successful applicants, 95% had a physiological voice; they were all musical and had university entrance qualifications. Subjective voice assessment showed a hoarse voice in 16% of the applicants. In this subgroup, unphysiological vocal use was observed in 72% and reduced articulation in 45%. The objective voice parameters did not show a significant difference between the 3 groups. Self-assessment of the voice was inconspicuous in all applicants. Applicants with a general qualification for university entrance, musicality and a physiological voice were more likely to be successful. There were marked differences between self-assessment of voice and quantitative analysis or subjective assessment by the three independent judges.

  20. The Effect of Anchors and Training on the Reliability of Voice Quality Ratings for Different Types of Speech Stimuli.

    PubMed

    Brinca, Lilia; Batista, Ana Paula; Tavares, Ana Inês; Pinto, Patrícia N; Araújo, Lara

    2015-11-01

    The main objective of the present study was to investigate whether the type of voice stimulus (sustained vowel, oral reading, or connected speech) results in good intrarater and interrater agreement/reliability. A short-term panel study was performed. Voice samples from 30 native European Portuguese speakers were used in the present study. The speech materials used were (1) the sustained vowel /a/, (2) oral reading of the European Portuguese version of "The Story of Arthur the Rat," and (3) connected speech. After extensive training with textual and auditory anchors, the judges were asked to rate the severity of dysphonic voice stimuli using the phonation dimensions G, R, and B from the GRBAS scale. The voice samples were judged 6 months and 1 year after the training. Intrarater agreement and reliability were generally very good for all the phonation dimensions and voice stimuli. The highest interrater reliability was obtained using the oral reading stimulus, particularly for the phonation dimensions grade (G) and breathiness (B). Roughness (R) was the voice quality that was the most difficult to evaluate, leading to interrater unreliability in all voice quality ratings. Extensive training using textual and auditory anchors, and the use of anchors during the voice evaluations, appear to be good methods for auditory-perceptual evaluation of dysphonic voices. The best results for interrater reliability were obtained when the oral reading stimulus was used. Breathiness appears to be a voice quality that is easier to evaluate than roughness. Copyright © 2015 The Voice Foundation. Published by Elsevier Inc. All rights reserved.

  1. Can a computer-generated voice be sincere? A case study combining music and synthetic speech.

    PubMed

    Barker, Paul; Newell, Christopher; Newell, George

    2013-10-01

    This article explores enhancing sincerity, honesty, or truthfulness in computer-generated synthetic speech by accompanying it with music. Sincerity is important if we are to respond positively to any voice, whether human or artificial. What is sincerity in the artificial disembodied voice? Studies in musical expression and performance may illuminate aspects of the 'musically spoken' or sung voice in rendering deeper levels of expression that may include sincerity. We consider one response to this notion in an especially composed melodrama (music accompanying a (synthetic) spoken voice) designed to convey sincerity.

  2. A study of voice production characteristics of astronaut speech during Apollo 11 for speaker modeling in space.

    PubMed

    Yu, Chengzhu; Hansen, John H L

    2017-03-01

    Human physiology has evolved to accommodate environmental conditions, including temperature, pressure, and air chemistry unique to Earth. However, the environment in space varies significantly compared to that on Earth and, therefore, variability is expected in astronauts' speech production mechanism. In this study, the variations of astronaut voice characteristics during the NASA Apollo 11 mission are analyzed. Specifically, acoustical features such as fundamental frequency and phoneme formant structure that are closely related to the speech production system are studied. For a further understanding of astronauts' vocal tract spectrum variation in space, a maximum likelihood frequency warping based analysis is proposed to detect the vocal tract spectrum displacement during space conditions. The results from fundamental frequency, formant structure, as well as vocal spectrum displacement indicate that astronauts change their speech production mechanism when in space. Moreover, the experimental results for astronaut voice identification tasks indicate that current speaker recognition solutions are highly vulnerable to astronaut voice production variations in space conditions. Future recommendations from this study suggest that successful applications of speaker recognition during extended space missions require robust speaker modeling techniques that could effectively adapt to voice production variation caused by diverse space conditions.
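
    F0 and formant measurements of the kind analyzed here can be scripted with the praat-parselmouth interface to Praat. The sketch below is illustrative only: the file name is hypothetical, the formant settings are common defaults, and it does not implement the authors' maximum likelihood frequency-warping analysis.

        import numpy as np
        import parselmouth
        from parselmouth.praat import call

        snd = parselmouth.Sound("apollo11_utterance.wav")  # hypothetical file

        # Burg formant tracking: up to 5 formants below 5000 Hz (a common
        # male-voice setting), 25 ms analysis windows.
        formant = call(snd, "To Formant (burg)", 0.0, 5, 5000.0, 0.025, 50.0)
        t_mid = snd.duration / 2
        f1 = call(formant, "Get value at time", 1, t_mid, "Hertz", "Linear")
        f2 = call(formant, "Get value at time", 2, t_mid, "Hertz", "Linear")

        # Mean F0 over voiced frames only (unvoiced frames come back as 0).
        f0 = snd.to_pitch().selected_array["frequency"]
        f0_mean = np.mean(f0[f0 > 0]) if np.any(f0 > 0) else float("nan")

        print(f0_mean, f1, f2)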

  3. Feasibility of automated speech sample collection with stuttering children using interactive voice response (IVR) technology.

    PubMed

    Vogel, Adam P; Block, Susan; Kefalianos, Elaina; Onslow, Mark; Eadie, Patricia; Barth, Ben; Conway, Laura; Mundt, James C; Reilly, Sheena

    2015-04-01

    To investigate the feasibility of adopting automated interactive voice response (IVR) technology for remotely capturing standardized speech samples from stuttering children. Participants were ten 6-year-old stuttering children. Their parents called a toll-free number from their homes and were prompted to elicit speech from their children using a standard protocol involving conversation, picture description, and games. The automated IVR system was implemented using an off-the-shelf telephony software program and delivered by a standard desktop computer; the software infrastructure uses voice over internet protocol. Speech samples were automatically recorded during the calls. Video recordings were simultaneously acquired in the home at the time of the call to evaluate the fidelity of the telephone-collected samples. Key outcome measures included syllables spoken, percentage of syllables stuttered, and an overall rating of stuttering severity on a 10-point scale. The data revealed a high level of relative reliability, in terms of the intra-class correlation between the video- and telephone-acquired samples, on all outcome measures during the conversation task. Findings were less consistent for speech samples from picture description and games. The results suggest that IVR technology can be used successfully to automate remote capture of child speech samples.
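
    The reliability statistic used here, the intra-class correlation, can be computed directly from one-way ANOVA mean squares. A numpy sketch with fabricated paired scores (not the study's data):

    ```python
    # Sketch: one-way random-effects ICC(1,1) for paired measurements, e.g.
    # percent syllables stuttered from video vs. IVR telephone samples.
    import numpy as np

    video = np.array([4.1, 6.3, 2.8, 5.0, 7.2, 3.3, 4.9, 5.8, 2.1, 6.7])
    phone = np.array([4.4, 6.0, 3.1, 4.7, 7.5, 3.0, 5.2, 5.5, 2.4, 6.9])

    scores = np.stack([video, phone], axis=1)     # n subjects x k methods
    n, k = scores.shape
    grand = scores.mean()
    ms_between = k * np.sum((scores.mean(axis=1) - grand) ** 2) / (n - 1)
    ms_within = np.sum((scores - scores.mean(axis=1, keepdims=True)) ** 2) / (n * (k - 1))
    icc = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    print(f"ICC(1,1) = {icc:.3f}")
    ```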

  4. Speech, Voice, and Communication.

    PubMed

    Johnson, Julia A

    2017-01-01

    Communication changes are an important feature of Parkinson's and include both motor and nonmotor features. This chapter briefly covers the motor features affecting speech production and voice function before focusing on the nonmotor aspects. A description of the difficulties experienced by people with Parkinson's when trying to communicate effectively is presented, along with some of the assessment tools and therapists' treatment options. The idea of the clinical heterogeneity of PD and of subtyping patients with different communication problems is explored, and suggestions are made on how this may influence clinicians' treatment methods and choices so as to provide personalized therapy programmes. The importance of encouraging and supporting people to maintain social networks, employment, and leisure activities is presented as key to achieving sustainability. Finally, looking to the future, the emergence of new technologies is seen as providing further possibilities to support therapists in the goal of helping people with Parkinson's to maintain good communication skills throughout the course of the disease. © 2017 Elsevier Inc. All rights reserved.

  5. Multitalker Speech Perception with Ideal Time-Frequency Segregation: Effects of Voice Characteristics and Number of Talkers

    DTIC Science & Technology

    2009-03-23

    Douglas S. Brungart, Air Force Research Laboratory; report dated 06 March 2009. Recoverable abstract fragment: "Speech perception in multitalker listening environments is limited by two very different types of masking. The first is energetic..."

  6. Intra- and Inter-database Study for Arabic, English, and German Databases: Do Conventional Speech Features Detect Voice Pathology?

    PubMed

    Ali, Zulfiqar; Alsulaiman, Mansour; Muhammad, Ghulam; Elamvazuthi, Irraivan; Al-Nasheri, Ahmed; Mesallam, Tamer A; Farahat, Mohamed; Malki, Khalid H

    2017-05-01

    A large population around the world has voice complications. Various approaches for subjective and objective evaluation have been suggested in the literature. The subjective approach depends strongly on the experience and area of expertise of the clinician, and human error cannot be neglected. The objective, or automatic, approach is noninvasive, and automatic systems can provide complementary information that may help a clinician in the early screening of a voice disorder. Automatic systems can also be deployed in remote areas, where a general practitioner can use them and refer the patient to a specialist, avoiding complications that may be life threatening. Many automatic systems for disorder detection have been developed using conventional speech features such as linear prediction coefficients, linear prediction cepstral coefficients, and Mel-frequency cepstral coefficients (MFCCs). This study aims to ascertain whether conventional speech features detect voice pathology reliably and whether they correlate with voice quality. To investigate this, an automatic detection system based on MFCCs was developed, and three different voice disorder databases were used. The experimental results suggest that the accuracy of the MFCC-based system varies from database to database: the detection rate ranges from 72% to 95% intra-database and from 47% to 82% inter-database. The results indicate that conventional speech features are not correlated with voice quality and hence are not reliable for pathology detection. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
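
    A minimal version of such an MFCC-based detector takes only a few lines with librosa and scikit-learn. The sketch below follows the general recipe only; file names and labels are placeholders, and the authors' actual system and databases are not reproduced.

    ```python
    # Sketch: MFCC features + SVM for normal-vs-pathological classification.
    # File list and labels are illustrative placeholders.
    import numpy as np
    import librosa
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    def mfcc_features(path, sr=16000, n_mfcc=13):
        y, sr = librosa.load(path, sr=sr, mono=True)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
        # Mean and std over frames give one fixed-length vector per recording.
        return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

    paths = ["normal_01.wav", "patho_01.wav"]     # placeholder file list
    labels = np.array([0, 1])                     # 0 = normal, 1 = pathological

    X = np.stack([mfcc_features(p) for p in paths])
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    clf.fit(X, labels)          # with a real corpus, cross-validate instead
    print(clf.predict(X))
    ```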

  7. Age-related changes to spectral voice characteristics affect judgments of prosodic, segmental, and talker attributes for child and adult speech.

    PubMed

    Dilley, Laura C; Wieland, Elizabeth A; Gamache, Jessica L; McAuley, J Devin; Redford, Melissa A

    2013-02-01

    As children mature, changes in voice spectral characteristics co-vary with changes in speech, language, and behavior. In this study, spectral characteristics were manipulated to alter the perceived ages of talkers' voices while leaving critical acoustic-prosodic correlates intact, to determine whether perceived age differences were associated with differences in judgments of prosodic, segmental, and talker attributes. Speech was modified by lowering formants and fundamental frequency, for 5-year-old children's utterances, or raising them, for adult caregivers' utterances. Next, participants differing in awareness of the manipulation (Experiment 1A) or amount of speech-language training (Experiment 1B) made judgments of prosodic, segmental, and talker attributes. Experiment 2 investigated the effects of spectral modification on intelligibility. Finally, in Experiment 3, trained analysts used formal prosody coding to assess prosodic characteristics of spectrally modified and unmodified speech. Differences in perceived age were associated with differences in ratings of speech rate, fluency, intelligibility, likeability, anxiety, cognitive impairment, and speech-language disorder/delay; effects of training and awareness of the manipulation on ratings were limited. There were no significant effects of the manipulation on intelligibility or formally coded prosody judgments. Age-related voice characteristics can greatly affect judgments of speech and talker characteristics, raising cautionary notes for developmental research and clinical work.
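
    Manipulations of this kind (jointly shifting formants and fundamental frequency) can be approximated with Praat's "Change gender" command through the parselmouth bridge. The sketch below is an illustration under assumed parameter values, not the stimulus-preparation pipeline used in the study; the file name is hypothetical.

    ```python
    # Sketch: shift formants and median pitch of an utterance with Praat's
    # "Change gender" command via parselmouth. Parameter values are
    # illustrative, not the study's settings.
    import parselmouth
    from parselmouth.praat import call

    snd = parselmouth.Sound("child_utterance.wav")

    # Arguments: pitch floor (Hz), pitch ceiling (Hz), formant shift ratio,
    # new pitch median (Hz; 0 keeps the original), pitch range factor,
    # duration factor.
    adult_like = call(snd, "Change gender", 75, 600, 0.85, 120, 1.0, 1.0)
    adult_like.save("child_as_adult.wav", "WAV")
    ```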

  8. Age-related changes to spectral voice characteristics affect judgments of prosodic, segmental, and talker attributes for child and adult speech

    PubMed Central

    Dilley, Laura C.; Wieland, Elizabeth A.; Gamache, Jessica L.; McAuley, J. Devin; Redford, Melissa A.

    2013-01-01

    Purpose As children mature, changes in voice spectral characteristics covary with changes in speech, language, and behavior. Spectral characteristics were manipulated to alter the perceived ages of talkers’ voices while leaving critical acoustic-prosodic correlates intact, to determine whether perceived age differences were associated with differences in judgments of prosodic, segmental, and talker attributes. Method Speech was modified by lowering formants and fundamental frequency, for 5-year-old children’s utterances, or raising them, for adult caregivers’ utterances. Next, participants differing in awareness of the manipulation (Exp. 1a) or amount of speech-language training (Exp. 1b) made judgments of prosodic, segmental, and talker attributes. Exp. 2 investigated the effects of spectral modification on intelligibility. Finally, in Exp. 3 trained analysts used formal prosody coding to assess prosodic characteristics of spectrally-modified and unmodified speech. Results Differences in perceived age were associated with differences in ratings of speech rate, fluency, intelligibility, likeability, anxiety, cognitive impairment, and speech-language disorder/delay; effects of training and awareness of the manipulation on ratings were limited. There were no significant effects of the manipulation on intelligibility or formally coded prosody judgments. Conclusions Age-related voice characteristics can greatly affect judgments of speech and talker characteristics, raising cautionary notes for developmental research and clinical work. PMID:23275414

  9. Speech-language pathology students' self-reports on voice training: easier to understand or to do?

    PubMed

    Lindhe, Christina; Hartelius, Lena

    2009-01-01

    The aim of the study was to describe subjective ratings of the course 'Training of the student's own voice and speech' from a student-centred perspective. A questionnaire was completed after each of six individual sessions. Six speech and language pathology (SLP) students rated how they perceived the practical exercises in terms of doing and understanding. The results showed that five of the six participants rated the exercises as significantly easier to understand than to do. The exercises were also rated as easier to do over time. The results are interpreted within a theoretical framework of approaches to learning. The findings support the importance of both the physical and the reflective aspects of the voice training process.

  10. [Voice and vibration sensations in the speech forming organs: clinical and theoretical aspects of rare symptoms specific for schizophrenia].

    PubMed

    Vilela, W; Lolas, F; Wolpert, E

    1978-01-01

    In a study of 750 psychiatric in-patients with psychoses of various diagnostic groups, the symptoms of voice sensations and vibration feelings were found only among patients with paranoid schizophrenia. Moreover, these symptoms were located exclusively in body areas involved in the peripheral motor production of voice and speech (head, throat, thorax). In 11 of the 15 such cases identified, the sensations of voices and vibrations occurred simultaneously and in identical body parts; in the remaining 4 cases, only voices without vibration sensations were reported. These symptoms can therefore be considered highly specific for schizophrenia. In Bleuler's terminology, the two symptoms are, because of their rarity, to be regarded as accessory symptoms; in Kurt Schneider's terminology, they have the value of first-rank symptoms because of their high diagnostic specificity for schizophrenia. The pathogenesis of these symptoms is discussed, on the one hand, from the perspective of language development and the changing role of language in behaviour control and, on the other hand, from the viewpoint of the cybernetic, or neurophysiological-neuroanatomical, foundations of speech production and speech control. Both explanatory models share the idea that the ideational component of speech is experienced as acoustic hallucinations and the motor-proprioceptive component of speech as vibration sensations, both in a typically schizophrenic manner, that is, dissociated and ego-alienated.

  11. Functional overlap between regions involved in speech perception and in monitoring one's own voice during speech production.

    PubMed

    Zheng, Zane Z; Munhall, Kevin G; Johnsrude, Ingrid S

    2010-08-01

    The fluency and the reliability of speech production suggest a mechanism that links motor commands and sensory feedback. Here, we examined the neural organization supporting such links by using fMRI to identify regions in which activity during speech production is modulated according to whether auditory feedback matches the predicted outcome or not and by examining the overlap with the network recruited during passive listening to speech sounds. We used real-time signal processing to compare brain activity when participants whispered a consonant-vowel-consonant word ("Ted") and either heard this clearly or heard voice-gated masking noise. We compared this to when they listened to yoked stimuli (identical recordings of "Ted" or noise) without speaking. Activity along the STS and superior temporal gyrus bilaterally was significantly greater if the auditory stimulus was (a) processed as the auditory concomitant of speaking and (b) did not match the predicted outcome (noise). The network exhibiting this Feedback Type x Production/Perception interaction includes a superior temporal gyrus/middle temporal gyrus region that is activated more when listening to speech than to noise. This is consistent with speech production and speech perception being linked in a control system that predicts the sensory outcome of speech acts and that processes an error signal in speech-sensitive regions when this and the sensory data do not match.

  12. Bilingual Voicing: A Study of Code-Switching in the Reported Speech of Finnish Immigrants in Estonia

    ERIC Educational Resources Information Center

    Frick, Maria; Riionheimo, Helka

    2013-01-01

    Through a conversation analytic investigation of Finnish-Estonian bilingual (direct) reported speech (i.e., voicing) by Finns who live in Estonia, this study shows how code-switching is used as a double contextualization device. The code-switched voicings are shaped by the on-going interactional situation, serving its needs by opening up a context…

  13. Recognition of voice commands using adaptation of foreign language speech recognizer via selection of phonetic transcriptions

    NASA Astrophysics Data System (ADS)

    Maskeliunas, Rytis; Rudzionis, Vytautas

    2011-06-01

    In recent years various commercial speech recognizers have become available. These recognizers make it possible to develop applications incorporating speech recognition quickly and easily. Commercial recognizers are typically targeted at widely spoken languages with large market potential; however, it may be possible to adapt them for use in environments where less widely spoken languages are used. Since most commercial recognition engines are closed systems, the only avenue for adaptation is to find suitable methods for selecting phonetic transcriptions that map between the two languages. This paper deals with methods for finding phonetic transcriptions of Lithuanian voice commands so that they can be recognized by English speech engines. The experimental evaluation showed that it is possible to find phonetic transcriptions that enable the recognition of Lithuanian voice commands with an accuracy of over 90%.
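
    Practically, the adaptation amounts to registering, for each Lithuanian command, one or more English-like transcriptions that the closed engine can score. A toy sketch of such a command table (all entries hypothetical):

    ```python
    # Sketch: cross-language command grammar via substitute transcriptions.
    # An English recognizer is given English-like spellings that approximate
    # Lithuanian pronunciations; all entries here are illustrative.
    COMMAND_TRANSCRIPTIONS = {
        "labas":   ["lah bus", "lah bahs"],    # hypothetical variants
        "pradėti": ["prah day tee"],
        "sustoti": ["soo stow tee"],
    }

    def build_grammar(table):
        """Flatten to (transcription -> command) for decoding engine output."""
        return {t: cmd for cmd, variants in table.items() for t in variants}

    grammar = build_grammar(COMMAND_TRANSCRIPTIONS)
    print(grammar.get("lah bus"))   # -> "labas"
    ```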

  14. Collaboration and conquest: MTD as viewed by voice teacher (singing voice specialist) and speech-language pathologist.

    PubMed

    Goffi-Fynn, Jeanne C; Carroll, Linda M

    2013-05-01

    This study was designed as a qualitative case study demonstrating the process of diagnosis and treatment by a voice team managing a singer diagnosed with muscular tension dysphonia (MTD). The literature suggests that MTD is challenging to treat, and little of it directly addresses singers with MTD. Data collected included an initial medical screening with a laryngologist, referral to a speech-language pathologist (SLP) specializing in voice disorders among singers, and adjunctive voice training with a voice teacher trained in vocology (a singing voice specialist, or SVS). Initial target goals with the SLP included reducing extrinsic laryngeal tension, using a relaxed laryngeal posture, and establishing effective abdominal-diaphragmatic support for all phonation events. Balancing respiratory forces, laryngeal coordination, and optimal filtering of the source signal through resonance and articulatory awareness was emphasized. Further work with the SVS pursued three main goals: a lowered breathing pattern to aid in decreasing subglottic air pressure, a lowered vertical laryngeal position to allow a relaxed larynx, and a top-down singing approach to encourage an easier, more balanced registration and better resonance. Initial results also emphasize retraining the subject toward a sensory rather than an auditory mode of monitoring. Other areas of consideration include singers' training and vocal use, the psychological effects of MTD, the personalities potentially associated with it, and its relationship with stress. Finally, the results emphasize that a positive rapport with the subject and collaboration among all professionals involved in a singer's care are essential for recovery. Copyright © 2013 The Voice Foundation. Published by Mosby, Inc. All rights reserved.

  15. Voice Outcomes of Adults Diagnosed with Pediatric Vocal Fold Nodules and Impact of Speech Therapy.

    PubMed

    Song, Brian H; Merchant, Maqdooda; Schloegel, Luke

    2017-11-01

    Objective To evaluate the voice outcomes of adults diagnosed with vocal fold nodules (VFNs) as children and to assess the impact of speech therapy on long-term voice outcomes. Study Design Prospective cohort study. Setting Large health care system. Subjects and Methods Subjects diagnosed with VFNs as children between 1996 and 2008 were identified within the medical record database of a large health care system. Included subjects were 3 to 12 years old at the time of diagnosis, had a documented laryngeal examination within 90 days of diagnosis, and were ≥18 years old as of December 31, 2014. Qualified subjects were contacted by telephone and administered the Voice Handicap Index-10 (VHI-10) and a 15-item questionnaire probing for confounding factors. Results A total of 155 subjects were included, with a mean age of 21.4 years (range, 18-29). The male:female ratio was 2.3:1. The mean VHI-10 score for the entire cohort was 5.4. Mean VHI-10 scores did not differ between those who received speech therapy (6.1) and those who did not (4.5; P = .08). Both groups were similar with respect to confounding risk factors that can contribute to dysphonia, although the no-therapy group had a disproportionately higher number of subjects who consumed >10 alcoholic drinks per week (P = .01). Conclusion The majority of children with VFNs will achieve close-to-normal voice quality when they reach adulthood. In our cohort, speech therapy did not appear to have an impact on long-term voice outcomes.

  16. Measuring the rate of change of voice fundamental frequency in fluent speech during mental depression.

    PubMed

    Nilsonne, A; Sundberg, J; Ternström, S; Askenfelt, A

    1988-02-01

    A method of measuring the rate of change of fundamental frequency has been developed in an effort to find acoustic voice parameters that could be useful in psychiatric research. A minicomputer program was used to extract seven parameters from the fundamental frequency contour of tape-recorded speech samples: (1) the average rate of change of the fundamental frequency and (2) its standard deviation, (3) the absolute rate of fundamental frequency change, (4) the total reading time, (5) pause time as a percentage of the total reading time, and (6) the mean and (7) the standard deviation of the fundamental frequency distribution. The method is demonstrated on (a) synthetic speech and (b) voice recordings of depressed patients examined during depression and after improvement.
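
    Most of the seven parameters reduce to simple operations on a sampled F0 contour with pauses marked. A numpy sketch with a fabricated contour:

    ```python
    # Sketch: rate-of-change parameters from an F0 contour sampled at fixed
    # intervals; NaN marks pauses/unvoiced frames. Data are illustrative.
    import numpy as np

    dt = 0.01                                  # 10-ms frames
    f0 = np.array([120, 122, 125, np.nan, np.nan, 118, 115, 117, 121, 119.0])

    voiced = ~np.isnan(f0)
    rate = np.diff(f0) / dt                    # Hz/s; NaN across pause edges
    rate = rate[~np.isnan(rate)]

    params = {
        "mean rate of F0 change (Hz/s)": rate.mean(),
        "sd of rate of change": rate.std(),
        "absolute rate of change": np.abs(rate).mean(),
        "total time (s)": len(f0) * dt,
        "percent pause time": 100 * (~voiced).mean(),
        "mean F0 (Hz)": np.nanmean(f0),
        "sd of F0": np.nanstd(f0),
    }
    print(params)
    ```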

  17. Functional overlap between regions involved in speech perception and in monitoring one’s own voice during speech production

    PubMed Central

    Zheng, Zane Z.; Munhall, Kevin G; Johnsrude, Ingrid S

    2009-01-01

    The fluency and reliability of speech production suggests a mechanism that links motor commands and sensory feedback. Here, we examine the neural organization supporting such links by using fMRI to identify regions in which activity during speech production is modulated according to whether auditory feedback matches the predicted outcome or not, and examining the overlap with the network recruited during passive listening to speech sounds. We use real-time signal processing to compare brain activity when participants whispered a consonant-vowel-consonant word (‘Ted’) and either heard this clearly, or heard voice-gated masking noise. We compare this to when they listened to yoked stimuli (identical recordings of ‘Ted’ or noise) without speaking. Activity along the superior temporal sulcus (STS) and superior temporal gyrus (STG) bilaterally was significantly greater if the auditory stimulus was a) processed as the auditory concomitant of speaking and b) did not match the predicted outcome (noise). The network exhibiting this Feedback type by Production/Perception interaction includes an STG/MTG region that is activated more when listening to speech than to noise. This is consistent with speech production and speech perception being linked in a control system that predicts the sensory outcome of speech acts, and that processes an error signal in speech-sensitive regions when this and the sensory data do not match. PMID:19642886

  18. Tutorial and Guidelines on Measurement of Sound Pressure Level in Voice and Speech

    ERIC Educational Resources Information Center

    Švec, Jan G.; Granqvist, Svante

    2018-01-01

    Purpose: Sound pressure level (SPL) measurement of voice and speech is often considered a trivial matter, but the measured levels are often reported incorrectly or incompletely, making them difficult to compare among various studies. This article aims at explaining the fundamental principles behind these measurements and providing guidelines to…

  19. Speaking Math--A Voice Input, Speech Output Calculator for Students with Visual Impairments

    ERIC Educational Resources Information Center

    Bouck, Emily C.; Flanagan, Sara; Joshi, Gauri S.; Sheikh, Waseem; Schleppenbach, Dave

    2011-01-01

    This project explored a newly developed computer-based voice input, speech output (VISO) calculator. Three high school students with visual impairments educated at a state school for the blind and visually impaired participated in the study. The time they took to complete assessments and the average number of attempts per problem were recorded…

  20. The Effects of Language Experience and Speech Context on the Phonetic Accommodation of English-accented Spanish Voicing.

    PubMed

    Llanos, Fernando; Francis, Alexander L

    2017-03-01

    Native speakers of Spanish with different amounts of experience with English classified stop-consonant voicing (/b/ versus /p/) across different speech accents: English-accented Spanish, native Spanish, and native English. While listeners with little experience with English classified target voicing with an English- or Spanish-like voice onset time (VOT) boundary, predicted by contextual VOT, listeners familiar with English relied on an English-like VOT boundary in an English-accented Spanish context even in the absence of clear contextual cues to English VOT. This indicates that Spanish listeners accommodated English-accented Spanish voicing differently depending on their degree of familiarization with the English norm.
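
    A listener's VOT category boundary is conventionally estimated by fitting a logistic psychometric function to identification responses along the continuum; the boundary is the 50% crossover point. A scipy sketch with fabricated response proportions:

    ```python
    # Sketch: estimate a /b/-/p/ VOT boundary by fitting a logistic curve to
    # identification data; proportions below are fabricated for illustration.
    import numpy as np
    from scipy.optimize import curve_fit

    vot_ms = np.array([0, 10, 20, 30, 40, 50, 60])        # continuum steps
    prop_p = np.array([0.02, 0.05, 0.20, 0.55, 0.85, 0.97, 0.99])

    def logistic(x, boundary, slope):
        return 1.0 / (1.0 + np.exp(-slope * (x - boundary)))

    (boundary, slope), _ = curve_fit(logistic, vot_ms, prop_p, p0=[30, 0.2])
    print(f"estimated VOT boundary: {boundary:.1f} ms")
    ```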

  1. Voice Response Systems Technology.

    ERIC Educational Resources Information Center

    Gerald, Jeanette

    1984-01-01

    Examines two methods of generating synthetic speech in voice response systems, which allow computers to communicate in human terms (speech), using human interface devices (ears): phoneme and reconstructed voice systems. Considerations prior to implementation, current and potential applications, glossary, directory, and introduction to Input Output…

  2. Voice Disorder Management Competencies: A Survey of School-Based Speech-Language Pathologists in Nebraska

    ERIC Educational Resources Information Center

    Teten, Amy F.; DeVeney, Shari L.; Friehe, Mary J.

    2016-01-01

    Purpose: The purpose of this survey was to determine the self-perceived competence levels in voice disorders of practicing school-based speech-language pathologists (SLPs) and identify correlated variables. Method: Participants were 153 master's level, school-based SLPs with a Nebraska teaching certificate and/or licensure who completed a survey,…

  3. Age-Related Changes to Spectral Voice Characteristics Affect Judgments of Prosodic, Segmental, and Talker Attributes for Child and Adult Speech

    ERIC Educational Resources Information Center

    Dilley, Laura C.; Wieland, Elizabeth A.; Gamache, Jessica L.; McAuley, J. Devin; Redford, Melissa A.

    2013-01-01

    Purpose: As children mature, changes in voice spectral characteristics co-vary with changes in speech, language, and behavior. In this study, spectral characteristics were manipulated to alter the perceived ages of talkers' voices while leaving critical acoustic-prosodic correlates intact, to determine whether perceived age differences were…

  4. Smartphone App for Voice Disorders

    MedlinePlus

    Feature: Taste, Smell, Hearing, Language, Voice, Balance (Fall 2013 issue). Excerpt: "...developed a mobile monitoring device that relies on smartphone technology to gather a week's worth of talking..."

  5. Correlational Analysis of Speech Intelligibility Tests and Metrics for Speech Transmission

    DTIC Science & Technology

    2017-12-04

    Recoverable fragments (figure captions and abstract text): "...frequency scale (male voice; normal voice effort)"; "Fig. 2 Diagram of a speech communication system (Letowski...)". "Consonants contain mostly high-frequency (above 1500 Hz) speech energy, but this energy is relatively small in comparison to that of the whole...voices (Letowski et al. 1993). Since the mid-frequency spectral region contains mostly vowel energy while consonants are high-frequency sounds, an..."

  6. Communication in a noisy environment: Perception of one's own voice and speech enhancement

    NASA Astrophysics Data System (ADS)

    Le Cocq, Cecile

    Workers in noisy industrial environments are often confronted with communication problems. Many workers complain that they cannot communicate easily with their coworkers when wearing hearing protectors. Consequently, they tend to remove their protectors, which exposes them to the risk of hearing loss. This communication problem is in fact twofold: first, hearing protectors modify the perception of one's own voice; second, they interfere with understanding the speech of others. This double problem is examined in this thesis. When hearing protectors are worn, the modification of one's own voice perception is partly due to the occlusion effect produced when an earplug is inserted in the ear canal. This occlusion effect has two main consequences: first, low-frequency physiological noises are perceived more strongly; second, the perception of one's own voice is modified. To better understand this phenomenon, results from the literature are analyzed systematically, and a new method to quantify the occlusion effect is developed. Instead of stimulating the skull with a bone vibrator or asking the subject to speak, as is usually done in the literature, the buccal cavity is excited with an acoustic wave. The experiment is designed in such a way that the acoustic wave exciting the buccal cavity does not directly excite the external ear or the rest of the body. The measurement of the hearing threshold with the ear open and occluded is used to quantify the subjective occlusion effect for an acoustic wave in the buccal cavity. These experimental results, together with those reported in the literature, have led to a better understanding of the occlusion effect and an evaluation of the role of each internal path from the acoustic source to the internal ear. The intelligibility of speech from others is degraded both by the high sound levels of noisy industrial environments and by the attenuation of the speech signal due to hearing

  7. Cepstral analysis of normal and pathological voice in Spanish adults. Smoothed cepstral peak prominence in sustained vowels versus connected speech.

    PubMed

    Delgado-Hernández, Jonathan; León-Gómez, Nieves M; Izquierdo-Arteaga, Laura M; Llanos-Fumero, Yanira

    In recent years, the use of cepstral measures for the acoustic evaluation of voice has increased. One of the most investigated parameters is the smoothed cepstral peak prominence (CPPS). The objectives of this paper are to establish the usefulness of this acoustic measure in the objective evaluation of voice alterations in Spanish and to determine which type of voice sample (sustained vowel or connected speech) is the more sensitive for evaluating the severity of dysphonia. Forty subjects participated in this study: 20 controls and 20 with dysphonia. Two voice samples were recorded for each subject (one sustained vowel /a/ and four phonetically balanced sentences), and the CPPS was calculated using the Praat programme. Three raters perceptually evaluated the voice samples with the Grade parameter of the GRBAS scale. Significantly lower CPPS values were found in the dysphonic voices, both for /a/ (t[38] = 4.85, P < .000) and for the sentences (t[38] = 5.75, P < .000). Regarding the type of voice sample best suited to evaluating the severity of voice alterations, a strong correlation with the perceptual ratings was found for CPPS calculated from connected speech (r_s = -0.73) and a moderate correlation for CPPS calculated from the sustained vowel (r_s = -0.56). The results of this preliminary study suggest that CPPS is a good measure for detecting dysphonia and for objectively assessing the severity of voice alterations. Copyright © 2017 Elsevier España, S.L.U. and Sociedad Española de Otorrinolaringología y Cirugía de Cabeza y Cuello. All rights reserved.
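
    The study computed CPPS with Praat. For intuition, here is a simplified, unsmoothed cepstral peak prominence for a single frame in numpy; because it omits the time and quefrency smoothing that defines CPPS, its values are not comparable with Praat's.

    ```python
    # Sketch: unsmoothed cepstral peak prominence for one analysis frame.
    # Simplified relative to Praat's CPPS (no smoothing).
    import numpy as np

    def cpp(frame, sr, f0_min=60.0, f0_max=330.0):
        spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
        log_spec = 20 * np.log10(spec + 1e-12)        # dB spectrum
        ceps = np.fft.irfft(log_spec)                 # real cepstrum (dB units)
        quef = np.arange(len(ceps)) / sr              # quefrency axis (s)

        lo, hi = int(sr / f0_max), int(sr / f0_min)   # plausible F0 region
        peak_idx = lo + np.argmax(ceps[lo:hi])

        # Regression line over the cepstrum; CPP = peak height above the line.
        fit = slice(int(0.001 * sr), hi)
        b, a = np.polyfit(quef[fit], ceps[fit], 1)
        return ceps[peak_idx] - (b * quef[peak_idx] + a)

    sr = 16000
    t = np.arange(1024) / sr
    # Synthetic voiced frame: 150 Hz fundamental with 20 harmonics.
    frame = sum(np.sin(2 * np.pi * 150 * k * t) / k for k in range(1, 21))
    print(f"CPP of synthetic 150 Hz frame: {cpp(frame, sr):.1f} dB")
    ```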

  8. Voice onset time for female trained and untrained singers during speech and singing.

    PubMed

    McCrea, Christopher R; Morris, Richard J

    2007-01-01

    The purpose of this study was to examine the voice onset times (VOTs) of female trained and untrained singers during spoken and sung tasks. Thirty females were digitally recorded speaking and singing short phrases containing the English stop consonants /p/ and /b/ in word-initial position. Voice onset time was measured for each phoneme and statistically analyzed. Mixed ANOVAs revealed significantly longer VOT durations for /p/ in spoken than in sung productions. No significant differences between trained and untrained singers were observed, and no task differences occurred for the /b/ productions. The results indicate that the type of phonatory task influences VOT for voiceless stops in females. As a result of this activity, the reader will be able to (1) understand articulatory and phonatory differences between spoken and sung productions and (2) understand the articulatory and phonatory timing differences between trained and untrained singers during spoken and sung productions.

  9. Speech coding

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ravishankar, C., Hughes Network Systems, Germantown, MD

    Speech is the predominant means of communication between human beings, and since the invention of the telephone by Alexander Graham Bell in 1876, speech services have remained the core service in almost all telecommunication systems. The original analog methods of telephony had the disadvantage that the speech signal could be corrupted by noise, cross-talk, and distortion. Long-haul transmissions, which use repeaters to compensate for the loss in signal strength on transmission links, also increase the associated noise and distortion. Digital transmission, on the other hand, is relatively immune to noise, cross-talk, and distortion, primarily because the digital signal can be faithfully regenerated at each repeater purely on the basis of a binary decision. The end-to-end performance of a digital link therefore becomes essentially independent of the length and operating frequency bands of the link, so from a transmission point of view digital transmission is the preferred approach owing to its higher immunity to noise. Carrying speech digitally has also become extremely important from a service-provision point of view. Modern requirements call for robust, flexible, and secure services that can carry a multitude of signal types (such as voice, data, and video) without a fundamental change in infrastructure. Such requirements could not easily have been met without the advent of digital transmission systems, which in turn require speech to be coded digitally. The term speech coding refers to techniques that represent or code speech signals either directly as a waveform or as a set of parameters obtained by analyzing the speech signal. In either case, the codes are transmitted to the distant end, where speech is reconstructed or synthesized using the received codes. A more generic term, often used interchangeably with speech coding, is voice coding. This term is more generic in the sense that
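
    As a concrete instance of the waveform-coding branch described above, the sketch below implements classic 8-bit mu-law companding (the principle behind G.711-style telephony coding) in numpy; it illustrates the idea and is not a standards-compliant codec.

    ```python
    # Sketch: mu-law companding, the textbook waveform-coding example.
    # Compress amplitudes logarithmically, quantize to 8 bits, then expand.
    import numpy as np

    MU = 255.0

    def mulaw_encode(x):                      # x in [-1, 1]
        y = np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)
        return np.round((y + 1) / 2 * 255).astype(np.uint8)   # 8-bit code

    def mulaw_decode(code):
        y = code.astype(np.float64) / 255 * 2 - 1
        return np.sign(y) * ((1 + MU) ** np.abs(y) - 1) / MU

    x = 0.3 * np.sin(2 * np.pi * np.linspace(0, 10, 800))     # toy "speech"
    x_hat = mulaw_decode(mulaw_encode(x))
    snr = 10 * np.log10(np.sum(x**2) / np.sum((x - x_hat)**2))
    print(f"SNR after 8-bit mu-law: {snr:.1f} dB")
    ```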

  10. Assessment of Severe Apnoea through Voice Analysis, Automatic Speech, and Speaker Recognition Techniques

    NASA Astrophysics Data System (ADS)

    Fernández Pozo, Rubén; Blanco Murillo, Jose Luis; Hernández Gómez, Luis; López Gonzalo, Eduardo; Alcázar Ramírez, José; Toledano, Doroteo T.

    2009-12-01

    This study is part of an ongoing collaborative effort between the medical and the signal processing communities to promote research on applying standard Automatic Speech Recognition (ASR) techniques for the automatic diagnosis of patients with severe obstructive sleep apnoea (OSA). Early detection of severe apnoea cases is important so that patients can receive early treatment. Effective ASR-based detection could dramatically cut medical testing time. Working with a carefully designed speech database of healthy and apnoea subjects, we describe an acoustic search for distinctive apnoea voice characteristics. We also study abnormal nasalization in OSA patients by modelling vowels in nasal and nonnasal phonetic contexts using Gaussian Mixture Model (GMM) pattern recognition on speech spectra. Finally, we present experimental findings regarding the discriminative power of GMMs applied to severe apnoea detection. We have achieved an 81% correct classification rate, which is very promising and underpins the interest in this line of inquiry.
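
    The GMM step reduces to fitting one mixture per class and comparing per-sample log-likelihoods. A scikit-learn sketch on synthetic two-dimensional features (stand-ins for the spectral features described):

    ```python
    # Sketch: GMM-based two-class detection (healthy vs. severe apnoea),
    # trained on synthetic 2-D features purely for illustration.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(1)
    healthy = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(200, 2))
    apnoea = rng.normal(loc=[2.0, 1.5], scale=1.2, size=(200, 2))

    gmm_h = GaussianMixture(n_components=4, random_state=0).fit(healthy)
    gmm_a = GaussianMixture(n_components=4, random_state=0).fit(apnoea)

    test = np.vstack([healthy[:5], apnoea[:5]])
    # Classify by the higher per-sample log-likelihood (equal priors assumed).
    pred = (gmm_a.score_samples(test) > gmm_h.score_samples(test)).astype(int)
    print(pred)        # 0 = healthy, 1 = apnoea
    ```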

  11. Reliability in perceptual analysis of voice quality.

    PubMed

    Bele, Irene Velsvik

    2005-12-01

    This study focuses on speaking voice quality in male teachers (n = 35) and male actors (n = 36), who represent untrained and trained voice users, because we wanted to investigate normal and supranormal voices. In this study, both substantial and methodologic aspects were considered. It includes a method for perceptual voice evaluation, and a basic issue was rater reliability. A listening group of 10 listeners, 7 experienced speech-language therapists, and 3 speech-language therapist students evaluated the voices by 15 vocal characteristics using VA scales. Two sets of voice signals were investigated: text reading (2 loudness levels) and sustained vowel (3 levels). The results indicated a high interrater reliability for most perceptual characteristics. Connected speech was evaluated more reliably, especially at the normal level, but both types of voice signals were evaluated reliably, although the reliability for connected speech was somewhat higher than for vowels. Experienced listeners tended to be more consistent in their ratings than did the student raters. Some vocal characteristics achieved acceptable reliability even with a smaller panel of listeners. The perceptual characteristics grouped in 4 factors reflected perceptual dimensions.

  12. Issues in forensic voice.

    PubMed

    Hollien, Harry; Huntley Bahr, Ruth; Harnsberger, James D

    2014-03-01

    The following article provides a general review of an area that can be referred to as Forensic Voice. Its goals will be outlined and that discussion will be followed by a description of its major elements. Considered are (1) the processing and analysis of spoken utterances, (2) distorted speech, (3) enhancement of speech intelligibility (re: surveillance and other recordings), (4) transcripts, (5) authentication of recordings, (6) speaker identification, and (7) the detection of deception, intoxication, and emotions in speech. Stress in speech and the psychological stress evaluation systems (that some individuals attempt to use as lie detectors) also will be considered. Points of entry will be suggested for individuals with the kinds of backgrounds possessed by professionals already working in the voice area. Copyright © 2014 The Voice Foundation. Published by Mosby, Inc. All rights reserved.

  13. A Study of Multiplexing Schemes for Voice and Data.

    NASA Astrophysics Data System (ADS)

    Sriram, Kotikalapudi

    Voice traffic variations are characterized by on/off transitions of voice calls, and talkspurt/silence transitions of speakers in conversations. A speaker is known to be in silence for more than half the time during a telephone conversation. In this dissertation, we study some schemes which exploit speaker silences for an efficient utilization of the transmission capacity in integrated voice/data multiplexing and in digital speech interpolation. We study two voice/data multiplexing schemes. In each scheme, any time slots momentarily unutilized by the voice traffic are made available to data. In the first scheme, the multiplexer does not use speech activity detectors (SAD), and hence the voice traffic variations are due to call on/off only. In the second scheme, the multiplexer detects speaker silences using SAD and transmits voice only during talkspurts. The multiplexer with SAD performs digital speech interpolation (DSI) as well as dynamic channel allocation to voice and data. The performance of the two schemes is evaluated using discrete-time modeling and analysis. The data delay performance for the case of English speech is compared with that for the case of Japanese speech. A closed form expression for the mean data message delay is derived for the single-channel single-talker case. In a DSI system, occasional speech losses occur whenever the number of speakers in simultaneous talkspurt exceeds the number of TDM voice channels. In a buffered DSI system, speech loss is further reduced at the cost of delay. We propose a novel fixed-delay buffered DSI scheme. In this scheme, speech fill-in/hangover is not required because there are no variable delays. Hence, all silences that naturally occur in speech are fully utilized. Consequently, a substantial improvement in the DSI performance is made possible. The scheme is modeled and analyzed in discrete -time. Its performance is evaluated in terms of the probability of speech clipping, packet rejection ratio, DSI
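
    The speech-loss condition described, more speakers in simultaneous talkspurt than TDM channels, has a standard binomial model when speakers are assumed independent. A scipy sketch with illustrative values:

    ```python
    # Sketch: probability that speech is clipped in a DSI system, i.e. that
    # more than C of N independent speakers are in talkspurt at once.
    from scipy.stats import binom

    N = 48       # speakers multiplexed
    C = 30       # TDM voice channels
    p = 0.4      # P(speaker in talkspurt); < 0.5, per the text

    p_clip = binom.sf(C, N, p)       # P(active speakers > C)
    print(f"P(speech clipping) = {p_clip:.4f}")
    ```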

  14. Functional connectivity between face-movement and speech-intelligibility areas during auditory-only speech perception.

    PubMed

    Schall, Sonja; von Kriegstein, Katharina

    2014-01-01

    It has been proposed that internal simulation of the talking face of visually-known speakers facilitates auditory speech recognition. One prediction of this view is that brain areas involved in auditory-only speech comprehension interact with visual face-movement sensitive areas, even under auditory-only listening conditions. Here, we test this hypothesis using connectivity analyses of functional magnetic resonance imaging (fMRI) data. Participants (17 normal participants, 17 developmental prosopagnosics) first learned six speakers via brief voice-face or voice-occupation training (<2 min/speaker). This was followed by an auditory-only speech recognition task and a control task (voice recognition) involving the learned speakers' voices in the MRI scanner. As hypothesized, we found that, during speech recognition, familiarity with the speaker's face increased the functional connectivity between the face-movement sensitive posterior superior temporal sulcus (STS) and an anterior STS region that supports auditory speech intelligibility. There was no difference between normal participants and prosopagnosics. This was expected because previous findings have shown that both groups use the face-movement sensitive STS to optimize auditory-only speech comprehension. Overall, the present findings indicate that learned visual information is integrated into the analysis of auditory-only speech and that this integration results from the interaction of task-relevant face-movement and auditory speech-sensitive areas.

  15. Speech therapy after thyroidectomy

    PubMed Central

    Wu, Che-Wei

    2017-01-01

    Common complaints of patients who have received thyroidectomy include dysphonia (voice dysfunction) and dysphagia (difficulty swallowing). One cause of these surgical outcomes is recurrent laryngeal nerve paralysis. Many studies have discussed the effectiveness of speech therapy (e.g., voice therapy and dysphagia therapy) for improving dysphonia and dysphagia, but not specifically in patients who have received thyroidectomy. Therefore, the aim of this paper was to discuss issues regarding speech therapy, such as voice therapy and dysphagia therapy, for patients after thyroidectomy. Another aim was to review the literature on speech therapy for patients with recurrent laryngeal nerve paralysis after thyroidectomy. Databases used for the literature review included PubMed, MEDLINE, Academic Search Premier, ERIC, CINAHL Plus, and EBSCO. The articles retrieved by the database searches were classified and screened for relevance using EndNote. Of the 936 articles retrieved, 18 discussed "voice assessment and thyroidectomy", 3 discussed "voice therapy and thyroidectomy", and 11 discussed "surgical interventions for voice restoration after thyroidectomy". Only 3 studies discussed topics related to "swallowing function assessment/treatment and thyroidectomy". Although many studies have investigated voice changes and assessment methods in thyroidectomy patients, few recent studies have investigated speech therapy after thyroidectomy. Additionally, some studies have addressed dysphagia after thyroidectomy, but few have discussed its assessment and treatment. PMID:29142841

  16. Voice-stress measure of mental workload

    NASA Technical Reports Server (NTRS)

    Alpert, Murray; Schneider, Sid J.

    1988-01-01

    In a planned experiment, male subjects between the ages of 18 and 50 will be required to produce speech while performing various tasks. Analysis of the speech produced should reveal which aspects of voice prosody are associated with increased workload. Preliminary results with two female subjects suggest a possible trend for voice frequency and amplitude to be higher, and the variance of the voice frequency to be lower, in the high-workload condition.

  17. Functional Connectivity between Face-Movement and Speech-Intelligibility Areas during Auditory-Only Speech Perception

    PubMed Central

    Schall, Sonja; von Kriegstein, Katharina

    2014-01-01

    It has been proposed that internal simulation of the talking face of visually-known speakers facilitates auditory speech recognition. One prediction of this view is that brain areas involved in auditory-only speech comprehension interact with visual face-movement sensitive areas, even under auditory-only listening conditions. Here, we test this hypothesis using connectivity analyses of functional magnetic resonance imaging (fMRI) data. Participants (17 normal participants, 17 developmental prosopagnosics) first learned six speakers via brief voice-face or voice-occupation training (<2 min/speaker). This was followed by an auditory-only speech recognition task and a control task (voice recognition) involving the learned speakers’ voices in the MRI scanner. As hypothesized, we found that, during speech recognition, familiarity with the speaker’s face increased the functional connectivity between the face-movement sensitive posterior superior temporal sulcus (STS) and an anterior STS region that supports auditory speech intelligibility. There was no difference between normal participants and prosopagnosics. This was expected because previous findings have shown that both groups use the face-movement sensitive STS to optimize auditory-only speech comprehension. Overall, the present findings indicate that learned visual information is integrated into the analysis of auditory-only speech and that this integration results from the interaction of task-relevant face-movement and auditory speech-sensitive areas. PMID:24466026

  18. Cognitive Load in Voice Therapy Carry-Over Exercises

    ERIC Educational Resources Information Center

    Iwarsson, Jenny; Morris, David Jackson; Balling, Laura Winther

    2017-01-01

    Purpose: The cognitive load generated by online speech production may vary with the nature of the speech task. This article examines 3 speech tasks used in voice therapy carry-over exercises, in which a patient is required to adopt and automatize new voice behaviors, ultimately in daily spontaneous communication. Method: Twelve subjects produced…

  19. Voiced Excitations

    DTIC Science & Technology

    2004-12-01

    Report documentation fragments only. Keywords: Radar & EM Speech, Voiced Speech Excitations. Recoverable references: "New Ideas for Speech Recognition and Related Technologies", Lawrence Livermore National Laboratory Report UCRL-UR-120310, 1995; Lawrence Livermore Laboratory report UCRL-JC-134775M; Holzrichter, J.F., Kobler, J.B., Rosowski, J.J., Burke, G.J. (2003), "EM wave..."

  20. High-frequency energy in singing and speech

    NASA Astrophysics Data System (ADS)

    Monson, Brian Bruce

    While human speech and the human voice generate acoustical energy up to (and beyond) 20 kHz, the energy above approximately 5 kHz has been largely neglected. Evidence is accruing that this high-frequency energy contains perceptual information relevant to speech and voice, including percepts of quality, localization, and intelligibility. The present research was an initial step in the long-range goal of characterizing high-frequency energy in singing voice and speech, with particular regard for its perceptual role and its potential for modification during voice and speech production. In this study, a database of high-fidelity recordings of talkers was created and used for a broad acoustical analysis and general characterization of high-frequency energy, as well as specific characterization of phoneme category, voice and speech intensity level, and mode of production (speech versus singing) by high-frequency energy content. Directionality of radiation of high-frequency energy from the mouth was also examined. The recordings were used for perceptual experiments wherein listeners were asked to discriminate between speech and voice samples that differed only in high-frequency energy content. Listeners were also subjected to gender discrimination tasks, mode-of-production discrimination tasks, and transcription tasks with samples of speech and singing that contained only high-frequency content. The combination of these experiments has revealed that (1) human listeners are able to detect very subtle level changes in high-frequency energy, and (2) human listeners are able to extract significant perceptual information from high-frequency energy.
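
    A first-pass characterization of high-frequency energy content is simply the fraction of spectral power above a cutoff. A scipy sketch with a synthetic signal, using the 5-kHz boundary mentioned in the abstract:

    ```python
    # Sketch: fraction of spectral power above 5 kHz, a crude measure of
    # high-frequency energy content; the signal here is a synthetic tone
    # plus noise, standing in for a recorded voice sample.
    import numpy as np
    from scipy.signal import welch

    sr = 44100
    rng = np.random.default_rng(2)
    y = np.sin(2 * np.pi * 220 * np.arange(sr) / sr) + 0.05 * rng.standard_normal(sr)

    freqs, psd = welch(y, fs=sr, nperseg=2048)
    hf_fraction = psd[freqs >= 5000].sum() / psd.sum()
    print(f"fraction of power above 5 kHz: {hf_fraction:.4f}")
    ```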

  1. Perception of a non-native speech contrast: Voiced and voiceless stops as perceived by Tamil speakers

    NASA Astrophysics Data System (ADS)

    Tur, Sylwia

    2004-05-01

    The effect of linguistic experience plays a significant role in how speech sounds are perceived. The findings of many studies imply that the perception of non-native contrasts depends on their status in the native language of the listener. Tamil is a language with a single voicing category. All stop consonants in Tamil are phonemically voiceless, though allophonic voicing has been observed in spoken Tamil. The present study examined how native Tamil speakers and English controls perceived voiced and voiceless bilabial, alveolar, and velar stops in English. Voice onset time (VOT) was manipulated for editing of naturally produced stimuli with increasingly longer continuum. Perceptual data was collected from 16 Tamil and 16 English speakers. Experiment 1 was an AX task in which subjects responded same or different to 162 pairs of stimuli. Experiment 2 was a forced choice ID task in which subjects identified 99 individually presented stimuli as pa, ta, ka or ba, da, ga. Experiments show statistically significant differences between Tamil and English speakers in their perception of English stop consonants. Results of the study imply that the allophonic status of voiced stops in Tamil does not aid the Tamil speakers in perceiving phonemically voiced stops in English.

  2. A high quality voice coder with integrated echo canceller and voice activity detector for mobile satellite applications

    NASA Technical Reports Server (NTRS)

    Kondoz, A. M.; Evans, B. G.

    1993-01-01

    In the last decade, low-bit-rate speech coding research has received much attention, resulting in newly developed, good-quality speech coders operating at rates as low as 4.8 kb/s. Although speech quality at around 8 kb/s is acceptable for a wide variety of applications, at 4.8 kb/s further improvements in quality are necessary to make it acceptable to the majority of applications and users. In addition to a low bit rate with acceptable speech quality, other facilities such as integrated digital echo cancellation and voice activity detection are now becoming necessary to provide a cost-effective and compact solution. In this paper we describe a CELP speech coder with an integrated echo canceller and a voice activity detector, all of which have been implemented on a single DSP32C with 32 KBytes of SRAM. The quality of CELP-coded speech has been improved significantly by a new codebook implementation, which also simplifies the encoder/decoder complexity, making room for the integration of a 64-tap echo canceller together with a voice activity detector.
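
    Of the three integrated components, the voice activity detector is the easiest to illustrate. Below is a toy frame-energy VAD in numpy, far cruder than a VAD suitable for a deployed coder, but it shows the basic frame-level decision:

    ```python
    # Sketch: toy energy-based voice activity detector. Real VADs add noise
    # tracking, hangover smoothing, and spectral features; this is the core.
    import numpy as np

    def energy_vad(y, sr, frame_ms=20, threshold_db=-35):
        n = int(sr * frame_ms / 1000)
        frames = y[: len(y) // n * n].reshape(-1, n)
        energy_db = 10 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
        return energy_db > threshold_db       # True = speech frame

    sr = 8000
    t = np.arange(sr) / sr
    y = np.concatenate([0.001 * np.ones(sr // 2),            # "silence"
                        0.3 * np.sin(2 * np.pi * 200 * t)])  # "speech"
    print(energy_vad(y, sr).astype(int))
    ```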

  3. A ''Voice Inversion Effect?''

    ERIC Educational Resources Information Center

    Bedard, Catherine; Belin, Pascal

    2004-01-01

    Voice is the carrier of speech but is also an ''auditory face'' rich in information on the speaker's identity and affective state. Three experiments explored the possibility of a ''voice inversion effect,'' by analogy to the classical ''face inversion effect,'' which could support the hypothesis of a voice-specific module. Experiment 1 consisted…

  4. Use of Spectral/Cepstral Analyses for Differentiating Normal from Hypofunctional Voices in Sustained Vowel and Continuous Speech Contexts

    ERIC Educational Resources Information Center

    Watts, Christopher R.; Awan, Shaheen N.

    2011-01-01

    Purpose: In this study, the authors evaluated the diagnostic value of spectral/cepstral measures to differentiate dysphonic from nondysphonic voices using sustained vowels and continuous speech samples. Methodology: Thirty-two age- and gender-matched individuals (16 participants with dysphonia and 16 controls) were recorded reading a standard…

  5. Voice parameters and videonasolaryngoscopy in children with vocal nodules: a longitudinal study, before and after voice therapy.

    PubMed

    Valadez, Victor; Ysunza, Antonio; Ocharan-Hernandez, Esther; Garrido-Bustamante, Norma; Sanchez-Valerio, Araceli; Pamplona, Ma C

    2012-09-01

    Vocal Nodules (VN) are a functional voice disorder associated with voice misuse and abuse in children. There are few reports addressing vocal parameters in children with VN, especially after a period of vocal rehabilitation. The purpose of this study is to describe measurements of vocal parameters including Fundamental Frequency (FF), Shimmer (S), and Jitter (J), videonasolaryngoscopy examination and clinical perceptual assessment, before and after voice therapy in children with VN. Voice therapy was provided using visual support through Speech-Viewer software. Twenty patients with VN were studied. An acoustical analysis of voice was performed and compared with data from subjects from a control group matched by age and gender. Also, clinical perceptual assessment of voice and videonasolaryngoscopy were performed to all patients with VN. After a period of voice therapy, provided with visual support using Speech Viewer-III (SV-III-IBM) software, new acoustical analyses, perceptual assessments and videonasolaryngoscopies were performed. Before the onset of voice therapy, there was a significant difference (p<0.05) in mean FF, S and J, between the patients with VN and subjects from the control group. After the voice therapy period, a significant improvement (p<0.05) was found in all acoustic voice parameters. Moreover, perceptual voice analysis demonstrated improvement in all cases. Finally, videonasolaryngoscopy demonstrated that vocal nodules were no longer discernible on the vocal folds in any of the cases. SV-III software seems to be a safe and reliable method for providing voice therapy in children with VN. Acoustic voice parameters, perceptual data and videonasolaryngoscopy were significantly improved after the speech therapy period was completed. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
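
    The acoustic parameters reported here have simple definitions over consecutive glottal cycles. A numpy sketch of local jitter and shimmer from fabricated per-cycle periods and amplitudes:

    ```python
    # Sketch: local jitter (%) and local shimmer (%) from per-cycle values.
    # Period and amplitude sequences below are fabricated for illustration.
    import numpy as np

    periods = np.array([7.9, 8.1, 8.0, 8.3, 7.8, 8.2])     # ms per cycle
    amps = np.array([0.52, 0.55, 0.50, 0.56, 0.51, 0.54])   # peak amplitudes

    jitter_local = np.mean(np.abs(np.diff(periods))) / np.mean(periods) * 100
    shimmer_local = np.mean(np.abs(np.diff(amps))) / np.mean(amps) * 100

    print(f"F0 ≈ {1000 / periods.mean():.1f} Hz")
    print(f"jitter (local) = {jitter_local:.2f} %")
    print(f"shimmer (local) = {shimmer_local:.2f} %")
    ```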

  6. Correlation of VHI-10 to voice laboratory measurements across five common voice disorders.

    PubMed

    Gillespie, Amanda I; Gooding, William; Rosen, Clark; Gartner-Schmidt, Jackie

    2014-07-01

    To correlate change in Voice Handicap Index (VHI)-10 scores with corresponding voice laboratory measures across five voice disorders. Retrospective study. One hundred fifty patients aged >18 years with primary diagnosis of vocal fold lesions, primary muscle tension dysphonia-1, atrophy, unilateral vocal fold paralysis (UVFP), and scar. For each group, participants with the largest change in VHI-10 between two periods (TA and TB) were selected. The dates of the VHI-10 values were linked to corresponding acoustic/aerodynamic and audio-perceptual measures. Change in voice laboratory values were analyzed for correlation with each other and with VHI-10. VHI-10 scores were greater for patients with UVFP than other disorders. The only disorder-specific correlation between voice laboratory measure and VHI-10 was average phonatory airflow in speech for patients with UVFP. Average airflow in repeated phonemes was strongly correlated with average airflow in speech (r=0.75). Acoustic measures did not significantly change between time points. The lack of correlations between the VHI-10 change scores and voice laboratory measures may be due to differing constructs of each measure; namely, handicap versus physiological function. Presuming corroboration between these measures may be faulty. Average airflow in speech may be the most ecologically valid measure for patients with UVFP. Although aerodynamic measures changed between the time points, acoustic measures did not. Correlations to VHI-10 and change between time points may be found with other acoustic measures. Copyright © 2014 The Voice Foundation. Published by Mosby, Inc. All rights reserved.

  7. Evaluation of Speech Recognition of Cochlear Implant Recipients Using Adaptive, Digital Remote Microphone Technology and a Speech Enhancement Sound Processing Algorithm.

    PubMed

    Wolfe, Jace; Morais, Mila; Schafer, Erin; Agrawal, Smita; Koch, Dawn

    2015-05-01

    Cochlear implant recipients often experience difficulty with understanding speech in the presence of noise. Cochlear implant manufacturers have developed sound processing algorithms designed to improve speech recognition in noise, and research has shown these technologies to be effective. Remote microphone technology utilizing adaptive, digital wireless radio transmission has also been shown to provide significant improvement in speech recognition in noise. No studies have examined the potential improvement in speech recognition in noise when these two technologies are used simultaneously. The goal of this study was to evaluate the potential benefits and limitations associated with the simultaneous use of a sound processing algorithm designed to improve performance in noise (Advanced Bionics ClearVoice) and a remote microphone system that incorporates adaptive, digital wireless radio transmission (Phonak Roger). A two-by-two repeated-measures design was used to compare performance without these technologies, with each technology separately, and with both technologies simultaneously. Eleven Advanced Bionics (AB) cochlear implant recipients, ages 11 to 68 yr, participated. AzBio sentence recognition was measured in quiet and in the presence of classroom noise ranging in level from 50 to 80 dBA in 5-dB steps. Performance was evaluated in four conditions: (1) no ClearVoice and no Roger, (2) ClearVoice enabled without Roger, (3) ClearVoice disabled with Roger enabled, and (4) simultaneous use of ClearVoice and Roger. Speech recognition in quiet was better than speech recognition in noise for all conditions. Use of ClearVoice and Roger each provided significant improvement in speech recognition in noise, and the best performance in noise was obtained with the two used together. ClearVoice and Roger technology each improves speech recognition in noise, particularly when used at the same time

  8. The interlanguage speech intelligibility benefit for native speakers of Mandarin: Production and perception of English word-final voicing contrasts

    PubMed Central

    Hayes-Harb, Rachel; Smith, Bruce L.; Bent, Tessa; Bradlow, Ann R.

    2009-01-01

    This study investigated the intelligibility of native and Mandarin-accented English speech for native English and native Mandarin listeners. The word-final voicing contrast was considered (as in minimal pairs such as 'cub' and 'cup') in a forced-choice word identification task. For these particular talkers and listeners, there was evidence of an interlanguage speech intelligibility benefit for listeners (i.e., native Mandarin listeners were more accurate than native English listeners at identifying Mandarin-accented English words). However, there was no evidence of an interlanguage speech intelligibility benefit for talkers (i.e., native Mandarin listeners did not find Mandarin-accented English speech more intelligible than native English speech). When listener and talker phonological proficiency (operationalized as accentedness) was taken into account, it was found that the interlanguage speech intelligibility benefit for listeners held only for the low phonological proficiency listeners and low phonological proficiency speech. The intelligibility data were also considered in relation to various temporal-acoustic properties of native English and Mandarin-accented English speech in an effort to better understand the properties of speech that may contribute to the interlanguage speech intelligibility benefit. PMID:19606271

  9. Low Vocal Pitch Preference Drives First Impressions Irrespective of Context in Male Voices but Not in Female Voices.

    PubMed

    Tsantani, Maria S; Belin, Pascal; Paterson, Helena M; McAleer, Phil

    2016-08-01

    Vocal pitch has been found to influence judgments of perceived trustworthiness and dominance from a novel voice. However, the majority of findings arise from using only male voices and in context-specific scenarios. In two experiments, we first explore the influence of average vocal pitch on first-impression judgments of perceived trustworthiness and dominance, before establishing the existence of an overall preference for high or low pitch across genders. In Experiment 1, pairs of high- and low-pitched temporally reversed recordings of male and female vocal utterances were presented in a two-alternative forced-choice task. Results revealed a tendency to select the low-pitched voice over the high-pitched voice as more trustworthy, for both genders, and more dominant, for male voices only. Experiment 2 tested an overall preference for low-pitched voices, and whether judgments were modulated by speech content, using forward and reversed speech to manipulate context. Results revealed an overall preference for low pitch, irrespective of direction of speech, in male voices only. No such overall preference was found for female voices. We propose that an overall preference for low pitch is a default prior in male voices irrespective of context, whereas pitch preferences in female voices are more context- and situation-dependent. The present study confirms the important role of vocal pitch in the formation of first-impression personality judgments and advances understanding of the impact of context on pitch preferences across genders.

  10. [Design of standard voice sample text for subjective auditory perceptual evaluation of voice disorders].

    PubMed

    Li, Jin-rang; Sun, Yan-yan; Xu, Wen

    2010-09-01

    To design a speech voice sample text containing all phonemes in Mandarin for subjective auditory perceptual evaluation of voice disorders. The design principles for the text were: the short text should include the 21 initials and 39 finals, so as to cover all the phonemes in Mandarin, and it should be meaningful. A short text was composed. It had 155 Chinese words and included 21 initials and 38 finals (the final ê was not included because it is rarely used in Mandarin). The text also covered 17 light tones and one "Erhua". The constituent ratios of the initials and finals presented in this short text were statistically similar to those in Mandarin, according to the method of similarity between sample and population (r = 0.742, P < 0.001 and r = 0.844, P < 0.001, respectively). The constituent ratios of the tones presented in this short text were not statistically similar to those in Mandarin (r = 0.731, P > 0.05). A speech voice sample text with all phonemes in Mandarin was thus produced. The constituent ratios of the initials and finals in this short text are similar to those in Mandarin. Its value for subjective auditory perceptual evaluation of voice disorders needs further study.

  11. Verbal Paradata and Survey Error: Respondent Speech, Voice, and Question-Answering Behavior Can Predict Income Item Nonresponse

    ERIC Educational Resources Information Center

    Jans, Matthew E.

    2010-01-01

    Income nonresponse is a significant problem in survey data, with rates as high as 50%, yet we know little about why it occurs. It is plausible that the way respondents answer survey questions (e.g., their voice and speech characteristics, and their question- answering behavior) can predict whether they will provide income data, and will reflect…

  12. Designing interaction, voice, and inclusion in AAC research.

    PubMed

    Pullin, Graham; Treviranus, Jutta; Patel, Rupal; Higginbotham, Jeff

    2017-09-01

    The ISAAC 2016 Research Symposium included a Design Stream that examined timely issues across augmentative and alternative communication (AAC), framed in terms of designing interaction, designing voice, and designing inclusion. Each is a complex term with multiple meanings; together they represent challenging yet important frontiers of AAC research. The Design Stream was conceived by the four authors, researchers who have been exploring AAC and disability-related design throughout their careers, brought together by a shared conviction that designing for communication implies more than ensuring access to words and utterances. Each of these presenters came to AAC from a different background: interaction design, inclusive design, speech science, and social science. The resulting discussion among 24 symposium participants included controversies about the role of technology, tensions about independence and interdependence, and a provocation about taste. The paper concludes by proposing new directions for AAC research: (a) new interdisciplinary research could combine scientific and design research methods, as distant yet complementary as microanalysis and interaction design, (b) new research tools could seed accessible and engaging contextual research into voice within a social model of disability, and (c) new open research networks could support inclusive, international and interdisciplinary research.

  13. Do What I Say! Voice Recognition Makes Major Advances.

    ERIC Educational Resources Information Center

    Ruley, C. Dorsey

    1994-01-01

    Explains voice recognition technology applications in the workplace, schools, and libraries. Highlights include a voice-controlled work station using the DragonDictate system that can be used with dyslexic students, converting text to speech, and converting speech to text. (LRW)

  14. Lee Silverman Voice Treatment versus standard speech and language therapy versus control in Parkinson's disease: a pilot randomised controlled trial (PD COMM pilot).

    PubMed

    Sackley, Catherine M; Smith, Christina H; Rick, Caroline E; Brady, Marian C; Ives, Natalie; Patel, Smitaa; Woolley, Rebecca; Dowling, Francis; Patel, Ramilla; Roberts, Helen; Jowett, Sue; Wheatley, Keith; Kelly, Debbie; Sands, Gina; Clarke, Carl E

    2018-01-01

    Speech-related problems are common in Parkinson's disease (PD), but there is little evidence for the effectiveness of standard speech and language therapy (SLT) or Lee Silverman Voice Treatment (LSVT LOUD®). The PD COMM pilot was a three-arm, assessor-blinded, randomised controlled trial (RCT) of LSVT LOUD®, SLT and no intervention (1:1:1 ratio) to assess feasibility and to inform the design of a full-scale RCT. Non-demented patients with idiopathic PD and speech problems, and no SLT for speech problems in the past 2 years, were eligible. LSVT LOUD® is a standardised regime (16 sessions over 4 weeks). SLT comprised individualised content per local practice (typically weekly sessions for 6-8 weeks). Outcomes included recruitment and retention, treatment adherence, and data completeness. Outcome data collected at baseline, 3, 6, and 12 months included patient-reported voice and quality of life measures, resource use, and assessor-rated speech recordings. Eighty-nine patients were randomised, with 90% in the therapy groups and 100% in the control group completing the trial. The response rate for the Voice Handicap Index (VHI) in each arm was ≥ 90% at all time-points. VHI was highly correlated with the other speech-related outcome measures. There was a trend to improvement in VHI with LSVT LOUD® (difference at 3 months compared with control: -12.5 points; 95% CI -26.2 to 1.2) and SLT (difference at 3 months compared with control: -9.8 points; 95% CI -23.2 to 3.7) which needs to be confirmed in an adequately powered trial. Randomisation to a three-arm trial of speech therapy including a no-intervention control is feasible and acceptable. Compliance with both interventions was good. VHI and other patient-reported outcomes were relevant measures and provided data to inform the sample size for a substantive trial. International Standard Randomised Controlled Trial Number Register: ISRCTN75223808, registered 22 March 2012.

  15. Scientific bases of human-machine communication by voice.

    PubMed Central

    Schafer, R W

    1995-01-01

    The scientific bases for human-machine communication by voice are in the fields of psychology, linguistics, acoustics, signal processing, computer science, and integrated circuit technology. The purpose of this paper is to highlight the basic scientific and technological issues in human-machine communication by voice and to point out areas of future research opportunity. The discussion is organized around the following major issues in implementing human-machine voice communication systems: (i) hardware/software implementation of the system, (ii) speech synthesis for voice output, (iii) speech recognition and understanding for voice input, and (iv) usability factors related to how humans interact with machines. PMID:7479802

  16. Hemispheric association and dissociation of voice and speech information processing in stroke.

    PubMed

    Jones, Anna B; Farrall, Andrew J; Belin, Pascal; Pernet, Cyril R

    2015-10-01

    As we listen to someone speaking, we extract both linguistic and non-linguistic information. Knowing how these two sets of information are processed in the brain is fundamental to the general understanding of social communication, speech recognition and therapy of language impairments. We investigated the pattern of performances in phoneme versus gender categorization in left and right hemisphere stroke patients, and found an anatomo-functional dissociation in the right frontal cortex, establishing a new syndrome in voice discrimination abilities. In addition, phoneme and gender performances were more often associated than dissociated in the left hemisphere patients, suggesting common neural underpinnings. Copyright © 2015 Elsevier Ltd. All rights reserved.

  17. Hearing Story Characters' Voices: Auditory Imagery during Reading

    ERIC Educational Resources Information Center

    Gunraj, Danielle N.; Klin, Celia M.

    2012-01-01

    Despite the longstanding belief in an inner voice, there is surprisingly little known about the perceptual features of that voice during text processing. This article asked whether readers infer nonlinguistic phonological features, such as speech rate, associated with a character's speech. Previous evidence for this type of auditory imagery has…

  18. A pneumatic Bionic Voice prosthesis-Pre-clinical trials of controlling the voice onset and offset.

    PubMed

    Ahmadi, Farzaneh; Noorian, Farzad; Novakovic, Daniel; van Schaik, André

    2018-01-01

    Despite emergent progress in many fields of bionics, a functional Bionic Voice prosthesis for laryngectomy patients (larynx amputees) has not yet been achieved, leading to a lifetime of vocal disability for these patients. This study introduces a novel framework of Pneumatic Bionic Voice Prostheses as an electronic adaptation of the Pneumatic Artificial Larynx (PAL) device. The PAL is a non-invasive mechanical voice source, driven exclusively by respiration with an exceptionally high voice quality, comparable to the existing gold standard of Tracheoesophageal (TE) voice prosthesis. Following PAL design closely as the reference, Pneumatic Bionic Voice Prostheses seem to have a strong potential to substitute the existing gold standard by generating a similar voice quality while remaining non-invasive and non-surgical. This paper designs the first Pneumatic Bionic Voice prosthesis and evaluates its onset and offset control against the PAL device through pre-clinical trials on one laryngectomy patient. The evaluation on a database of more than five hours of continuous/isolated speech recordings shows a close match between the onset/offset control of the Pneumatic Bionic Voice and the PAL with an accuracy of 98.45 ±0.54%. When implemented in real-time, the Pneumatic Bionic Voice prosthesis controller has an average onset/offset delay of 10 milliseconds compared to the PAL. Hence it addresses a major disadvantage of previous electronic voice prostheses, including myoelectric Bionic Voice, in meeting the short time-frames of controlling the onset/offset of the voice in continuous speech.

  19. A Development of a System Enables Character Input and PC Operation via Voice for a Physically Disabled Person with a Speech Impediment

    NASA Astrophysics Data System (ADS)

    Tanioka, Toshimasa; Egashira, Hiroyuki; Takata, Mayumi; Okazaki, Yasuhisa; Watanabe, Kenzi; Kondo, Hiroki

    We have designed and implemented a voice-controlled PC operation support system for a physically disabled person with a speech impediment. Voice operation is an effective method for a physically disabled person with involuntary movement of the limbs and head. For practical purposes, we built our system on a commercial speech recognition engine; adopting a commercial engine reduces development cost and should make the system useful to other people with speech impediments. We customized the engine so that it can recognize the utterances of a person with a speech impediment. We restricted the vocabulary the recognition engine accepts and separated target words from similar-sounding words to avoid misrecognition: the huge number of words registered in commercial speech recognition engines causes frequent misrecognition of impaired speech, because such utterances are unclear and unstable. We solved this problem by narrowing the choice of inputs down to a small number and by registering ambiguous pronunciations in addition to the original ones. To realize full character input and full PC operation with a small vocabulary, we designed multiple input modes with categorized dictionaries and introduced two-step input in each mode (except numeral input) to enable correct operation with a small number of words. The system is at a practical level. The first author of this paper is physically disabled with a speech impediment. Using this system, he can not only input characters into the PC but also operate the Windows system smoothly, and he uses it in his daily life; this paper was written by him with the system. At present, the speech recognition is customized to him. It is, however, possible to customize the system for other users by changing the vocabulary and registering new pronunciations according to each user's utterances.
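
    A minimal sketch of the two-step, small-vocabulary input scheme described above: each mode exposes only a handful of recognizable words, and a character or command is selected in two steps (group, then item). The mode names and vocabularies below are hypothetical, not the system's actual dictionaries:

    ```python
    # Hedged sketch: categorized dictionaries with two-step selection.
    # Keeping each step's active vocabulary small reduces misrecognition.
    MODES = {
        "kana": {
            "a-row": ["a", "i", "u", "e", "o"],
            "ka-row": ["ka", "ki", "ku", "ke", "ko"],
        },
        "command": {
            "window": ["open", "close", "minimize"],
        },
    }

    def two_step_input(mode: str, group_word: str, item_index: int) -> str:
        """Step 1 selects a small word group; step 2 selects an item in it."""
        return MODES[mode][group_word][item_index]

    print(two_step_input("kana", "ka-row", 2))  # -> "ku"
    ```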

  20. Multimodal Speech Capture System for Speech Rehabilitation and Learning.

    PubMed

    Sebkhi, Nordine; Desai, Dhyey; Islam, Mohammad; Lu, Jun; Wilson, Kimberly; Ghovanloo, Maysam

    2017-11-01

    Speech-language pathologists (SLPs) are trained to correct articulation of people diagnosed with motor speech disorders by analyzing articulators' motion and assessing speech outcome while patients speak. To assist SLPs in this task, we are presenting the multimodal speech capture system (MSCS) that records and displays kinematics of key speech articulators, the tongue and lips, along with voice, using unobtrusive methods. Collected speech modalities, tongue motion, lips gestures, and voice are visualized not only in real-time to provide patients with instant feedback but also offline to allow SLPs to perform post-analysis of articulators' motion, particularly the tongue, with its prominent but hardly visible role in articulation. We describe the MSCS hardware and software components, and demonstrate its basic visualization capabilities by a healthy individual repeating the words "Hello World." A proof-of-concept prototype has been successfully developed for this purpose, and will be used in future clinical studies to evaluate its potential impact on accelerating speech rehabilitation by enabling patients to speak naturally. Pattern matching algorithms to be applied to the collected data can provide patients with quantitative and objective feedback on their speech performance, unlike current methods that are mostly subjective, and may vary from one SLP to another.

  1. Practical applications of interactive voice technologies: Some accomplishments and prospects

    NASA Technical Reports Server (NTRS)

    Grady, Michael W.; Hicklin, M. B.; Porter, J. E.

    1977-01-01

    A technology assessment of the application of computers and electronics to complex systems is presented. Three existing systems which utilize voice technology (speech recognition and speech generation) are described. Future directions in voice technology are also described.

  2. Speech in spinocerebellar ataxia.

    PubMed

    Schalling, Ellika; Hartelius, Lena

    2013-12-01

    Spinocerebellar ataxias (SCAs) are a heterogeneous group of autosomal dominant cerebellar ataxias clinically characterized by progressive ataxia, dysarthria and a range of other concomitant neurological symptoms. Only a few studies include detailed characterization of speech symptoms in SCA. Speech symptoms in SCA resemble ataxic dysarthria, but symptoms related to phonation may be more prominent. One study to date has shown an association between differences in speech and voice symptoms and genotype. More studies of speech and voice phenotypes are warranted, as they may aid clinical diagnosis. In addition, instrumental speech analysis has been demonstrated to be a reliable measure that may be used to monitor disease progression or therapy outcomes in possible future pharmacological treatments. Intervention by speech and language pathologists should go beyond assessment. Clinical guidelines for the management of speech, communication and swallowing need to be developed for individuals with progressive cerebellar ataxia. Copyright © 2013 Elsevier Inc. All rights reserved.

  3. Speech, Prosody, and Voice Characteristics of a Mother and Daughter with a 7;13 Translocation Affecting "FOXP2"

    ERIC Educational Resources Information Center

    Shriberg, Lawrence D.; Ballard, Kirrie J.; Tomblin, J. Bruce; Duffy, Joseph R.; Odell, Katharine H.; Williams, Charles A.

    2006-01-01

    Purpose: The primary goal of this case study was to describe the speech, prosody, and voice characteristics of a mother and daughter with a breakpoint in a balanced 7;13 chromosomal translocation that disrupted the transcription gene, "FOXP2" (cf. J. B. Tomblin et al., 2005). As with affected members of the widely cited KE family, whose…

  4. Micro-Based Speech Recognition: Instructional Innovation for Handicapped Learners.

    ERIC Educational Resources Information Center

    Horn, Carin E.; Scott, Brian L.

    A new voice based learning system (VBLS), which allows the handicapped user to interact with a microcomputer by voice commands, is described. Speech or voice recognition is the computerized process of identifying a spoken word or phrase, including those resulting from speech impediments. This new technology is helpful to the severely physically…

  5. Human voice perception.

    PubMed

    Latinus, Marianne; Belin, Pascal

    2011-02-22

    We are all voice experts. First and foremost, we can produce and understand speech, and this makes us a unique species. But in addition to speech perception, we routinely extract from voices a wealth of socially-relevant information in what constitutes a more primitive, and probably more universal, non-linguistic mode of communication. Consider the following example: you are sitting in a plane, and you can hear a conversation in a foreign language in the row behind you. You do not see the speakers' faces, and you cannot understand the speech content because you do not know the language. Yet, an amazing amount of information is available to you. You can evaluate the physical characteristics of the different protagonists, including their gender, approximate age and size, and associate an identity to the different voices. You can form a good idea of the different speakers' moods and affective states, as well as more subtle cues such as the perceived attractiveness or dominance of the protagonists. In brief, you can form a fairly detailed picture of the type of social interaction unfolding, which a brief glance backwards can on occasion help refine - sometimes surprisingly so. What are the acoustical cues that carry these different types of vocal information? How does our brain process and analyse this information? Here we briefly review an emerging field and the main tools used in voice perception research. Copyright © 2011 Elsevier Ltd. All rights reserved.

  6. Can we perceptually rate alaryngeal voice? Developing the Sunderland Tracheoesophageal Voice Perceptual Scale.

    PubMed

    Hurren, A; Hildreth, A J; Carding, P N

    2009-12-01

    To investigate the inter- and intra-rater reliability (in relation to both profession and expertise) of judgments of two alaryngeal voice parameters: 'Overall Grade' and 'Neoglottal Tonicity'. Reliable perceptual assessment is essential for surgical and therapeutic outcome measurement but has been minimally researched to date. Test of inter- and intra-rater agreement from audio recordings of 55 tracheoesophageal speakers. Cancer Unit. Twelve speech and language therapists and ten Ear, Nose and Throat surgeons. Perceptual voice parameters of 'Overall Grade' rated with a 0-3 equally appearing interval scale and 'Neoglottal Tonicity' with an 11-point bipolar semantic scale. All raters achieved 'good' agreement for 'Overall Grade', with mean weighted kappa coefficients of 0.78 for intra- and 0.70 for inter-rater agreement. All raters achieved 'good' intra-rater agreement for 'Neoglottal Tonicity' (0.64), but inter-rater agreement was only 'moderate' (0.40). However, the expert speech and language therapist sub-group attained 'good' inter-rater agreement for this parameter (0.63). The effect of 'Neoglottal Tonicity' on 'Overall Grade' was examined utilising only the expert speech and language therapists' data. Linear regression analysis resulted in an r-squared coefficient of 0.67. Analysis of the perceptual impression of hypotonicity and hypertonicity in relation to the mean 'Overall Grade' score demonstrated that neither tone was linked to a more favourable grade (P = 0.42). Expert speech and language therapist raters may be the optimal judges for tracheoesophageal voice assessment. Tonicity appears to be a good predictor of 'Overall Grade'. These scales have clinical applicability for investigating techniques that facilitate optotonic neoglottal voice quality.

  7. A pneumatic Bionic Voice prosthesis—Pre-clinical trials of controlling the voice onset and offset

    PubMed Central

    Noorian, Farzad; Novakovic, Daniel; van Schaik, André

    2018-01-01

    Despite emergent progress in many fields of bionics, a functional Bionic Voice prosthesis for laryngectomy patients (larynx amputees) has not yet been achieved, leading to a lifetime of vocal disability for these patients. This study introduces a novel framework of Pneumatic Bionic Voice Prostheses as an electronic adaptation of the Pneumatic Artificial Larynx (PAL) device. The PAL is a non-invasive mechanical voice source, driven exclusively by respiration with an exceptionally high voice quality, comparable to the existing gold standard of Tracheoesophageal (TE) voice prosthesis. Following PAL design closely as the reference, Pneumatic Bionic Voice Prostheses seem to have a strong potential to substitute the existing gold standard by generating a similar voice quality while remaining non-invasive and non-surgical. This paper designs the first Pneumatic Bionic Voice prosthesis and evaluates its onset and offset control against the PAL device through pre-clinical trials on one laryngectomy patient. The evaluation on a database of more than five hours of continuous/isolated speech recordings shows a close match between the onset/offset control of the Pneumatic Bionic Voice and the PAL with an accuracy of 98.45 ±0.54%. When implemented in real-time, the Pneumatic Bionic Voice prosthesis controller has an average onset/offset delay of 10 milliseconds compared to the PAL. Hence it addresses a major disadvantage of previous electronic voice prostheses, including myoelectric Bionic Voice, in meeting the short time-frames of controlling the onset/offset of the voice in continuous speech. PMID:29466455

  8. Sperry Univac speech communications technology

    NASA Technical Reports Server (NTRS)

    Medress, Mark F.

    1977-01-01

    Technology and systems for effective verbal communication with computers were developed. A continuous speech recognition system for verbal input, a word spotting system to locate key words in conversational speech, prosodic tools to aid speech analysis, and a prerecorded voice response system for speech output are described.

  9. Quantitative acoustic measurements for characterization of speech and voice disorders in early untreated Parkinson's disease.

    PubMed

    Rusz, J; Cmejla, R; Ruzickova, H; Ruzicka, E

    2011-01-01

    An assessment of vocal impairment is presented for separating healthy people from persons with early untreated Parkinson's disease (PD). This study's main purpose was to (a) determine whether voice and speech disorders are present in the early stages of PD, before dopaminergic pharmacotherapy is started, (b) ascertain the specific characteristics of the PD-related vocal impairment, (c) identify PD-related acoustic signatures for the major part of traditional clinically used measurement methods with respect to their automatic assessment, and (d) design new automatic measurement methods of articulation. The varied speech data were collected from 46 Czech native speakers, 23 with PD. Subsequently, 19 representative measurements were pre-selected, and Wald sequential analysis was then applied to assess the efficiency of each measure and the extent of vocal impairment of each subject. It was found that measurement of the fundamental frequency variations, applied to two selected tasks, was the best method for separating healthy from PD subjects. On the basis of objective acoustic measures, statistical decision-making theory, and validation from practicing speech therapists, it has been demonstrated that 78% of early untreated PD subjects show some form of vocal impairment. The speech defects thus uncovered differ individually in various characteristics including phonation, articulation, and prosody.
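
    As an illustration of one measure named above, fundamental frequency variation, here is a minimal sketch assuming librosa's pYIN pitch tracker and a hypothetical audio file; the study's own pipeline, tasks, and thresholds are not reproduced:

    ```python
    # Hedged sketch: F0 variation (semitone standard deviation) over the
    # voiced frames of a recording, one common vocal-impairment measure.
    import numpy as np
    import librosa

    y, sr = librosa.load("sustained_vowel.wav", sr=None)  # hypothetical file
    f0, voiced_flag, _ = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"), sr=sr
    )
    f0 = f0[voiced_flag & ~np.isnan(f0)]  # keep voiced, defined frames only

    semitones = 12 * np.log2(f0 / np.median(f0))  # perceptual (log) scale
    print(f"F0 variation (semitone SD): {semitones.std():.2f}")
    ```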

  10. Dissecting choral speech: properties of the accompanist critical to stuttering reduction.

    PubMed

    Kiefte, Michael; Armson, Joy

    2008-01-01

    The effects of choral speech and altered auditory feedback (AAF) on stuttering frequency were compared to identify those properties of choral speech that make it a more effective condition for stuttering reduction. Seventeen adults who stutter (AWS) participated in an experiment consisting of special choral speech conditions that were manipulated to selectively eliminate specific differences between choral speech and AAF. Consistent with previous findings, results showed that both choral speech and AAF reduced stuttering compared to solo reading. Although reductions under AAF were substantial, they were less dramatic than those for choral speech. Stuttering reduction for choral speech was highly robust even when the accompanist's voice temporally lagged that of the AWS, when there was no opportunity for dynamic interplay between the AWS and accompanist, and when the accompanist was replaced by the AWS's own voice, all of which approximate specific features of AAF. Choral speech was also highly effective in reducing stuttering across changes in speech rate and for both familiar and unfamiliar passages. We concluded that differences in properties between choral speech and AAF other than those that were manipulated in this experiment must account for differences in stuttering reduction. The reader will be able to (1) describe differences in stuttering reduction associated with altered auditory feedback compared to choral speech conditions and (2) describe differences between delivery of a second voice signal as an altered rendition of the speakers own voice (altered auditory feedback) and alterations in the voice of an accompanist (choral speech).

  11. Impact of auditory training for perceptual assessment of voice executed by undergraduate students in Speech-Language Pathology.

    PubMed

    Silva, Regiane Serafim Abreu; Simões-Zenari, Marcia; Nemr, Nair Kátia

    2012-01-01

    To analyze the impact of auditory training on the auditory-perceptual assessment carried out by Speech-Language Pathology undergraduate students. Over two semesters, 17 undergraduate students enrolled in theoretical subjects on phonation (Phonation/Phonation Disorders) analyzed samples of altered and unaltered voices (selected for this purpose) using the GRBAS scale. All subjects received auditory training during nine 15-minute meetings. In each meeting, a different parameter was presented using the voice sample, with the trained aspect predominating in each session. Assessment of the sample using the scale was carried out before and after training, and on four other occasions throughout the meetings. The students' assessments were compared with an assessment carried out by three voice-expert speech-language pathologists, who acted as judges. To verify training effectiveness, Friedman's test and the Kappa index were used. The rate of correct answers before training was considered between fair and good. The number of correct answers was maintained throughout the assessments for most of the scale parameters. After training, the students showed improvement in the analysis of asthenia, a parameter that was emphasized during training after the students reported difficulties analyzing it. There was a decrease in the number of correct answers for the roughness parameter after it was approached segmented into hoarseness and harshness, and observed in association with different diagnoses and acoustic parameters. Auditory training enhances students' initial abilities to perform the evaluation, and it guided adjustments to the dynamics of the university subject.

  12. Prototype app for voice therapy: a peer review.

    PubMed

    Lavaissiéri, Paula; Melo, Paulo Eduardo Damasceno

    2017-03-09

    Voice therapy promotes changes in patients' voice-related habits and rehabilitation. Speech-language therapists use a host of materials, ranging from pictures to electronic resources and computer tools, as aids in this process. Mobile technology is attractive, interactive and a nearly constant feature in the daily routine of a large part of the population, and has a growing application in healthcare. To develop a prototype application for voice therapy, submit it to peer assessment, and improve the initial prototype based on these assessments. A prototype of the Q-Voz application was developed based on Apple's Human Interface Guidelines. The prototype was analyzed by seven speech therapists working in the voice area, and improvements to the product were made based on their assessments. All features of the application were considered satisfactory by most evaluators. All evaluators found the application very useful; evaluators reported that patients would find it easier to make changes in voice behavior with the application than without it; the evaluators stated they would use this application with their patients with dysphonia in the process of rehabilitation, and that the application offers useful tools for voice self-management. Based on the suggestions provided, six improvements were made to the prototype. The prototype Q-Voz application was developed, evaluated by seven judges, and subsequently improved. All evaluators stated they would use the application with their patients undergoing rehabilitation, indicating that the Q-Voz application for mobile devices can be considered an auxiliary tool for voice therapy.

  13. Randomized controlled trial of supplemental augmentative and alternative communication versus voice rest alone after phonomicrosurgery.

    PubMed

    Rousseau, Bernard; Gutmann, Michelle L; Mau, Theodore; Francis, David O; Johnson, Jeffrey P; Novaleski, Carolyn K; Vinson, Kimberly N; Garrett, C Gaelyn

    2015-03-01

    This randomized trial investigated voice rest and supplemental text-to-speech communication versus voice rest alone on visual analog scale measures of communication effectiveness and magnitude of voice use. Randomized clinical trial. Multicenter outpatient voice clinics. Thirty-seven patients undergoing phonomicrosurgery. Patients undergoing phonomicrosurgery were randomized to voice rest and supplemental text-to-speech communication or voice rest alone. The primary outcome measure was the impact of voice rest on ability to communicate effectively over a 7-day period. Pre- and postoperative magnitude of voice use was also measured as an observational outcome. Patients randomized to voice rest and supplemental text-to-speech communication reported higher median communication effectiveness on each postoperative day compared to those randomized to voice rest alone, with significantly higher median communication effectiveness on postoperative days 3 (P=.03) and 5 (P=.01). Magnitude of voice use did not differ on any preoperative (P>.05) or postoperative day (P>.05), nor did patients significantly decrease voice use as the surgery date approached (P>.05). However, there was a significant reduction in median voice use pre- to postoperatively across patients (P<.001) with median voice use ranging from 0 to 3 throughout the postoperative week. Supplemental text-to-speech communication increased patient-perceived communication effectiveness on postoperative days 3 and 5 over voice rest alone. With the prevalence of smartphones and the widespread use of text messaging, supplemental text-to-speech communication may provide an accessible and cost-effective communication option for patients on vocal restrictions. © American Academy of Otolaryngology—Head and Neck Surgery Foundation 2015.

  14. Voice quality change in future professional voice users after 9 months of voice training.

    PubMed

    Timmermans, Bernadette; De Bodt, Marc; Wuyts, Floris; Van de Heyning, Paul

    2004-01-01

    Sixty-eight students of a school for audiovisual communication participated in this study. Of these, 49 students received voice training for 9 months (the trained group) and 19 received no specific voice training (the untrained group). A multidimensional test battery containing the GRBAS scale, videolaryngostroboscopy, Maximum Phonation Time (MPT), jitter, lowest intensity (IL), highest frequency (FoH), Dysphonia Severity Index (DSI) and Voice Handicap Index (VHI) was applied before and after training to evaluate the training outcome. The voice training consisted of technical workshops in small groups (five to eight subjects) and vocal coaching in the ateliers. In the technical workshops, basic skills are trained (posture, breathing technique, articulation and diction); in the ateliers, the speech and language pathologist assists the subjects in the practice of their voice work. This study revealed a significant improvement over time for the objective measurements [Dysphonia Severity Index: from 2.3 to 4.5 (P<0.001)] and the self-evaluation [Voice Handicap Index: from 23 to 18.4 (P=0.016)] for the trained group only. This outcome favors the systematic introduction of voice training during the schooling of professional voice users.

  15. Speech and Communication Disorders

    MedlinePlus

    ... to being completely unable to speak or understand speech. Causes include: hearing disorders and deafness; voice problems, ... or those caused by cleft lip or palate; speech problems like stuttering; developmental disabilities; learning disorders; autism ...

  16. A posteriori error estimates in voice source recovery

    NASA Astrophysics Data System (ADS)

    Leonov, A. S.; Sorokin, V. N.

    2017-12-01

    The inverse problem of voice source pulse recovery from a segment of a speech signal is under consideration. A special mathematical model relating these quantities is used for the solution. A variational method for solving the inverse problem of voice source recovery is proposed for a new parametric class of sources, piecewise-linear sources (PWL-sources). A technique for a posteriori numerical error estimation of the obtained solutions is also presented. A computer study of the adequacy of the adopted speech production model with PWL-sources is performed by solving the inverse problem for various types of voice signals, together with a corresponding study of the a posteriori error estimates. Numerical experiments on speech signals show satisfactory properties of the proposed a posteriori error estimates, which represent upper bounds on the possible errors in solving the inverse problem. The estimate of the most probable error in determining the source-pulse shapes is about 7-8% for the investigated speech material. A posteriori error estimates can also be used as a quality criterion for the obtained voice source pulses in application to speaker recognition.

  17. Towards Artificial Speech Therapy: A Neural System for Impaired Speech Segmentation.

    PubMed

    Iliya, Sunday; Neri, Ferrante

    2016-09-01

    This paper presents a neural system-based technique for segmenting short impaired speech utterances into silent, unvoiced, and voiced sections. Moreover, the proposed technique identifies those points of the (voiced) speech where the spectrum becomes steady. The resulting technique thus aims at detecting the limited section of the speech which contains the information about the potential impairment of the speech. This section is of interest to the speech therapist as it corresponds to the possibly incorrect movements of speech organs (lower lip and tongue with respect to the vocal tract). Two segmentation models to detect and identify the various sections of the disordered (impaired) speech signals have been developed and compared. The first makes use of a combination of four artificial neural networks. The second is based on a support vector machine (SVM). The SVM has been trained by means of an ad hoc nested algorithm whose outer layer is a metaheuristic while the inner layer is a convex optimization algorithm. Several metaheuristics have been tested and compared, leading to the conclusion that some variants of the compact differential evolution (CDE) algorithm appear to be well-suited to address this problem. Numerical results show that the SVM model with a radial basis function is capable of effective detection of the portion of speech that is of interest to a therapist. The best performance has been achieved when the system is trained by the nested algorithm whose outer layer is a hybrid population-based/CDE metaheuristic. A population-based approach displays the best performance for the isolation of silence/noise sections and the detection of unvoiced sections. On the other hand, a compact approach appears to be clearly well-suited to detect the beginning of the steady state of the voiced signal. Both proposed segmentation models outperformed two modern segmentation techniques based on Gaussian mixture models and deep learning.
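
    For orientation, a heuristic baseline for the silent/unvoiced/voiced labeling task described above, using short-time energy and zero-crossing rate; the paper's neural-network and SVM models (and their metaheuristic training) are far more elaborate, and the thresholds here are assumptions:

    ```python
    # Hedged sketch: frame-wise silent/unvoiced/voiced labeling.
    # Voiced frames are periodic with a low zero-crossing rate; unvoiced
    # (fricative-like) frames are noisy with a high zero-crossing rate.
    import numpy as np

    def segment_frames(x, sr, frame_ms=25, hop_ms=10,
                       energy_thresh=1e-4, zcr_thresh=0.25):
        frame = int(sr * frame_ms / 1000)
        hop = int(sr * hop_ms / 1000)
        labels = []
        for start in range(0, len(x) - frame, hop):
            w = x[start:start + frame]
            energy = np.mean(w ** 2)
            zcr = np.mean(np.abs(np.diff(np.sign(w)))) / 2
            if energy < energy_thresh:
                labels.append("silent")
            elif zcr > zcr_thresh:
                labels.append("unvoiced")
            else:
                labels.append("voiced")
        return labels
    ```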

  18. Segregation of Whispered Speech Interleaved with Noise or Speech Maskers

    DTIC Science & Technology

    2011-08-01

    range over which the talker can be heard. Whispered speech is produced by modulating the flow of air through partially open vocal folds. Because the...source of excitation is turbulent air flow, the acoustic characteristics of whispered speech differ from those of voiced speech [1, 2]. Despite the acoustic...signals provided by cochlear implants. Two studies investigated the segregation of simultaneously presented whispered vowels [7, 8] in a standard

  19. Reference-free automatic quality assessment of tracheoesophageal speech.

    PubMed

    Huang, Andy; Falk, Tiago H; Chan, Wai-Yip; Parsa, Vijay; Doyle, Philip

    2009-01-01

    Evaluation of the quality of tracheoesophageal (TE) speech using machines instead of human experts can enhance the voice rehabilitation process for patients who have undergone total laryngectomy and voice restoration. Towards the goal of devising a reference-free TE speech quality estimation algorithm, we investigate the efficacy of speech signal features that are used in standard telephone-speech quality assessment algorithms, in conjunction with a recently introduced speech modulation spectrum measure. Tests performed on two TE speech databases demonstrate that the modulation spectral measure and a subset of features in the standard ITU-T P.563 algorithm estimate TE speech quality with better correlation (up to 0.9) than previously proposed features.
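
    A minimal sketch of a modulation-spectrum style feature, assuming a two-stage analysis (an acoustic STFT followed by an FFT of each band's temporal envelope); the exact measure used in the paper is not reproduced here:

    ```python
    # Hedged sketch: modulation spectrum of a speech signal.
    import numpy as np
    from scipy.signal import stft

    def modulation_spectrum(x, sr, frame=512, hop=128):
        # Stage 1: acoustic spectrogram -> per-band temporal envelopes.
        _, _, X = stft(x, fs=sr, nperseg=frame, noverlap=frame - hop)
        env = np.abs(X)                        # shape: (acoustic bands, frames)
        # Stage 2: FFT along the time axis gives modulation frequencies.
        mod = np.abs(np.fft.rfft(env, axis=1))
        return mod                             # (acoustic bands, modulation bins)
    ```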

  20. Randomized Controlled Trial of Supplemental Augmentative and Alternative Communication versus Voice Rest Alone after Phonomicrosurgery

    PubMed Central

    Rousseau, Bernard; Gutmann, Michelle L.; Mau, I-fan Theodore; Francis, David O.; Johnson, Jeffrey P.; Novaleski, Carolyn K.; Vinson, Kimberly N.; Garrett, C. Gaelyn

    2015-01-01

    Objective This randomized trial investigated voice rest and supplemental text-to-speech communication versus voice rest alone on visual analog scale measures of communication effectiveness and magnitude of voice use. Study Design Randomized clinical trial. Setting Multicenter outpatient voice clinics. Subjects Thirty-seven patients undergoing phonomicrosurgery. Methods Patients undergoing phonomicrosurgery were randomized to voice rest and supplemental text-to-speech communication or voice rest alone. The primary outcome measure was the impact of voice rest on ability to communicate effectively over a seven-day period. Pre- and post-operative magnitude of voice use was also measured as an observational outcome. Results Patients randomized to voice rest and supplemental text-to-speech communication reported higher median communication effectiveness on each post-operative day compared to those randomized to voice rest alone, with significantly higher median communication effectiveness on post-operative day 3 (p = 0.03) and 5 (p = 0.01). Magnitude of voice use did not differ on any pre-operative (p > 0.05) or post-operative day (p > 0.05), nor did patients significantly decrease voice use as the surgery date approached (p > 0.05). However, there was a significant reduction in median voice use pre- to post-operatively across patients (p < 0.001) with median voice use ranging from 0–3 throughout the post-operative week. Conclusion Supplemental text-to-speech communication increased patient perceived communication effectiveness on post-operative days 3 and 5 over voice rest alone. With the prevalence of smartphones and the widespread use of text messaging, supplemental text-to-speech communication may provide an accessible and cost-effective communication option for patients on vocal restrictions. PMID:25605690

  1. Speech coding, reconstruction and recognition using acoustics and electromagnetic waves

    DOEpatents

    Holzrichter, John F.; Ng, Lawrence C.

    1998-01-01

    The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced, as well as for combined voiced and unvoiced speech. The methods include how to deconvolve the speech excitation function from the acoustic speech output to describe the transfer function for each time frame. The formation of feature vectors defining all acoustic speech units over well-defined time frames can be used for purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching.
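
    A minimal sketch of the deconvolution step described above: given an EM-sensed excitation frame e[n] and the simultaneous acoustic output s[n], the vocal-tract transfer function can be estimated by division in the frequency domain. The Wiener-style regularization constant is an assumption, not part of the patent text:

    ```python
    # Hedged sketch: per-frame transfer function estimate H = S/E,
    # regularized so the division stays stable where |E| is small.
    import numpy as np

    def transfer_function(excitation, speech, eps=1e-8):
        E = np.fft.rfft(excitation)
        S = np.fft.rfft(speech)
        return S * np.conj(E) / (np.abs(E) ** 2 + eps)
    ```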

  2. Speech coding, reconstruction and recognition using acoustics and electromagnetic waves

    DOEpatents

    Holzrichter, J.F.; Ng, L.C.

    1998-03-17

    The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced, as well as for combined voiced and unvoiced speech. The methods include how to deconvolve the speech excitation function from the acoustic speech output to describe the transfer function for each time frame. The formation of feature vectors defining all acoustic speech units over well-defined time frames can be used for purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching. 35 figs.

  3. Implementation of the Intelligent Voice System for Kazakh

    NASA Astrophysics Data System (ADS)

    Yessenbayev, Zh; Saparkhojayev, N.; Tibeyev, T.

    2014-04-01

    Modern speech technologies are highly advanced and widely used in day-to-day applications. However, this mostly concerns the languages of well-developed countries, such as English, German, Japanese, and Russian. For Kazakh, the situation is less prominent, and research in this field is only starting to evolve. In this research and application-oriented project, we introduce an intelligent voice system for the fast deployment of call-centers and information desks supporting Kazakh speech. The demand for such a system is obvious given the country's large size and small population: landline and cell phones are the only means of communication for distant villages and suburbs. The system features Kazakh speech recognition and synthesis modules as well as a web-GUI for efficient dialog management. For speech recognition we use the CMU Sphinx engine, and for speech synthesis, MaryTTS. The web-GUI is implemented in Java, enabling operators to quickly create and manage dialogs in a user-friendly graphical environment. Call routines are handled by Asterisk PBX and JBoss Application Server. The system supports technologies and protocols such as VoIP, VoiceXML, FastAGI, Java Speech API and J2EE. For the speech recognition experiments we compiled and used the first Kazakh speech corpus, with utterances from 169 native speakers. The performance of the speech recognizer is 4.1% WER on an isolated word recognition task and 6.9% WER on a clean continuous speech recognition task. The speech synthesis experiments include the training of male and female voices.
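
    The word error rate (WER) figures quoted above follow the standard definition: the word-level edit distance between reference and hypothesis transcripts, divided by the reference length. A minimal sketch of the standard metric, not code from the project itself:

    ```python
    # Hedged sketch: word error rate via dynamic-programming edit distance.
    def wer(reference: str, hypothesis: str) -> float:
        ref, hyp = reference.split(), hypothesis.split()
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i
        for j in range(len(hyp) + 1):
            d[0][j] = j
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,         # deletion
                              d[i][j - 1] + 1,         # insertion
                              d[i - 1][j - 1] + cost)  # substitution
        return d[-1][-1] / len(ref)
    ```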

  4. Multidimensional assessment of strongly irregular voices such as in substitution voicing and spasmodic dysphonia: a compilation of own research.

    PubMed

    Moerman, Mieke; Martens, Jean-Pierre; Dejonckere, Philippe

    2015-04-01

    This article is a compilation of the authors' own research performed during the European COoperation in Science and Technology (COST) Action 2103, 'Advanced Voice Function Assessment', an initiative of voice and speech processing teams consisting of physicists, engineers, and clinicians. This manuscript concerns the analysis of largely irregular voicing types, namely substitution voicing (SV) and adductor spasmodic dysphonia (AdSD). A specific perceptual rating scale (IINFVo) was developed, and the Auditory Model Based Pitch Extractor (AMPEX), a piece of software that automatically analyses running speech and generates pitch values in background noise, was applied. The IINFVo perceptual rating scale has been shown to be useful in evaluating SV. The analysis of strongly irregular voices stimulated a modification of the European Laryngological Society's assessment protocol, which was originally designed for the common types of (less severe) dysphonia. Acoustic analysis with AMPEX demonstrates that the most informative features are, for SV, the voicing-related acoustic features and, for AdSD, the perturbation measures. Poor correlations between self-assessment and the acoustic and perceptual dimensions in the assessment of highly irregular voices argue for a multidimensional approach.

  5. The recognition of female voice based on voice registers in singing techniques in real-time using hankel transform method and macdonald function

    NASA Astrophysics Data System (ADS)

    Meiyanti, R.; Subandi, A.; Fuqara, N.; Budiman, M. A.; Siahaan, A. P. U.

    2018-03-01

    A singer does not just recite the lyrics of a song but uses particular vocal techniques to make it more beautiful. In singing technique, female voices have a more diverse set of registers than male voices. The human voice has many registers; those used while singing include chest voice, head voice, falsetto, and vocal fry. A system for recognizing female voice registers in singing technique, in real time, was built using Borland Delphi 7.0. Recognition is performed both on recorded voice samples given as input and in real time. Voice input yields weighted energy values calculated using the Hankel transform method and Macdonald functions. The results showed that the accuracy of the system depends on the accuracy of the vocal technique that is trained and tested; the average recognition rate for voice registers reached 48.75 percent on recordings and 57 percent in real time.

  6. Cognitive Load in Voice Therapy Carry-Over Exercises.

    PubMed

    Iwarsson, Jenny; Morris, David Jackson; Balling, Laura Winther

    2017-01-01

    The cognitive load generated by online speech production may vary with the nature of the speech task. This article examines 3 speech tasks used in voice therapy carry-over exercises, in which a patient is required to adopt and automatize new voice behaviors, ultimately in daily spontaneous communication. Twelve subjects produced speech in 3 conditions: rote speech (weekdays), sentences in a set form, and semispontaneous speech. Subjects simultaneously performed a secondary visual discrimination task for which response times were measured. On completion of each speech task, subjects rated their experience on a questionnaire. Response times from the secondary, visual task were found to be shortest for the rote speech, longer for the semispontaneous speech, and longest for the sentences within the set framework. Principal components derived from the subjective ratings were found to be linked to response times on the secondary visual task. Acoustic measures reflecting fundamental frequency distribution and vocal fold compression varied across the speech tasks. The results indicate that consideration should be given to the selection of speech tasks during the process leading to automation of revised speech behavior and that self-reports may be a reliable index of cognitive load.

  7. Speech coding, reconstruction and recognition using acoustics and electromagnetic waves

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Holzrichter, J.F.; Ng, L.C.

    The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced, as well as for combined voiced and unvoiced speech. The methods include how to deconvolve the speech excitation function from the acoustic speech output to describe the transfer function for each time frame. The formation of feature vectors defining all acoustic speech units over well-defined time frames can be used for purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching. 35 figs.

  8. The taste of music.

    PubMed

    Mesz, Bruno; Trevisan, Marcos A; Sigman, Mariano

    2011-01-01

    Zarlino, one of the most important music theorists of the XVI century, described the minor consonances as 'sweet' (dolci) and 'soft' (soavi) (Zarlino 1558/1983, in On the Modes. New Haven, CT: Yale University Press, 1983). Hector Berlioz, in his Treatise on Modern Instrumentation and Orchestration (London: Novello, 1855), speaks about the 'small acid-sweet voice' of the oboe. In line with this tradition of describing musical concepts in terms of taste words, recent empirical studies have found reliable associations between taste perception and low-level sound and musical parameters, like pitch and phonetic features. Here we investigated whether taste words elicited consistent musical representations by asking trained musicians to improvise on the basis of the four canonical taste words: sweet, sour, bitter, and salty. Our results showed that, even in free improvisation, taste words elicited very reliable and consistent musical patterns: 'bitter' improvisations are low-pitched and legato (without interruption between notes), 'salty' improvisations are staccato (notes sharply detached from each other), 'sour' improvisations are high-pitched and dissonant, and 'sweet' improvisations are consonant, slow, and soft. Interestingly, projections of the improvisations of taste words to musical space (a vector space defined by relevant musical parameters) revealed that improvisations based on different taste words were nearly orthogonal or opposite. Decoding methods could classify binary choices of improvisations (i.e., identify the improvisation word from the melody) with a performance of around 80%, well above chance. In a second experiment we investigated the mapping from perception of music to taste words. Fifty-seven non-musical experts listened to a fraction of the improvisations. We found that listeners classified with high performance the taste word which had elicited the improvisation. Our results, furthermore, show that associations of taste and music

  9. Two-voice fundamental frequency estimation

    NASA Astrophysics Data System (ADS)

    de Cheveigné, Alain

    2002-05-01

    An algorithm is presented that estimates the fundamental frequencies of two concurrent voices or instruments. The algorithm models each voice as a periodic function of time, and jointly estimates both periods by cancellation according to a previously proposed method [de Cheveigné and Kawahara, Speech Commun. 27, 175-185 (1999)]. The new algorithm improves on the old in several respects: it allows an unrestricted search range, effectively avoids harmonic and subharmonic errors, is more accurate (it uses two-dimensional parabolic interpolation), and is computationally less costly. It remains subject to unavoidable errors when periods are in certain simple ratios and the task is inherently ambiguous. The algorithm is evaluated on a small database including speech, singing voice, and instrumental sounds. It can be extended in several ways: to decide the number of voices, to handle amplitude variations, and to estimate more than two voices (at the expense of increased processing cost and decreased reliability). It makes no use of instrument models, learned or otherwise, although it could usefully be combined with such models. [Work supported by the Cognitique programme of the French Ministry of Research and Technology.]
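
    A minimal sketch of period estimation by cancellation, in the spirit of the method described above: a comb filter y[n] = x[n] - x[n - tau] cancels a periodic component when tau matches its period, and a joint search over two lags cancels two voices. The brute-force search below omits the paper's refinements (parabolic interpolation, ambiguity handling):

    ```python
    # Hedged sketch: joint two-voice period estimation by cascaded comb
    # filters; the (tau1, tau2) pair minimizing residual power is chosen.
    import numpy as np

    def residual_power(x, tau):
        return np.mean((x[tau:] - x[:-tau]) ** 2)

    def two_voice_periods(x, tau_min, tau_max):
        best = (None, None, np.inf)
        for tau1 in range(tau_min, tau_max):
            y = x[tau1:] - x[:-tau1]            # cancel the first voice
            for tau2 in range(tau_min, tau_max):
                p = residual_power(y, tau2)     # then cancel the second
                if p < best[2]:
                    best = (tau1, tau2, p)
        return best[:2]
    ```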

  10. Measurement of voice onset time in maxillectomy patients.

    PubMed

    Hattori, Mariko; Sumita, Yuka I; Taniguchi, Hisashi

    2014-01-01

    Objective speech evaluation using acoustic measurement is needed for the proper rehabilitation of maxillectomy patients. For digital evaluation of consonants, measurement of voice onset time is one option. However, voice onset time has not been measured in maxillectomy patients, as their consonant sound spectra exhibit unique characteristics that make the measurement challenging. In this study, we established criteria for measuring voice onset time in maxillectomy patients for objective speech evaluation. We examined voice onset time for /ka/ and /ta/ in 13 maxillectomy patients by calculating the number of valid measurements of voice onset time out of three trials for each syllable. Wilcoxon's signed rank test showed that voice onset time measurements were more successful for /ka/ and /ta/ when a prosthesis was used (Z = -2.232, P = 0.026 and Z = -2.401, P = 0.016, respectively) than when it was not. These results indicate that wearing a prosthesis affected voice onset time measurement in these patients. Although more research in this area is needed, measurement of voice onset time has the potential to be used to evaluate consonant production in maxillectomy patients wearing a prosthesis.
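    Voice onset time (VOT) is the interval from the stop burst release to the onset of voicing. The sketch below is a crude stand-in for the measurement criteria developed in the study: it marks voicing onset at the first post-burst frame whose normalized autocorrelation peak exceeds a threshold. The synthetic token, threshold, and pitch range are illustrative assumptions.

```python
import numpy as np

def vot_ms(x, fs, burst_idx, frame_s=0.01, thresh=0.5):
    """Scan frames after the marked burst release; declare voicing onset at the
    first frame whose normalized autocorrelation peak (searched over a plausible
    pitch-lag range) exceeds the threshold. VOT = voicing onset - burst release."""
    step = int(frame_s * fs)
    lo, hi = int(fs / 400), int(fs / 75)   # pitch-search lags for ~75-400 Hz
    for start in range(burst_idx, len(x) - 2 * hi, step):
        seg = x[start:start + 2 * hi]
        seg = seg - seg.mean()
        ac = np.correlate(seg, seg, mode="full")[len(seg) - 1:]
        if ac[0] > 0 and np.max(ac[lo:hi]) / ac[0] > thresh:
            return 1000 * (start - burst_idx) / fs
    return None

# Synthetic /ka/-like token: burst at t = 0, 60 ms of aspiration noise, then voicing
fs = 16000
rng = np.random.default_rng(0)
aspiration = 0.3 * rng.normal(size=int(0.06 * fs))
voicing = np.sin(2 * np.pi * 120 * np.arange(int(0.2 * fs)) / fs)
print(vot_ms(np.concatenate([aspiration, voicing]), fs, burst_idx=0))  # expect ~60 ms
```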

  12. [Relevance of psychosocial factors in speech rehabilitation after laryngectomy].

    PubMed

    Singer, S; Fuchs, M; Dietz, A; Klemm, E; Kienast, U; Meyer, A; Oeken, J; Täschner, R; Wulke, C; Schwarz, R

    2007-12-01

    It is often assumed that psychosocial and sociodemographic factors determine the success of voice rehabilitation after laryngectomy. The aim of this study was to analyze the association between these parameters. Based on the tumor registries of six ENT clinics, all patients who had undergone laryngectomy in the preceding years were surveyed (N = 190). Success of voice rehabilitation was assessed as speech intelligibility, measured with the postlaryngectomy telephone intelligibility test. Validated and standardized instruments were used where possible for the assessment of the psychosocial parameters. Statistical analysis was done by multiple logistic regression. Low speech intelligibility is associated with reduced conversation (OR 0.970) and social activity (OR 1.049). Patients are more likely to talk with esophageal voice when their motivation for learning the new voice was high (OR 7.835) and when they assessed their speech therapist as important for their motivation (OR 4.794). The risk of communicating merely by whispering is higher when patients live together with a partner (OR 5.293), when they talk seldom (OR 1.017), and when they are not very active in social contexts (OR 0.966). Psychosocial factors can only partly explain how voice rehabilitation after laryngectomy becomes a success. Speech intelligibility is associated with active communication behaviour, whereas the use of an esophageal voice is correlated with motivation. The attainment of tracheoesophageal puncture voice appears to be independent of psychosocial factors.

  13. Study of accent-based music speech protocol development for improving voice problems in stroke patients with mixed dysarthria.

    PubMed

    Kim, Soo Ji; Jo, Uiri

    2013-01-01

    Based on the anatomical and functional commonality between singing and speech, various types of musical elements have been employed in music therapy research for speech rehabilitation. The purpose of this study was to develop an accent-based music speech protocol to address voice problems of stroke patients with mixed dysarthria. Subjects were 6 stroke patients with mixed dysarthria who received individual music therapy sessions. Each session lasted 30 minutes, and 12 sessions including pre- and post-test were administered to each patient. To examine the protocol's efficacy, the measures of maximum phonation time (MPT), fundamental frequency (F0), average intensity (dB), jitter, shimmer, noise-to-harmonics ratio (NHR), and diadochokinesis (DDK) were compared between pre- and post-test and analyzed with a paired-sample t-test. The results showed that the measures of MPT, F0, dB, and sequential motion rates (SMR) were significantly increased after administering the protocol. Also, there were statistically significant differences in the measures of shimmer and alternating motion rates (AMR) of the syllable /kʌ/ between pre- and post-test. The results indicated that the accent-based music speech protocol may improve speech motor coordination, including respiration, phonation, articulation, resonance, and prosody, in patients with dysarthria. This suggests the possibility of utilizing the music speech protocol to maximize immediate treatment effects in the course of long-term treatment for patients with dysarthria.

  14. Speech and swallowing disorders in Parkinson disease.

    PubMed

    Sapir, Shimon; Ramig, Lorraine; Fox, Cynthia

    2008-06-01

    To review recent research and clinical studies pertaining to the nature, diagnosis, and treatment of speech and swallowing disorders in Parkinson disease. Although some studies indicate improvement in voice and speech with dopamine therapy and deep brain stimulation of the subthalamic nucleus, others show minimal or adverse effects. Repetitive transcranial magnetic stimulation of the mouth motor cortex and injection of collagen in the vocal folds have preliminary data supporting improvement in phonation in people with Parkinson disease. Treatments focusing on vocal loudness, specifically LSVT LOUD (Lee Silverman Voice Treatment), have been effective for the treatment of speech disorders in Parkinson disease. Changes in brain activity due to LSVT LOUD provide preliminary evidence for neural plasticity. Computer-based technology makes the Lee Silverman Voice Treatment available to a large number of users. A rat model for studying neuropharmacologic effects on vocalization in Parkinson disease has been developed. New diagnostic methods of speech and swallowing are also available as the result of recent studies. Speech rehabilitation with the LSVT LOUD is highly efficacious and scientifically tested. There is a need for more studies to improve understanding, diagnosis, prevention, and treatment of speech and swallowing disorders in Parkinson disease.

  15. The value of visualizing tone of voice.

    PubMed

    Pullin, Graham; Cook, Andrew

    2013-10-01

    Whilst most of us have an innate feeling for tone of voice, it is an elusive quality that even phoneticians struggle to describe with sufficient subtlety. For people who cannot speak themselves this can have particularly profound repercussions. Augmentative communication often involves text-to-speech, a technology that only supports a basic choice of prosody based on punctuation. Given how inherently difficult it is to talk about more nuanced tone of voice, there is a risk that its absence from current devices goes unremarked and unchallenged. Looking ahead optimistically to more expressive communication aids, their design will need to involve more subtle interactions with tone of voice: interactions that the people using them can understand and engage with. Interaction design can play a role in making tone of voice visible, tangible, and accessible. Two projects that have already catalysed interdisciplinary debate in this area, Six Speaking Chairs and Speech Hedge, are introduced together with responses. A broader role for design is advocated, as a means to opening up speech technology research to a wider range of disciplinary perspectives, and also to the contributions and influence of people who use it in their everyday lives.

  16. Predicting fundamental frequency from mel-frequency cepstral coefficients to enable speech reconstruction.

    PubMed

    Shao, Xu; Milner, Ben

    2005-08-01

    This work proposes a method to reconstruct an acoustic speech signal solely from a stream of mel-frequency cepstral coefficients (MFCCs) as may be encountered in a distributed speech recognition (DSR) system. Previous methods for speech reconstruction have required, in addition to the MFCC vectors, fundamental frequency and voicing components. In this work the voicing classification and fundamental frequency are predicted from the MFCC vectors themselves using two maximum a posteriori (MAP) methods. The first method enables fundamental frequency prediction by modeling the joint density of MFCCs and fundamental frequency using a single Gaussian mixture model (GMM). The second scheme uses a set of hidden Markov models (HMMs) to link together a set of state-dependent GMMs, which enables a more localized modeling of the joint density of MFCCs and fundamental frequency. Experimental results on speaker-independent male and female speech show that accurate voicing classification and fundamental frequency prediction is attained when compared to hand-corrected reference fundamental frequency measurements. The use of the predicted fundamental frequency and voicing for speech reconstruction is shown to give very similar speech quality to that obtained using the reference fundamental frequency and voicing.
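    A hedged sketch of the first (single-GMM) scheme described above: fit a Gaussian mixture to joint [MFCC, F0] vectors, then predict F0 from MFCCs as the posterior-weighted conditional mean. This is a generic joint-density GMM predictor built on scikit-learn and SciPy, not the authors' implementation; the demo data are synthetic.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

def fit_joint_gmm(mfcc, f0, n_components=4):
    """Model the joint density p(MFCC, F0) with a full-covariance GMM."""
    z = np.hstack([mfcc, f0[:, None]])
    return GaussianMixture(n_components=n_components,
                           covariance_type="full", random_state=0).fit(z)

def predict_f0(gmm, mfcc):
    """Posterior-weighted conditional means of F0 given the MFCC vector."""
    d = mfcc.shape[1]
    resp = np.zeros((len(mfcc), gmm.n_components))
    cond = np.zeros((len(mfcc), gmm.n_components))
    for k in range(gmm.n_components):
        mu_x, mu_y = gmm.means_[k, :d], gmm.means_[k, d]
        sxx = gmm.covariances_[k][:d, :d]
        sxy = gmm.covariances_[k][:d, d]
        resp[:, k] = gmm.weights_[k] * multivariate_normal(mu_x, sxx).pdf(mfcc)
        cond[:, k] = mu_y + (mfcc - mu_x) @ np.linalg.solve(sxx, sxy)
    resp /= resp.sum(axis=1, keepdims=True)
    return (resp * cond).sum(axis=1)

# Synthetic demo: "F0" loosely correlated with the first "MFCC" dimension
rng = np.random.default_rng(0)
mfcc = rng.normal(size=(500, 3))
f0 = 120 + 30 * mfcc[:, 0] + rng.normal(scale=5, size=500)
gmm = fit_joint_gmm(mfcc, f0)
print(predict_f0(gmm, mfcc[:5]))
```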

  17. Analog voicing detector responds to pitch

    NASA Technical Reports Server (NTRS)

    Abel, R. S.; Watkins, H. E.

    1967-01-01

    A modified electronic voice encoder (Vocoder) includes an independent analog mode of operation in addition to the conventional digital mode. The Vocoder is a bandwidth-compression device that permits voice transmission over channels having only a fraction of the bandwidth required for conventional telephone-quality speech transmission.

  18. Voice Based City Panic Button System

    NASA Astrophysics Data System (ADS)

    Febriansyah; Zainuddin, Zahir; Bachtiar Nappu, M.

    2018-03-01

    The development of the voice-activated panic button application aims to provide faster early notification of hazardous conditions in the community to the nearest police, using speech as the trigger; current applications still rely on on-screen touch combinations and coordination of orders from a control center, so early notification takes longer. The methods used in this research were voice recognition for detecting the user's speech and the haversine formula for finding the shortest distance between the user and the police. The application also sends automatic SMS notifications to the victim's relatives and is integrated with Google Maps (GMaps) to map the route to the victim's location. The results show that voice registration in the application succeeds 100% of the time, incident detection using speech recognition while the application is running averages 94.67%, and the automatic SMS to the victim's relatives succeeds 100% of the time.
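    The haversine formula mentioned above gives the great-circle distance between two latitude/longitude points. A minimal sketch; the coordinates below are hypothetical.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, used here to find the
    police post closest to the victim's reported coordinates."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Pick the nearest police post to a victim (hypothetical coordinates)
victim = (-5.135, 119.423)
posts = {"post_a": (-5.147, 119.432), "post_b": (-5.120, 119.400)}
nearest = min(posts, key=lambda k: haversine_km(*victim, *posts[k]))
print(nearest)
```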

  19. Automatic initial and final segmentation in cleft palate speech of Mandarin speakers.

    PubMed

    He, Ling; Liu, Yin; Yin, Heng; Zhang, Junpeng; Zhang, Jing; Zhang, Jiang

    2017-01-01

    Speech unit segmentation is an important pre-processing step in the analysis of cleft palate speech. In Mandarin, one syllable is composed of two parts: initial and final. In cleft palate speech, resonance disorders occur at the finals and the voiced initials, while articulation disorders occur at the unvoiced initials. Thus, the initials and finals are the minimum speech units that can reflect the characteristics of cleft palate speech disorders. In this work, an automatic initial/final segmentation method is proposed as a pre-processing step for cleft palate speech signal processing. The tested cleft palate speech utterances were collected from the Cleft Palate Speech Treatment Center in the Hospital of Stomatology, Sichuan University, which treats the largest number of cleft palate patients in China. The cleft palate speech data include 824 speech segments, and the control samples contain 228 speech segments. Syllables are first extracted from the speech utterances. The proposed syllable extraction method avoids a training stage and achieves good performance for both voiced and unvoiced speech. The syllables are then classified as having "quasi-unvoiced" or "quasi-voiced" initials, and respective initial/final segmentation methods are proposed for these two types of syllables. Moreover, a two-step segmentation method is proposed: the rough locations of syllable and initial/final boundaries are refined in the second segmentation step, in order to improve the robustness of segmentation accuracy. The experiments show that the initial/final segmentation accuracies for syllables with quasi-unvoiced initials are higher than for those with quasi-voiced initials. For cleft palate speech, the mean time error is 4.4 ms for syllables with quasi-unvoiced initials and 25.7 ms for syllables with quasi-voiced initials, and the correct segmentation accuracy P30 over all syllables is 91.69%. For the control samples, P30 over all syllables is 91.24%.
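    As a toy illustration of initial/final segmentation for a syllable with a quasi-unvoiced initial, the sketch below places the boundary at the first frame that is both high-energy and low in zero-crossing rate (i.e., vowel-like). This is a generic energy/ZCR heuristic, not the two-step method proposed in the paper.

```python
import numpy as np

def initial_final_boundary(x, fs, frame_s=0.01):
    """Toy boundary detector: the final (vowel-like) part begins at the first
    frame with high energy and low zero-crossing rate."""
    n = int(frame_s * fs)
    frames = [x[i:i + n] for i in range(0, len(x) - n, n)]
    energy = np.array([np.mean(f ** 2) for f in frames])
    zcr = np.array([np.mean(np.abs(np.diff(np.signbit(f).astype(int)))) for f in frames])
    for i, (e, z) in enumerate(zip(energy, zcr)):
        if e > 0.5 * energy.max() and z < 0.5 * zcr.max():
            return i * frame_s  # boundary time in seconds
    return None

# Synthetic syllable: 80 ms fricative-like noise initial, then a vowel-like final
fs = 16000
rng = np.random.default_rng(0)
initial = 0.2 * rng.normal(size=int(0.08 * fs))
final = np.sin(2 * np.pi * 220 * np.arange(int(0.2 * fs)) / fs)
print(initial_final_boundary(np.concatenate([initial, final]), fs))  # ~0.08 s
```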

  20. Applications of orofacial myofunctional techniques to speech therapy.

    PubMed

    Landis, C F

    1994-11-01

    A speech-language pathologist describes how she uses oral myofunctional therapy techniques in the treatment of speech articulation disorders, voice disorders, stuttering and apraxia of speech. Specific exercises are detailed.

  1. Voice Conversion Using Pitch Shifting Algorithm by Time Stretching with PSOLA and Re-Sampling

    NASA Astrophysics Data System (ADS)

    Mousa, Allam

    2010-01-01

    Voice changing has many applications in industrial and commercial fields. This paper emphasizes voice conversion using a pitch shifting method that depends on detecting the pitch of the signal (fundamental frequency) using Simplified Inverse Filter Tracking (SIFT), changing it to the target pitch period using time stretching with the Pitch Synchronous Overlap Add (PSOLA) algorithm, and then resampling the signal in order to restore the original play rate. The same study was performed to see the effect of voice conversion when Arabic speech signals are considered. Treatment of certain Arabic voiced vowels and conversion between male and female speech has shown some expansion or compression in the resulting speech. A comparison in terms of pitch shifting is presented here. Analysis was performed both for a single frame and for a full segmentation of speech.
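    The scheme in the abstract can be summarized as: time-stretch the signal by the desired pitch ratio, then resample by the same ratio so that duration is restored while pitch moves. The sketch below uses a crude fixed-grain overlap-add stretch as a stand-in for true PSOLA (which would place grains on SIFT-derived pitch marks); grain and hop sizes are assumptions.

```python
import numpy as np

def ola_stretch(x, factor, grain=1024, hop_out=256):
    """Crude granular overlap-add time stretch. True PSOLA would place grains
    pitch-synchronously (one per pitch period); fixed Hann grains keep this short."""
    hop_in = max(1, int(round(hop_out / factor)))
    win = np.hanning(grain)
    y = np.zeros(int(len(x) * factor) + grain)
    norm = np.zeros_like(y)
    pos_out = 0
    for pos_in in range(0, len(x) - grain, hop_in):
        y[pos_out:pos_out + grain] += win * x[pos_in:pos_in + grain]
        norm[pos_out:pos_out + grain] += win
        pos_out += hop_out
        if pos_out + grain > len(y):
            break
    return y[:pos_out] / np.maximum(norm[:pos_out], 1e-8)

def pitch_shift(x, ratio):
    """Shift pitch by `ratio` at constant duration: stretch, then resample."""
    stretched = ola_stretch(x, ratio)
    idx = np.arange(0, len(stretched) - 1, ratio)   # linear-interp resampling
    lo = idx.astype(int)
    frac = idx - lo
    return (1 - frac) * stretched[lo] + frac * stretched[lo + 1]

fs = 16000
t = np.arange(fs) / fs
up = pitch_shift(np.sin(2 * np.pi * 220 * t), 1.5)  # 220 Hz -> roughly 330 Hz
```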

  2. Processing of speech signals for physical and sensory disabilities.

    PubMed Central

    Levitt, H

    1995-01-01

    Assistive technology involving voice communication is used primarily by people who are deaf, hard of hearing, or who have speech and/or language disabilities. It is also used to a lesser extent by people with visual or motor disabilities. A very wide range of devices has been developed for people with hearing loss. These devices can be categorized not only by the modality of stimulation [i.e., auditory, visual, tactile, or direct electrical stimulation of the auditory nerve (auditory-neural)] but also in terms of the degree of speech processing that is used. At least four such categories can be distinguished: assistive devices (a) that are not designed specifically for speech, (b) that take the average characteristics of speech into account, (c) that process articulatory or phonetic characteristics of speech, and (d) that embody some degree of automatic speech recognition. Assistive devices for people with speech and/or language disabilities typically involve some form of speech synthesis or symbol generation for severe forms of language disability. Speech synthesis is also used in text-to-speech systems for sightless persons. Other applications of assistive technology involving voice communication include voice control of wheelchairs and other devices for people with mobility disabilities. PMID:7479816

  4. Fluid-Structure Interactions as Flow Propagates Tangentially Over a Flexible Plate with Application to Voiced Speech Production

    NASA Astrophysics Data System (ADS)

    Westervelt, Andrea; Erath, Byron

    2013-11-01

    Voiced speech is produced by fluid-structure interactions that drive vocal fold motion. Viscous flow features influence the pressure in the gap between the vocal folds (i.e. glottis), thereby altering vocal fold dynamics and the sound that is produced. During the closing phases of the phonatory cycle, vortices form as a result of flow separation as air passes through the divergent glottis. It is hypothesized that the reduced pressure within a vortex core will alter the pressure distribution along the vocal fold surface, thereby aiding in vocal fold closure. The objective of this study is to determine the impact of intraglottal vortices on the fluid-structure interactions of voiced speech by investigating how the dynamics of a flexible plate are influenced by a vortex ring passing tangentially over it. A flexible plate, which models the medial vocal fold surface, is placed in a water-filled tank and positioned parallel to the exit of a vortex generator. The physical parameters of plate stiffness and vortex circulation are scaled with physiological values. As vortices propagate over the plate, particle image velocimetry measurements are captured to analyze the energy exchange between the fluid and flexible plate. The investigations are performed over a range of vortex formation numbers, and lateral displacements of the plate from the centerline of the vortex trajectory. Observations show plate oscillations with displacements directly correlated with the vortex core location.

  5. Acoustic characteristics of voice after severe traumatic brain injury.

    PubMed

    McHenry, M

    2000-07-01

    To describe the acoustic characteristics of voice in individuals with motor speech disorders after traumatic brain injury (TBI). Prospective study of 100 individuals with TBI based on consecutive referrals for motor speech evaluations. Subjects were audio tape-recorded while producing sustained vowels and single word and sentence intelligibility tests. Laryngeal airway resistance was estimated, and voice quality was rated perceptually. None of the subjects evidenced vocal parameters within normal limits. The most frequently occurring abnormal parameter across subjects was amplitude perturbation, followed by voice turbulence index. Twenty-three percent of subjects evidenced deviation in all five parameters measured. The perceptual ratings of breathiness were significantly correlated with both the amplitude perturbation quotient and the noise-to-harmonics ratio. Vocal quality deviation is common in motor speech disorders after TBI and may impact intelligibility.
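    For reference, cycle-to-cycle perturbation measures like those reported above can be defined quite simply. Below is a minimal sketch of local jitter (period perturbation) and shimmer (amplitude perturbation); the cycle measurements are hypothetical.

```python
import numpy as np

def jitter_percent(periods):
    """Mean absolute cycle-to-cycle period difference, relative to the mean period."""
    periods = np.asarray(periods, dtype=float)
    return 100 * np.mean(np.abs(np.diff(periods))) / np.mean(periods)

def shimmer_percent(amplitudes):
    """Mean absolute cycle-to-cycle amplitude difference, relative to the mean
    amplitude (a simple form of the amplitude perturbation measures above)."""
    amplitudes = np.asarray(amplitudes, dtype=float)
    return 100 * np.mean(np.abs(np.diff(amplitudes))) / np.mean(amplitudes)

# Hypothetical cycle measurements extracted from a sustained vowel
periods_ms = [8.0, 8.1, 7.9, 8.2, 8.0]
peaks = [0.82, 0.78, 0.85, 0.80, 0.79]
print(jitter_percent(periods_ms), shimmer_percent(peaks))
```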

  6. Biphonation in voice signals

    NASA Astrophysics Data System (ADS)

    Herzel, Hanspeter; Reuter, Robert

    1996-06-01

    Irregularities in voiced speech are often observed as a consequence of vocal fold lesions, paralyses, and other pathological conditions. Many of these instabilities are related to the intrinsic nonlinearities in the vibrations of the vocal folds. In this paper, a specific nonlinear phenomenon is discussed: The appearance of two independent fundamental frequencies termed biphonation. Several narrow-band spectrograms are presented showing biphonation in signals from voice patients, a newborn cry, a singer, and excised larynx experiments. Finally, possible physiological mechanisms of instabilities of the voice source are discussed.
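    A narrow-band spectrogram resolves two closely spaced, independent fundamentals because its long analysis window yields fine frequency resolution. A minimal sketch on a synthetic two-fundamental signal (frequencies chosen to fall on exact FFT bins for a clean demo):

```python
import numpy as np
from scipy.signal import spectrogram

fs = 16000
t = np.arange(fs) / fs
# Synthetic biphonation: two independent fundamentals at 188 Hz and 264 Hz
x = np.sin(2 * np.pi * 188 * t) + np.sin(2 * np.pi * 264 * t)
# Long window -> narrow-band spectrogram (4 Hz bins), resolving both fundamentals
f, tt, S = spectrogram(x, fs=fs, window="hann", nperseg=4000, noverlap=3000)
peaks = f[np.argsort(S[:, 0])[-2:]]
print(sorted(peaks))  # expect peaks at the two fundamentals
```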

  7. Fluid Dynamics of Human Phonation and Speech

    NASA Astrophysics Data System (ADS)

    Mittal, Rajat; Erath, Byron D.; Plesniak, Michael W.

    2013-01-01

    This article presents a review of the fluid dynamics, flow-structure interactions, and acoustics associated with human phonation and speech. Our voice is produced through the process of phonation in the larynx, and an improved understanding of the underlying physics of this process is essential to advancing the treatment of voice disorders. Insights into the physics of phonation and speech can also contribute to improved vocal training and the development of new speech compression and synthesis schemes. This article introduces the key biomechanical features of the laryngeal physiology, reviews the basic principles of voice production, and summarizes the progress made over the past half-century in understanding the flow physics of phonation and speech. Laryngeal pathologies, which significantly enhance the complexity of phonatory dynamics, are discussed. After a thorough examination of the state of the art in computational modeling and experimental investigations of phonatory biomechanics, we present a synopsis of the pacing issues in this arena and an outlook for research in this fascinating subject.

  8. Effects of Within-Talker Variability on Speech Intelligibility in Mandarin-Speaking Adult and Pediatric Cochlear Implant Patients

    PubMed Central

    Su, Qiaotong; Galvin, John J.; Zhang, Guoping; Li, Yongxin

    2016-01-01

    Cochlear implant (CI) speech performance is typically evaluated using well-enunciated speech produced at a normal rate by a single talker. CI users often have greater difficulty with variations in speech production encountered in everyday listening. Within a single talker, speaking rate, amplitude, duration, and voice pitch information may be quite variable, depending on the production context. The coarse spectral resolution afforded by the CI limits perception of voice pitch, which is an important cue for speech prosody and for tonal languages such as Mandarin Chinese. In this study, sentence recognition from the Mandarin speech perception database was measured in adult and pediatric Mandarin-speaking CI listeners for a variety of speaking styles: voiced speech produced at slow, normal, and fast speaking rates; whispered speech; voiced emotional speech; and voiced shouted speech. Recognition of Mandarin Hearing in Noise Test sentences was also measured. Results showed that performance was significantly poorer with whispered speech relative to the other speaking styles and that performance was significantly better with slow speech than with fast or emotional speech. Results also showed that adult and pediatric performance was significantly poorer with Mandarin Hearing in Noise Test than with Mandarin speech perception sentences at the normal rate. The results suggest that adult and pediatric Mandarin-speaking CI patients are highly susceptible to whispered speech, due to the lack of lexically important voice pitch cues and perhaps other qualities associated with whispered speech. The results also suggest that test materials may contribute to differences in performance observed between adult and pediatric CI users. PMID:27363714

  9. Voice Modulations in German Ironic Speech

    ERIC Educational Resources Information Center

    Scharrer, Lisa; Christmann, Ursula; Knoll, Monja

    2011-01-01

    Previous research has shown that in different languages ironic speech is acoustically modulated compared to literal speech, and these modulations are assumed to aid the listener in the comprehension process by acting as cues that mark utterances as ironic. The present study was conducted to identify paraverbal features of German "ironic…

  10. Vibrato in Singing Voice: The Link between Source-Filter and Sinusoidal Models

    NASA Astrophysics Data System (ADS)

    Arroabarren, Ixone; Carlosena, Alfonso

    2004-12-01

    The application of inverse filtering techniques for high-quality singing voice analysis/synthesis is discussed. In the context of source-filter models, inverse filtering provides a noninvasive method to extract the voice source, and thus to study voice quality. Although this approach is widely used in speech synthesis, this is not the case in singing voice. Several studies have proved that inverse filtering techniques fail in the case of singing voice, the reasons being unclear. In order to shed light on this problem, we will consider here an additional feature of singing voice, not present in speech: the vibrato. Vibrato has been traditionally studied by sinusoidal modeling. As an alternative, we will introduce here a novel noninteractive source filter model that incorporates the mechanisms of vibrato generation. This model will also allow the comparison of the results produced by inverse filtering techniques and by sinusoidal modeling, as they apply to singing voice and not to speech. In this way, the limitations of these conventional techniques, described in previous literature, will be explained. Both synthetic signals and singer recordings are used to validate and compare the techniques presented in the paper.
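    For orientation, sinusoidal modeling typically describes vibrato as a slow sinusoidal modulation of the fundamental. A generic textbook form (an assumption here, not the paper's full noninteractive source-filter model) is:

```latex
% Instantaneous fundamental frequency under vibrato:
f_0(t) = F_0 \left[ 1 + A_v \sin\!\left( 2\pi f_v t + \phi \right) \right]
```

    where $F_0$ is the mean fundamental frequency, $f_v$ the vibrato rate (roughly 5-7 Hz in trained singers), $A_v$ the vibrato extent, and $\phi$ an arbitrary phase.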

  11. Systematic studies of modified vocalization: effects of speech rate and instatement style during metronome stimulation.

    PubMed

    Davidow, Jason H; Bothe, Anne K; Richardson, Jessica D; Andreatta, Richard D

    2010-12-01

    This study introduces a series of systematic investigations intended to clarify the parameters of the fluency-inducing conditions (FICs) in stuttering. Participants included 11 adults, aged 20-63 years, with typical speech-production skills. A repeated measures design was used to examine the relationships between several speech production variables (vowel duration, voice onset time, fundamental frequency, intraoral pressure, pressure rise time, transglottal airflow, and phonated intervals) and speech rate and instatement style during metronome-entrained rhythmic speech. Measures of duration (vowel duration, voice onset time, and pressure rise time) differed across different metronome conditions. When speech rates were matched between the control condition and metronome condition, voice onset time was the only variable that changed. Results confirm that speech rate and instatement style can influence speech production variables during the production of fluency-inducing conditions. Future studies of normally fluent speech and of stuttered speech must control both features and should further explore the importance of voice onset time, which may be influenced by rate during metronome stimulation in a way that the other variables are not.

  12. Voice Disorders in School Children: Clinical Management.

    ERIC Educational Resources Information Center

    Garbee, Frederick E., Ed.

    Five papers presented at two inservice institutes for school speech and language pathologists delineated identification, remediation, and management of voice disorders in school children. Keynote remarks emphasized the intimate relationship between children's voices and their affective behavior and psychological needs, and thus, the importance of…

  13. Recognizing emotional speech in Persian: a validated database of Persian emotional speech (Persian ESD).

    PubMed

    Keshtiari, Niloofar; Kuhlmann, Michael; Eslami, Moharram; Klann-Delius, Gisela

    2015-03-01

    Research on emotional speech often requires valid stimuli for assessing perceived emotion through prosody and lexical content. To date, no comprehensive emotional speech database for Persian is officially available. The present article reports the process of designing, compiling, and evaluating a comprehensive emotional speech database for colloquial Persian. The database contains a set of 90 validated novel Persian sentences classified in five basic emotional categories (anger, disgust, fear, happiness, and sadness), as well as a neutral category. These sentences were validated in two experiments by a group of 1,126 native Persian speakers. The sentences were articulated by two native Persian speakers (one male, one female) in three conditions: (1) congruent (emotional lexical content articulated in a congruent emotional voice), (2) incongruent (neutral sentences articulated in an emotional voice), and (3) baseline (all emotional and neutral sentences articulated in neutral voice). The speech materials comprise about 470 sentences. The validity of the database was evaluated by a group of 34 native speakers in a perception test. Utterances recognized better than five times chance performance (71.4 %) were regarded as valid portrayals of the target emotions. Acoustic analysis of the valid emotional utterances revealed differences in pitch, intensity, and duration, attributes that may help listeners to correctly classify the intended emotion. The database is designed to be used as a reliable material source (for both text and speech) in future cross-cultural or cross-linguistic studies of emotional speech, and it is available for academic research purposes free of charge. To access the database, please contact the first author.

  14. Constructing Adequate Non-Speech Analogues: What Is Special about Speech Anyway?

    ERIC Educational Resources Information Center

    Rosen, Stuart; Iverson, Paul

    2007-01-01

    Vouloumanos and Werker (2007) claim that human neonates have a (possibly innate) bias to listen to speech based on a preference for natural speech utterances over sine-wave analogues. We argue that this bias more likely arises from the strikingly different saliency of voice melody in the two kinds of sounds, a bias that has already been shown to…

  15. Aspects of the speaking voice of elderly women with choral singing experience.

    PubMed

    Aquino, Fernanda Salvatico de; Silva, Marta Assumpção Andrada E; Teles, Lídia Cristina da Silva; Ferreira, Léslie Piccolotto

    2016-01-01

    Despite the several studies of singing and the aging voice found in the literature, there is still a need for investigations seeking to understand the effects of this practice on the speaking voice of the elderly. The aim was to compare the characteristics of the speaking voice of elderly women with choral singing experience with those of elderly women without this experience. Participants were 75 elderly women: 50 with experience in choral singing (singers group, SG) and 25 without such experience (nonsingers group, NSG). A questionnaire was applied to characterize the participants and collect data on lifestyle and voice. Speech samples (sustained vowels, repetition of sentences, and running speech excerpts) were collected in a quiet room with participants in a sitting position. The voices were analyzed by three expert speech-language pathologists according to the Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V) protocol. Data were submitted to descriptive and statistical analysis. The voices of the elderly nonsingers (NSG) showed a significant increase in scores related to the overall degree of deviance and the presence of roughness and strain. Compared with the NSG, the speaking voices of the SG showed a better overall degree of deviance, owing to lower roughness and strain.

  16. A Review of Training Opportunities for Singing Voice Rehabilitation Specialists.

    PubMed

    Gerhard, Julia

    2016-05-01

    Training opportunities for singing voice rehabilitation specialists are growing and changing. This is happening despite a lack of agreed-on guidelines or an accredited certification acknowledged by the governing bodies in the fields of speech-language pathology and vocal pedagogy, the American Speech-Language Hearing Association and the National Association of Teachers of Singing, respectively. The roles of the speech-language pathologist, the singing teacher, and the person who bridges this gap, the singing voice rehabilitation specialist, are now becoming better defined and more common among the voice care community. To that end, this article aims to review the current opportunities for training in the field of singing voice rehabilitation. A review of available university training programs, private training programs and mentorships, clinical fellowships, professional organizations, conferences, vocal training across genres, and self-study opportunities was conducted. All institutional listings are with permission from program leaders. Although many avenues are available for training of singing voice rehabilitation specialists, there is no accredited comprehensive training program at this point. This review gathers information on current training opportunities from across various modalities. The listings are not intended to be comprehensive but rather representative of possibilities for interested practitioners. Copyright © 2016 The Voice Foundation. Published by Elsevier Inc. All rights reserved.

  17. Paving the Way for Speech: Voice-Training-Induced Plasticity in Chronic Aphasia and Apraxia of Speech—Three Single Cases

    PubMed Central

    Jungblut, Monika; Huber, Walter; Mais, Christiane

    2014-01-01

    Difficulties with temporal coordination or sequencing of speech movements are frequently reported in aphasia patients with concomitant apraxia of speech (AOS). Our major objective was to investigate the effects of specific rhythmic-melodic voice training on brain activation in those patients. Three patients with severe chronic nonfluent aphasia and AOS were included in this study. Before and after therapy, patients underwent the same fMRI procedure as 30 healthy control subjects in our prestudy, which investigated the neural substrates of sung vowel changes in untrained rhythm sequences. A main finding was that post- minus pretreatment imaging data yielded significant perilesional activations in all patients, for example in the left superior temporal gyrus, whereas the reverse subtraction revealed either no significant activation or right-hemisphere activation. Likewise, pre- and posttreatment assessments of patients' vocal rhythm production, language, and speech motor performance yielded significant improvements for all patients. Our results suggest that changes in brain activation due to the applied training might indicate specific processes of reorganization, for example improved temporal sequencing of sublexical speech components. In this context, a training that focuses on rhythmic singing, with complexity levels that make graded demands on motor and cognitive capabilities, seems to support paving the way for speech. PMID:24977055

  18. Remote Capture of Human Voice Acoustical Data by Telephone: A Methods Study

    ERIC Educational Resources Information Center

    Cannizzaro, Michael S.; Reilly, Nicole; Mundt, James C.; Snyder, Peter J.

    2005-01-01

    In this pilot study we sought to determine the reliability and validity of collecting speech and voice acoustical data via telephone transmission for possible future use in large clinical trials. Simultaneous recordings of each participant's speech and voice were made at the point of participation, the local recording (LR), and over a telephone…

  19. Tracheostomy cannulas and voice prosthesis

    PubMed Central

    Kramp, Burkhard; Dommerich, Steffen

    2011-01-01

    Cannulas and voice prostheses are mechanical aids for patients who have had to undergo tracheotomy or laryngectomy for various reasons. For a better understanding of the function of these artificial devices, the indications and particularities of the preceding surgical intervention are first described in the context of this review. Despite the established procedure of percutaneous dilatation tracheotomy, e.g. in intensive care units, the creation of epithelialized tracheostomas has its own place, especially when airway obstruction is persistent (e.g. caused by trauma, inflammation, or tumors) and longer artificial ventilation or special care of the patient is required. In order to keep the airways open after tracheotomy, tracheostomy cannulas of different materials and with different functions are available. For each patient the most appropriate type of cannula must be found. Voice prostheses are meanwhile the device of choice for rapid and efficient voice rehabilitation after laryngectomy. Individual sizes and materials allow adaptation of the voice prosthesis to the individual anatomical situation of the patient. The combined application of voice prostheses with an HME (heat and moisture exchanger) allows good vocal as well as pulmonary rehabilitation. A precondition for an efficient voice prosthesis is the observation of certain surgical principles during laryngectomy. The lifetime of the prosthesis depends mainly on material properties and biofilms, mostly consisting of fungi and bacteria. The quality of voice with a valve prosthesis is clearly superior to esophageal or electrolaryngeal voice. Whenever possible, tracheostoma valves for hands-free speech should be applied. Physicians taking care of patients with voice prostheses after laryngectomy should know exactly what to do in case the device fails or gets lost. PMID:22073098

  1. DLMS Voice Data Entry.

    DTIC Science & Technology

    1980-06-01

    [Report front matter, garbled in extraction. Recoverable content: a list of illustrations including a block diagram of the DLMS Voice Recognition System (Fig. 1) and a flowchart of default operation; the system comprises a speech preprocessor (a TTI model 8040) and a minicomputer, together with a Data General 6026 magnetic tape unit, a display, an equipment cabinet, and a flexible-disk unit.]

  2. PRODUCTION OF SOUND BY UNSTEADY THROTTLING OF FLOW INTO A RESONANT CAVITY, WITH APPLICATION TO VOICED SPEECH

    PubMed Central

    Howe, M. S.; McGowan, R. S.

    2011-01-01

    An analysis is made of the sound generated by the time-dependent throttling of a nominally steady stream of air through a small orifice into a flow-through resonant cavity. This is exemplified by the production of voiced speech, where air from the lungs enters the vocal tract through the glottis at a time variable volume flow rate Q(t) controlled by oscillations of the glottis cross-section. Voicing theory has hitherto determined Q from a heuristic, reduced complexity ‘Fant’ differential equation (G. Fant, Acoustic Theory of Speech Production, 1960). A new self-consistent, integro-differential form of this equation is derived in this paper using the theory of aerodynamic sound, with full account taken of the back-reaction of the resonant tract on the glottal flux Q. The theory involves an aeroacoustic Green’s function (G) for flow-surface interactions in a time-dependent glottis, so making the problem non-self-adjoint. In complex problems of this type it is not usually possible to obtain G in an explicit analytic form. The principal objective of the paper is to show how the Fant equation can still be derived in such cases from a consideration of the equation of aerodynamic sound and from the adjoint of the equation governing G in the neighbourhood of the ‘throttle’. The theory is illustrated by application to the canonical problem of throttled flow into a Helmholtz resonator. PMID:21666824
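    For orientation, a commonly cited reduced-complexity ("Fant") voicing equation balances the subglottal driving pressure against the Bernoulli kinetic term and the inertance of the glottal air plug. This is a generic textbook form given here as an assumption, not necessarily the exact integro-differential equation derived in this paper:

```latex
p_s(t) = \frac{\rho \, Q^2(t)}{2 A^2(t)}
       + \rho \, \frac{d}{dt}\!\left[ \frac{\ell \, Q(t)}{A(t)} \right]
```

    where $Q(t)$ is the glottal volume flux, $A(t)$ the time-varying glottal area, $\ell$ an effective glottal channel length, $\rho$ the air density, and $p_s$ the subglottal pressure.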

  3. Systematic studies of modified vocalization: the effect of speech rate on speech production measures during metronome-paced speech in persons who stutter.

    PubMed

    Davidow, Jason H

    2014-01-01

    Metronome-paced speech results in the elimination, or substantial reduction, of stuttering moments. The cause of fluency during this fluency-inducing condition is unknown. Several investigations have reported changes in speech pattern characteristics from a control condition to a metronome-paced speech condition, but failure to control speech rate between conditions limits our ability to determine if the changes were necessary for fluency. This study examined the effect of speech rate on several speech production variables during one-syllable-per-beat metronomic speech in order to determine changes that may be important for fluency during this fluency-inducing condition. Thirteen persons who stutter (PWS), aged 18-62 years, completed a series of speaking tasks. Several speech production variables were compared between conditions produced at different metronome beat rates, and between a control condition and a metronome-paced speech condition produced at a rate equal to the control condition. Vowel duration, voice onset time, pressure rise time and phonated intervals were significantly impacted by metronome beat rate. Voice onset time and the percentage of short (30-100 ms) phonated intervals significantly decreased from the control condition to the equivalent rate metronome-paced speech condition. A reduction in the percentage of short phonated intervals may be important for fluency during syllable-based metronome-paced speech for PWS. Future studies should continue examining the necessity of this reduction. In addition, speech rate must be controlled in future fluency-inducing condition studies, including neuroimaging investigations, in order for this research to make a substantial contribution to finding the fluency-inducing mechanism of fluency-inducing conditions. © 2013 Royal College of Speech and Language Therapists.
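    Phonated intervals, mentioned above, are simply the durations of consecutive voiced stretches in a frame-level voicing track. A minimal sketch for extracting them and computing the percentage of short (30-100 ms) intervals; the voicing track and 10 ms frame step below are assumptions.

```python
import numpy as np

def phonated_intervals(voiced_flags, frame_ms=10):
    """Durations (ms) of consecutive voiced runs in a frame-level voicing track."""
    runs, n = [], 0
    for v in voiced_flags:
        if v:
            n += 1
        elif n:
            runs.append(n * frame_ms)
            n = 0
    if n:
        runs.append(n * frame_ms)
    return runs

def percent_short(runs, lo=30, hi=100):
    """Share of phonated intervals falling in the short (30-100 ms) band."""
    runs = np.asarray(runs)
    return 100 * np.mean((runs >= lo) & (runs <= hi)) if len(runs) else 0.0

track = [0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0]
runs = phonated_intervals(track)          # [30, 110, 20] ms
print(runs, percent_short(runs))          # one of three runs is "short"
```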

  4. A Cognitive Neuroscience View of Voice-Processing Abnormalities in Schizophrenia: A Window into Auditory Verbal Hallucinations?

    PubMed

    Conde, Tatiana; Gonçalves, Oscar F; Pinheiro, Ana P

    2016-01-01

    Auditory verbal hallucinations (AVH) are a core symptom of schizophrenia. Like "real" voices, AVH carry a rich amount of linguistic and paralinguistic cues that convey not only speech, but also affect and identity, information. Disturbed processing of voice identity, affective, and speech information has been reported in patients with schizophrenia. More recent evidence has suggested a link between voice-processing abnormalities and specific clinical symptoms of schizophrenia, especially AVH. It is still not well understood, however, to what extent these dimensions are impaired and how abnormalities in these processes might contribute to AVH. In this review, we consider behavioral, neuroimaging, and electrophysiological data to investigate the speech, identity, and affective dimensions of voice processing in schizophrenia, and we discuss how abnormalities in these processes might help to elucidate the mechanisms underlying specific phenomenological features of AVH. Schizophrenia patients exhibit behavioral and neural disturbances in the three dimensions of voice processing. Evidence suggesting a role of dysfunctional voice processing in AVH seems to be stronger for the identity and speech dimensions than for the affective domain.

  5. Describing Speech Usage in Daily Activities in Typical Adults.

    PubMed

    Anderson, Laine; Baylor, Carolyn R; Eadie, Tanya L; Yorkston, Kathryn M

    2016-01-01

    "Speech usage" refers to what people want or need to do with their speech to meet communication demands in life roles. The purpose of this study was to contribute to validation of the Levels of Speech Usage scale by providing descriptive data from a sample of adults without communication disorders, comparing this scale to a published Occupational Voice Demands scale and examining predictors of speech usage levels. This is a survey design. Adults aged ≥25 years without reported communication disorders were recruited nationally to complete an online questionnaire. The questionnaire included the Levels of Speech Usage scale, questions about relevant occupational and nonoccupational activities (eg, socializing, hobbies, childcare, and so forth), and demographic information. Participants were also categorized according to Koufman and Isaacson occupational voice demands scale. A total of 276 participants completed the questionnaires. People who worked for pay tended to report higher levels of speech usage than those who do not work for pay. Regression analyses showed employment to be the major contributor to speech usage; however, considerable variance left unaccounted for suggests that determinants of speech usage and the relationship between speech usage, employment, and other life activities are not yet fully defined. The Levels of Speech Usage may be a viable instrument to systematically rate speech usage because it captures both occupational and nonoccupational speech demands. These data from a sample of typical adults may provide a reference to help in interpreting the impact of communication disorders on speech usage patterns. Copyright © 2016 The Voice Foundation. Published by Elsevier Inc. All rights reserved.

  6. Synthesized speech rate and pitch effects on intelligibility of warning messages for pilots

    NASA Technical Reports Server (NTRS)

    Simpson, C. A.; Marchionda-Frost, K.

    1984-01-01

    In civilian and military operations, a future threat-warning system with a voice display could warn pilots of other traffic, obstacles in the flight path, and/or terrain during low-altitude helicopter flights. The present study was conducted to learn whether speech rate and voice pitch of phoneme-synthesized speech affects pilot accuracy and response time to typical threat-warning messages. Helicopter pilots engaged in an attention-demanding flying task and listened for voice threat warnings presented in a background of simulated helicopter cockpit noise. Performance was measured by flying-task performance, threat-warning intelligibility, and response time. Pilot ratings were elicited for the different voice pitches and speech rates. Significant effects were obtained only for response time and for pilot ratings, both as a function of speech rate. For the few cases when pilots forgot to respond to a voice message, they remembered 90 percent of the messages accurately when queried for their response 8 to 10 sec later.

  7. Joint Dictionary Learning-Based Non-Negative Matrix Factorization for Voice Conversion to Improve Speech Intelligibility After Oral Surgery.

    PubMed

    Fu, Szu-Wei; Li, Pei-Chun; Lai, Ying-Hui; Yang, Cheng-Chien; Hsieh, Li-Chun; Tsao, Yu

    2017-11-01

    Objective: This paper focuses on machine learning based voice conversion (VC) techniques for improving the speech intelligibility of surgical patients who have had parts of their articulators removed. Because of the removal of parts of the articulator, a patient's speech may be distorted and difficult to understand. To overcome this problem, VC methods can be applied to convert the distorted speech such that it is clear and more intelligible. To design an effective VC method, two key points must be considered: 1) the amount of training data may be limited (because speaking for a long time is usually difficult for postoperative patients); 2) rapid conversion is desirable (for better communication). Methods: We propose a novel joint dictionary learning based non-negative matrix factorization (JD-NMF) algorithm. Compared to conventional VC techniques, JD-NMF can perform VC efficiently and effectively with only a small amount of training data. Results: The experimental results demonstrate that the proposed JD-NMF method not only achieves notably higher short-time objective intelligibility (STOI) scores (a standardized objective intelligibility evaluation metric) than those obtained using the original unconverted speech but is also significantly more efficient and effective than a conventional exemplar-based NMF VC method. Conclusion: The proposed JD-NMF method may outperform the state-of-the-art exemplar-based NMF VC method in terms of STOI scores under the desired scenario. Significance: We confirmed the advantages of the proposed joint training criterion for the NMF-based VC. Moreover, we verified that the proposed JD-NMF can effectively improve the speech intelligibility scores of oral surgery patients.
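    A hedged sketch of the exemplar-based NMF conversion step that JD-NMF builds on: with paired source/target dictionaries whose columns are aligned, the source spectrogram is decomposed against the source dictionary and the resulting activations are reapplied to the target dictionary. The dictionaries and spectrogram below are random placeholders, and this is the generic baseline scheme, not the paper's joint dictionary learning.

```python
import numpy as np

def nmf_activations(V, W, n_iter=200, eps=1e-9):
    """Multiplicative-update solve for H >= 0 with the dictionary W fixed,
    minimizing ||V - W H||_F (the decomposition step of exemplar-based VC)."""
    H = np.random.rand(W.shape[1], V.shape[1])
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    return H

rng = np.random.default_rng(0)
W_src = np.abs(rng.normal(size=(64, 20)))   # paired source/target spectral
W_tgt = np.abs(rng.normal(size=(64, 20)))   # dictionaries (columns aligned)
V_src = np.abs(rng.normal(size=(64, 100)))  # magnitude spectrogram of distorted speech

H = nmf_activations(V_src, W_src)
V_converted = W_tgt @ H   # impose target spectra, keep the source activations
```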

  8. Intensive Voice Treatment (LSVT[R]LOUD) for Parkinson's Disease Following Deep Brain Stimulation of the Subthalamic Nucleus

    ERIC Educational Resources Information Center

    Spielman, Jennifer; Mahler, Leslie; Halpern, Angela; Gilley, Phllip; Klepitskaya, Olga; Ramig, Lorraine

    2011-01-01

    Purpose: Intensive voice therapy (LSVT[R]LOUD) can effectively manage voice and speech symptoms associated with idiopathic Parkinson disease (PD). This small-group study evaluated voice and speech in individuals with and without deep brain stimulation of the subthalamic nucleus (STN-DBS) before and after LSVT LOUD, to determine whether outcomes…

  9. Acoustic Measures of Voice and Physiologic Measures of Autonomic Arousal during Speech as a Function of Cognitive Load.

    PubMed

    MacPherson, Megan K; Abur, Defne; Stepp, Cara E

    2017-07-01

    This study aimed to determine the relationship among cognitive load condition and measures of autonomic arousal and voice production in healthy adults. A prospective study design was conducted. Sixteen healthy young adults (eight men, eight women) produced a sentence containing an embedded Stroop task in each of two cognitive load conditions: congruent and incongruent. In both conditions, participants said the font color of the color words instead of the word text. In the incongruent condition, font color differed from the word text, creating an increase in cognitive load relative to the congruent condition in which font color and word text matched. Three physiologic measures of autonomic arousal (pulse volume amplitude, pulse period, and skin conductance response amplitude) and four acoustic measures of voice (sound pressure level, fundamental frequency, cepstral peak prominence, and low-to-high spectral energy ratio) were analyzed for eight sentence productions in each cognitive load condition per participant. A logistic regression model was constructed to predict the cognitive load condition (congruent or incongruent) using subject as a categorical predictor and the three autonomic measures and four acoustic measures as continuous predictors. It revealed that skin conductance response amplitude, cepstral peak prominence, and low-to-high spectral energy ratio were significantly associated with cognitive load condition. During speech produced under increased cognitive load, healthy young adults show changes in physiologic markers of heightened autonomic arousal and acoustic measures of voice quality. Future work is necessary to examine these measures in older adults and individuals with voice disorders. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
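    The analysis above is a standard logistic regression of condition on the physiologic and acoustic measures. A minimal scikit-learn sketch on synthetic data (the per-subject categorical predictor used in the study is omitted for brevity):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical per-sentence measures: 3 autonomic + 4 acoustic predictors,
# predicting cognitive load condition (0 = congruent, 1 = incongruent)
rng = np.random.default_rng(1)
X = rng.normal(size=(256, 7))
y = (X[:, 2] + 0.5 * X[:, 5] + rng.normal(scale=0.8, size=256) > 0).astype(int)

model = LogisticRegression().fit(X, y)
print(model.coef_)        # association of each measure with condition
print(model.score(X, y))  # in-sample classification accuracy
```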

  11. The Hypothesis of Apraxia of Speech in Children with Autism Spectrum Disorder

    PubMed Central

    Shriberg, Lawrence D.; Paul, Rhea; Black, Lois M.; van Santen, Jan P.

    2010-01-01

    In a sample of 46 children aged 4 to 7 years with Autism Spectrum Disorder (ASD) and intelligible speech, there was no statistical support for the hypothesis of concomitant Childhood Apraxia of Speech (CAS). Perceptual and acoustic measures of participants’ speech, prosody, and voice were compared with data from 40 typically-developing children, 13 preschool children with Speech Delay, and 15 participants aged 5 to 49 years with CAS in neurogenetic disorders. Speech Delay and Speech Errors, respectively, were modestly and substantially more prevalent in participants with ASD than reported population estimates. Double dissociations in speech, prosody, and voice impairments in ASD were interpreted as consistent with a speech attunement framework, rather than with the motor speech impairments that define CAS. Key Words: apraxia, dyspraxia, motor speech disorder, speech sound disorder PMID:20972615

  12. Pathological speech signal analysis and classification using empirical mode decomposition.

    PubMed

    Kaleem, Muhammad; Ghoraani, Behnaz; Guergachi, Aziz; Krishnan, Sridhar

    2013-07-01

    Automated classification of normal and pathological speech signals can provide an objective and accurate mechanism for pathological speech diagnosis, and is an active area of research. A large part of this research is based on analysis of acoustic measures extracted from sustained vowels. However, sustained vowels do not reflect real-world attributes of voice as effectively as continuous speech, which can take into account important attributes of speech such as rapid voice onset and termination, changes in voice frequency and amplitude, and sudden discontinuities in speech. This paper presents a methodology based on empirical mode decomposition (EMD) for classification of continuous normal and pathological speech signals obtained from a well-known database. EMD is used to decompose randomly chosen portions of speech signals into intrinsic mode functions, which are then analyzed to extract meaningful temporal and spectral features, including true instantaneous features which can capture discriminative information in signals hidden at local time-scales. A total of six features are extracted, and a linear classifier is used with the feature vector to classify continuous speech portions obtained from a database consisting of 51 normal and 161 pathological speakers. A classification accuracy of 95.7% is obtained, thus demonstrating the effectiveness of the methodology.
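
    A rough sketch of such a pipeline follows, decomposing a signal into intrinsic mode functions and deriving energy and instantaneous-frequency features. It assumes the PyEMD (EMD-signal) package, and the feature choice only approximates, rather than reproduces, the six features named in the abstract:

      import numpy as np
      from PyEMD import EMD               # pip install EMD-signal
      from scipy.signal import hilbert

      def emd_features(signal, fs, n_imfs=3):
          """Log energy and mean instantaneous frequency of the first few IMFs."""
          imfs = EMD()(signal)[:n_imfs]
          feats = []
          for imf in imfs:
              analytic = hilbert(imf)
              phase = np.unwrap(np.angle(analytic))
              inst_freq = np.diff(phase) * fs / (2 * np.pi)   # Hz, per sample
              feats += [np.log(np.sum(imf ** 2) + 1e-12), float(np.mean(inst_freq))]
          return np.array(feats)          # 2 features x 3 IMFs = 6 features

      # Toy usage on a synthetic signal; real inputs would be randomly chosen
      # continuous-speech portions, fed afterwards to a linear classifier.
      fs = 16000
      t = np.arange(fs) / fs
      x = np.sin(2 * np.pi * 120 * t) + 0.3 * np.sin(2 * np.pi * 900 * t)
      print(emd_features(x, fs))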

  13. Swinging at a cocktail party: voice familiarity aids speech perception in the presence of a competing voice.

    PubMed

    Johnsrude, Ingrid S; Mackey, Allison; Hakyemez, Hélène; Alexander, Elizabeth; Trang, Heather P; Carlyon, Robert P

    2013-10-01

    People often have to listen to someone speak in the presence of competing voices. Much is known about the acoustic cues used to overcome this challenge, but almost nothing is known about the utility of cues derived from experience with particular voices--cues that may be particularly important for older people and others with impaired hearing. Here, we use a version of the coordinate-response-measure procedure to show that people can exploit knowledge of a highly familiar voice (their spouse's) not only to track it better in the presence of an interfering stranger's voice, but also, crucially, to ignore it so as to comprehend a stranger's voice more effectively. Although performance declines with increasing age when the target voice is novel, there is no decline when the target voice belongs to the listener's spouse. This finding indicates that older listeners can exploit their familiarity with a speaker's voice to mitigate the effects of sensory and cognitive decline.

  14. Mechanics of human voice production and control

    PubMed Central

    Zhang, Zhaoyan

    2016-01-01

    As the primary means of communication, voice plays an important role in daily life. Voice also conveys personal information such as social status, personal traits, and the emotional state of the speaker. Mechanically, voice production involves complex fluid-structure interaction within the glottis and its control by laryngeal muscle activation. An important goal of voice research is to establish a causal theory linking voice physiology and biomechanics to how speakers use and control voice to communicate meaning and personal information. Establishing such a causal theory has important implications for clinical voice management, voice training, and many speech technology applications. This paper provides a review of voice physiology and biomechanics, the physics of vocal fold vibration and sound production, and laryngeal muscular control of the fundamental frequency of voice, vocal intensity, and voice quality. Current efforts to develop mechanical and computational models of voice production are also critically reviewed. Finally, issues and future challenges in developing a causal theory of voice production and perception are discussed. PMID:27794319

  15. Mechanics of human voice production and control.

    PubMed

    Zhang, Zhaoyan

    2016-10-01

    As the primary means of communication, voice plays an important role in daily life. Voice also conveys personal information such as social status, personal traits, and the emotional state of the speaker. Mechanically, voice production involves complex fluid-structure interaction within the glottis and its control by laryngeal muscle activation. An important goal of voice research is to establish a causal theory linking voice physiology and biomechanics to how speakers use and control voice to communicate meaning and personal information. Establishing such a causal theory has important implications for clinical voice management, voice training, and many speech technology applications. This paper provides a review of voice physiology and biomechanics, the physics of vocal fold vibration and sound production, and laryngeal muscular control of the fundamental frequency of voice, vocal intensity, and voice quality. Current efforts to develop mechanical and computational models of voice production are also critically reviewed. Finally, issues and future challenges in developing a causal theory of voice production and perception are discussed.

  16. [Mechanism of neoglottic adjustment for voice variation in tracheoesophageal speech].

    PubMed

    Fujimoto, T; Kinishi, M; Mohri, M; Amatsu, M

    1994-06-01

    Over the past 17 years, we have been performing tracheoesophageal (TE) fistulization for voice restoration following total laryngectomy. The purpose of this technique is to divert the exhaled air through the TE fistula into the hypopharynx where the inferior constrictor muscle forms the retropharyngeal prominence on which the neoglottis is located. It is generally accepted that both pulmonary power and laryngeal adjustment control voice frequency and intensity change in laryngeal phonation. Regularity at various pitches and voice intensities was seen in TE phonation, despite laryngeal adjustment being lost. Regular voice production with various pitches and intensities requires a regulatory mechanism for both pulmonary power and the neoglottis. This study was designed to clarify the mechanism of neoglottic adjustment in TE phonation. Ten speakers with TE fistula were subjected to aerodynamic and electrophysiological investigations. Tracheal pressure, fundamental frequency, intensity, and airflow rate were measured for easy phonation, a high-pitched voice, and a loud voice. Resistance and efficiency of the neoglottis were calculated from the data obtained. Electromyograms of the inferior constrictor muscle and tracheal pressure were simultaneously recorded when the pitch or intensity of the voice increased. Six of the ten subjects examined were able to produce a high-pitched voice. Tracheal pressure increased in all six, the airflow rate in four, and neoglottal resistance in five, as compared with the data obtained during easy phonation. Nine of the ten subjects examined were able to produce a loud voice. In all nine, both tracheal pressure and the airflow rate increased as compared with the values measured during easy phonation. Neoglottal resistance had no definite pattern in relation to voice intensity changes. Electrophysiological study demonstrated that the activity of the inferior constrictor muscle increased as tracheal pressure increased so as to raise the…
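
    The aerodynamic quantities named above reduce to simple ratios. A minimal sketch follows; the units and the efficiency definition are common conventions assumed here, not values or formulas taken from the paper:

      def neoglottal_resistance(tracheal_pressure_cmh2o, airflow_l_per_s):
          """Resistance as driving pressure over mean airflow, in cmH2O/(L/s)."""
          return tracheal_pressure_cmh2o / airflow_l_per_s

      def vocal_efficiency(acoustic_power_w, tracheal_pressure_cmh2o, airflow_l_per_s):
          """One common definition: radiated acoustic power over aerodynamic power.
          1 cmH2O = 98.0665 Pa and 1 L/s = 1e-3 m^3/s, so the product is in watts."""
          aerodynamic_power_w = (tracheal_pressure_cmh2o * 98.0665) * (airflow_l_per_s * 1e-3)
          return acoustic_power_w / aerodynamic_power_w

      # Illustrative values only, not data from the study.
      print(neoglottal_resistance(30.0, 0.15))    # ~200 cmH2O/(L/s)
      print(vocal_efficiency(1e-4, 30.0, 0.15))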

  17. Perception of synthetic speech produced automatically by rule: Intelligibility of eight text-to-speech systems.

    PubMed

    Greene, Beth G; Logan, John S; Pisoni, David B

    1986-03-01

    We present the results of studies designed to measure the segmental intelligibility of eight text-to-speech systems and a natural speech control, using the Modified Rhyme Test (MRT). Results indicated that the voices tested could be grouped into four categories: natural speech, high-quality synthetic speech, moderate-quality synthetic speech, and low-quality synthetic speech. The overall performance of the best synthesis system, DECtalk-Paul, was equivalent to natural speech only in terms of performance on initial consonants. The findings are discussed in terms of recent work investigating the perception of synthetic speech under more severe conditions. Suggestions for future research on improving the quality of synthetic speech are also considered.

  18. Perception of synthetic speech produced automatically by rule: Intelligibility of eight text-to-speech systems

    PubMed Central

    GREENE, BETH G.; LOGAN, JOHN S.; PISONI, DAVID B.

    2012-01-01

    We present the results of studies designed to measure the segmental intelligibility of eight text-to-speech systems and a natural speech control, using the Modified Rhyme Test (MRT). Results indicated that the voices tested could be grouped into four categories: natural speech, high-quality synthetic speech, moderate-quality synthetic speech, and low-quality synthetic speech. The overall performance of the best synthesis system, DECtalk-Paul, was equivalent to natural speech only in terms of performance on initial consonants. The findings are discussed in terms of recent work investigating the perception of synthetic speech under more severe conditions. Suggestions for future research on improving the quality of synthetic speech are also considered. PMID:23225916

  19. Hearing history influences voice gender perceptual performance in cochlear implant users.

    PubMed

    Kovačić, Damir; Balaban, Evan

    2010-12-01

    The study was carried out to assess the role that five hearing history variables (chronological age, age at onset of deafness, age of first cochlear implant [CI] activation, duration of CI use, and duration of known deafness) play in the ability of CI users to identify speaker gender. Forty-one juvenile CI users participated in two voice gender identification tasks. In a fixed, single-interval task, subjects listened to a single speech item from one of 20 adult male or 20 adult female speakers and had to identify speaker gender. In an adaptive speech-based voice gender discrimination task with the fundamental frequency difference between the voices as the adaptive parameter, subjects listened to a pair of speech items presented in sequential order, one of which was always spoken by an adult female and the other by an adult male. Subjects had to identify the speech item spoken by the female voice. Correlation and regression analyses between perceptual scores in the two tasks and the hearing history variables were performed. Subjects fell into three performance groups: (1) those who could distinguish voice gender in both tasks, (2) those who could distinguish voice gender in the adaptive but not the fixed task, and (3) those who could not distinguish voice gender in either task. Gender identification performance for single voices in the fixed task was significantly and negatively related to the duration of deafness before cochlear implantation (shorter deafness yielded better performance), whereas performance in the adaptive task was weakly but significantly related to age at first activation of the CI device, with earlier activations yielding better scores. The existence of a group of subjects able to perform adaptive discrimination but unable to identify the gender of singly presented voices demonstrates the potential dissociability of the skills required for these two tasks, suggesting that duration of deafness and age of cochlear implantation could have…

  20. Guidelines for Selecting Microphones for Human Voice Production Research

    ERIC Educational Resources Information Center

    Svec, Jan G.; Granqvist, Svante

    2010-01-01

    Purpose: This tutorial addresses fundamental characteristics of microphones (frequency response, frequency range, dynamic range, and directionality), which are important for accurate measurements of voice and speech. Method: Technical and voice literature was reviewed and analyzed. The following recommendations on desirable microphone…

  1. Speech processing using maximum likelihood continuity mapping

    DOEpatents

    Hogden, John E.

    2000-01-01

    Speech processing is obtained that, given a probabilistic mapping between static speech sounds and pseudo-articulator positions, allows sequences of speech sounds to be mapped to smooth sequences of pseudo-articulator positions. In addition, a method for learning a probabilistic mapping between static speech sounds and pseudo-articulator position is described. The method for learning the mapping between static speech sounds and pseudo-articulator position uses a set of training data composed only of speech sounds. The said speech processing can be applied to various speech analysis tasks, including speech recognition, speaker recognition, speech coding, speech synthesis, and voice mimicry.

  2. Speech processing using maximum likelihood continuity mapping

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hogden, J.E.

    Speech processing is obtained that, given a probabilistic mapping between static speech sounds and pseudo-articulator positions, allows sequences of speech sounds to be mapped to smooth sequences of pseudo-articulator positions. In addition, a method for learning a probabilistic mapping between static speech sounds and pseudo-articulator position is described. The method for learning the mapping between static speech sounds and pseudo-articulator position uses a set of training data composed only of speech sounds. The said speech processing can be applied to various speech analysis tasks, including speech recognition, speaker recognition, speech coding, speech synthesis, and voice mimicry.

  3. Dramatic Effects of Speech Task on Motor and Linguistic Planning in Severely Dysfluent Parkinsonian Speech

    ERIC Educational Resources Information Center

    Van Lancker Sidtis, Diana; Cameron, Krista; Sidtis, John J.

    2012-01-01

    In motor speech disorders, dysarthric features impacting intelligibility, articulation, fluency and voice emerge more saliently in conversation than in repetition, reading or singing. A role of the basal ganglia in these task discrepancies has been identified. Further, more recent studies of naturalistic speech in basal ganglia dysfunction have…

  4. [The application of cybernetic modeling methods for the forensic medical personality identification based on the voice and sounding speech characteristics].

    PubMed

    Kaganov, A Sh; Kir'yanov, P A

    2015-01-01

    The objective of the present publication was to discuss the possibility of applying cybernetic modeling methods to overcome the apparent discrepancy between two kinds of speech records, viz. the initial ones (e.g., those obtained in the course of special investigation activities) and the voice prints obtained from the persons subjected to criminalistic examination. The paper is based on literature sources and on the materials of original criminalistic examinations performed by the authors.

  5. Temporal voice areas exist in autism spectrum disorder but are dysfunctional for voice identity recognition

    PubMed Central

    Borowiak, Kamila; von Kriegstein, Katharina

    2016-01-01

    The ability to recognise the identity of others is a key requirement for successful communication. Brain regions that respond selectively to voices exist in humans from early infancy on. Currently, it is unclear whether dysfunction of these voice-sensitive regions can explain voice identity recognition impairments. Here, we used two independent functional magnetic resonance imaging studies to investigate voice processing in a population that has been reported to have no voice-sensitive regions: autism spectrum disorder (ASD). Our results refute the earlier report that individuals with ASD have no responses in voice-sensitive regions: Passive listening to vocal, compared to non-vocal, sounds elicited typical responses in voice-sensitive regions in the high-functioning ASD group and controls. In contrast, the ASD group had a dysfunction in voice-sensitive regions during voice identity but not speech recognition in the right posterior superior temporal sulcus/gyrus (STS/STG)—a region implicated in processing complex spectrotemporal voice features and unfamiliar voices. The right anterior STS/STG correlated with voice identity recognition performance in controls but not in the ASD group. The findings suggest that right STS/STG dysfunction is critical for explaining voice recognition impairments in high-functioning ASD and show that ASD is not characterised by a general lack of voice-sensitive responses. PMID:27369067

  6. Alternative Speech Communication System for Persons with Severe Speech Disorders

    NASA Astrophysics Data System (ADS)

    Selouani, Sid-Ahmed; Sidi Yakoub, Mohammed; O'Shaughnessy, Douglas

    2009-12-01

    Assistive speech-enabled systems are proposed to help both French- and English-speaking persons with various speech disorders. The proposed assistive systems use automatic speech recognition (ASR) and speech synthesis in order to enhance the quality of communication. These systems aim at improving the intelligibility of pathologic speech, making it as natural as possible and close to the original voice of the speaker. The resynthesized utterances use new basic units, a new concatenating algorithm and a grafting technique to correct the poorly pronounced phonemes. The ASR responses are uttered by the new speech synthesis system in order to convey an intelligible message to listeners. Experiments involving four American speakers with severe dysarthria and two Acadian French speakers with sound substitution disorders (SSDs) are carried out to demonstrate the efficiency of the proposed methods. An improvement of the Perceptual Evaluation of Speech Quality (PESQ) value of 5% and of more than 20% is achieved by the speech synthesis systems that deal with SSD and dysarthria, respectively.
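
    PESQ scores like those reported above can be computed with an off-the-shelf implementation. A minimal sketch using the open-source pesq package follows; the package choice and the file names are assumptions for illustration, not the tooling used in the study:

      from scipy.io import wavfile
      from pesq import pesq            # pip install pesq (ITU-T P.862 implementation)

      fs, reference = wavfile.read("natural_reference.wav")     # placeholder paths
      _, resynthesized = wavfile.read("resynthesized.wav")

      # Mode must match the sample rate: 'nb' for 8 kHz input, 'wb' for 16 kHz.
      score = pesq(fs, reference, resynthesized, "wb")
      print(f"PESQ: {score:.2f}")      # roughly -0.5 (bad) to 4.5 (excellent)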

  7. Female voice communications in high levels of aircraft cockpit noises--Part I: spectra, levels, and microphones.

    PubMed

    Nixon, C W; Morris, L J; McCavitt, A R; McKinley, R L; Anderson, T R; McDaniel, M P; Yeager, D G

    1998-07-01

    Female produced speech, although more intelligible than male speech in some noise spectra, may be more vulnerable to degradation by high levels of some military aircraft cockpit noises. The acoustic features of female speech are higher in frequency, lower in power, and appear more susceptible than male speech to masking by some of these military noises. Current military aircraft voice communication systems were optimized for the male voice and may not adequately accommodate the female voice in these high level noises. This applied study investigated the intelligibility of female and male speech produced in the noise spectra of four military aircraft cockpits at levels ranging from 95 dB to 115 dB. The experimental subjects used standard flight helmets and headsets, noise-canceling microphones, and military aircraft voice communications systems during the measurements. The intelligibility of female speech was lower than that of male speech for all experimental conditions; however, differences were small and insignificant except at the highest levels of the cockpit noises. Intelligibility for both genders varied with aircraft noise spectrum and level. Speech intelligibility of both genders was acceptable during normal cruise noises of all four aircraft, but improvements are required in the higher levels of noise created during aircraft maximum operating conditions. The intelligibility of female speech was unacceptable at the highest measured noise level of 115 dB and may constitute a problem for other military aviators. The intelligibility degradation due to the noise can be neutralized by use of an available, improved noise-canceling microphone, by the application of current active noise reduction technology to the personal communication equipment, and by the development of a voice communications system to accommodate the speech produced by both female and male aviators.

  8. Evidence-Based Clinical Voice Assessment: A Systematic Review

    ERIC Educational Resources Information Center

    Roy, Nelson; Barkmeier-Kraemer, Julie; Eadie, Tanya; Sivasankar, M. Preeti; Mehta, Daryush; Paul, Diane; Hillman, Robert

    2013-01-01

    Purpose: To determine what research evidence exists to support the use of voice measures in the clinical assessment of patients with voice disorders. Method: The American Speech-Language-Hearing Association (ASHA) National Center for Evidence-Based Practice in Communication Disorders staff searched 29 databases for peer-reviewed English-language…

  9. Assessing Chronic Stress, Coping Skills, and Mood Disorders through Speech Analysis: A Self-Assessment 'Voice App' for Laptops, Tablets, and Smartphones.

    PubMed

    Braun, Silke; Annovazzi, Chiara; Botella, Cristina; Bridler, René; Camussi, Elisabetta; Delfino, Juan P; Mohr, Christine; Moragrega, Ines; Papagno, Costanza; Pisoni, Alberto; Soler, Carla; Seifritz, Erich; Stassen, Hans H

    2016-01-01

    Computerized speech analysis (CSA) is a powerful method that allows one to assess stress-induced mood disturbances and affective disorders through repeated measurements of speaking behavior and voice sound characteristics. Over the past decades CSA has been successfully used in the clinical context to monitor the transition from 'affectively disturbed' to 'normal' among psychiatric patients under treatment. This project, by contrast, aimed to extend the CSA method in such a way that the transition from 'normal' to 'affected' can be detected among subjects of the general population through 10-20 self-assessments. Central to the project was a normative speech study of 5 major languages (English, French, German, Italian, and Spanish). Each language comprised 120 subjects stratified according to gender, age, and education with repeated assessments at 14-day intervals (total n = 697). In a first step, we developed a multivariate model to assess affective state and stress-induced bodily reactions through speaking behavior and voice sound characteristics. Secondly, we determined language-, gender-, and age-specific thresholds that draw a line between 'natural fluctuations' and 'significant changes'. Thirdly, we implemented the model along with the underlying methods and normative data in a self-assessment 'voice app' for laptops, tablets, and smartphones. Finally, a longitudinal self-assessment study of 36 subjects was carried out over 14 days to test the performance of the CSA method in home environments. The data showed that speaking behavior and voice sound characteristics can be quantified in a reproducible and language-independent way. Gender and age explained 15-35% of the observed variance, whereas the educational level had a relatively small effect in the range of 1-3%. The self-assessment 'voice app' was realized in modular form so that additional languages can simply be 'plugged in' once the respective normative data become available. Results of the longitudinal…

  10. Standardization of pitch-range settings in voice acoustic analysis.

    PubMed

    Vogel, Adam P; Maruff, Paul; Snyder, Peter J; Mundt, James C

    2009-05-01

    Voice acoustic analysis is typically a labor-intensive, time-consuming process that requires the application of idiosyncratic parameters tailored to individual aspects of the speech signal. Such processes limit the efficiency and utility of voice analysis in clinical practice as well as in applied research and development. In the present study, we analyzed 1,120 voice files, using standard techniques (case-by-case hand analysis), taking roughly 10 work weeks of personnel time to complete. The results were compared with the analytic output of several automated analysis scripts that made use of preset pitch-range parameters. After pitch windows were selected to appropriately account for sex differences, the automated analysis scripts reduced processing time of the 1,120 speech samples to less than 2.5 h and produced results comparable to those obtained with hand analysis. However, caution should be exercised when applying the suggested preset values to pathological voice populations.
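
    In the spirit of the preset pitch windows described above, here is a minimal Python sketch using the praat-parselmouth package; the specific floor/ceiling values and the file name are illustrative assumptions, not the presets used in the study:

      import numpy as np
      import parselmouth              # pip install praat-parselmouth

      # Illustrative sex-specific pitch windows (Hz); not the study's exact presets.
      PITCH_WINDOWS = {"male": (70.0, 250.0), "female": (100.0, 350.0)}

      def mean_f0(wav_path, sex):
          floor, ceiling = PITCH_WINDOWS[sex]
          snd = parselmouth.Sound(wav_path)
          pitch = snd.to_pitch(pitch_floor=floor, pitch_ceiling=ceiling)
          f0 = pitch.selected_array["frequency"]
          f0 = f0[f0 > 0]             # Praat marks unvoiced frames with 0 Hz
          return float(np.mean(f0))

      print(mean_f0("sample.wav", "female"))   # placeholder file name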

  11. Brainstem Correlates of Speech-in-Noise Perception in Children

    PubMed Central

    Anderson, Samira; Skoe, Erika; Chandrasekaran, Bharath; Zecker, Steven; Kraus, Nina

    2010-01-01

    Children often have difficulty understanding speech in challenging listening environments. In the absence of peripheral hearing loss, these speech perception difficulties may arise from dysfunction at more central levels in the auditory system, including subcortical structures. We examined brainstem encoding of pitch in a speech syllable in 38 school-age children. In children with poor speech-in-noise perception, we find impaired encoding of the fundamental frequency and the second harmonic, two important cues for pitch perception. Pitch, an important factor in speaker identification, aids the listener in tracking a specific voice from a background of voices. These results suggest that the robustness of subcortical neural encoding of pitch features in time-varying signals is an important factor in determining success with speech perception in noise. PMID:20708671

  12. Living with Hearing Loss

    MedlinePlus

    Nora Woodruff and her family, including dad Bob, have ... hearing, balance, smell, taste, voice, speech, and language. Nora Woodruff, daughter of ABC newsman Bob Woodruff and ...

  13. Voice similarity in identical twins.

    PubMed

    Van Gysel, W D; Vercammen, J; Debruyne, F

    2001-01-01

    People asked to visually distinguish the two individuals of a monozygotic twin (MT) pair mostly run into trouble. Does this problem also exist when listening to twin voices? Twenty female and 10 male MT voice pairs were each randomly assembled with one "strange" voice to form voice trios. The listeners (10 female students in Speech and Language Pathology) were asked to label the twins (voices 1-2, 1-3 or 2-3) in two conditions: two standard sentences read aloud, and a 2.5-second midsection of a sustained /a/. The proportion of correctly labelled twins was 82% and 63% for female voices and 74% and 52% for male voices, for the sentences and the sustained /a/ respectively, both being significantly greater than chance (33%). The acoustic analysis revealed a high intra-twin correlation for the speaking fundamental frequency (SFF) of the sentences and the fundamental frequency (F0) of the sustained /a/. So the voice pitch could have been a useful characteristic in the perceptual identification of the twins. We conclude that there is a greater perceptual resemblance between the voices of identical twins than between voices without genetic relationship. The identification, however, is not perfect. The voice pitch possibly contributes to the correct twin identifications.

  14. Voice activity and participation profile: assessing the impact of voice disorders on daily activities.

    PubMed

    Ma, E P; Yiu, E M

    2001-06-01

    Traditional clinical voice evaluation focuses primarily on the severity of voice impairment, with little emphasis on the impact of voice disorders on the individual's quality of life. This study reports the development of a 28-item assessment tool that evaluates the perception of voice problem, activity limitation, and participation restriction using the International Classification of Impairments, Disabilities and Handicaps-2 Beta-1 concept (World Health Organization, 1997). The questionnaire was administered to 40 subjects with dysphonia and 40 control subjects with normal voices. Results showed that the dysphonic group reported significantly more severe voice problems, limitation in daily voice activities, and restricted participation in these activities than the control group. The study also showed that the perception of a voice problem by the dysphonic subjects correlated positively with the perception of limitation in voice activities and restricted participation. However, the self-perceived voice problem had little correlation with the degree of voice-quality impairment measured acoustically and perceptually by speech pathologists. The data also showed that the aggregate scores of activity limitation and participation restriction were positively correlated, and the extent of activity limitation and participation restriction was similar in all except the job area. These findings highlight the importance of identifying and quantifying the impact of dysphonia on the individual's quality of life in the clinical management of voice disorders.

  15. Feasibility of event-related potential (ERP) biomarker use to study effects of mother's voice exposure on speech sound differentiation of preterm infants.

    PubMed

    Chorna, Olena D; Hamm, Ellyn L; Shrivastava, Hemang; Maitre, Nathalie L

    2018-01-01

    Atypical maturation of auditory neural processing contributes to preterm-born infants' language delays. Event-related potential (ERP) measurement of speech-sound differentiation might fill a gap in treatment-response biomarkers to auditory interventions. We evaluated whether these markers could measure treatment effects in a quasi-randomized prospective study. Hospitalized preterm infants in passive or active, suck-contingent mother's voice exposure groups were not different at baseline. Post-intervention, the active group had greater increases in /du/-/gu/ differentiation in left frontal and temporal regions. Infants with brain injury had lower baseline /ba/-/ga/ and /du/-/gu/ differentiation than those without. ERP provides valid discriminative, responsive, and predictive biomarkers of infant speech-sound differentiation.

  16. Five-year speech and language outcomes in children with cleft lip-palate.

    PubMed

    Prathanee, Benjamas; Pumnum, Tawitree; Seepuaham, Cholada; Jaiyong, Pechcharat

    2016-10-01

    To investigate 5-year speech and language outcomes in children with cleft lip/palate (CLP). Thirty-eight children aged 4 years to 7 years 8 months were recruited for this study. Speech abilities including articulation, resonance, voice, and intelligibility were assessed based on Thai Universal Parameters of Speech Outcomes. Language ability was assessed by the Language Screening Test. The findings revealed rates of speech and language delay, abnormal understandability, resonance abnormality, voice disturbance, and articulation defects of 8.33 (1.75, 22.47), 50.00 (32.92, 67.08), 36.11 (20.82, 53.78), 30.56 (16.35, 48.11), and 94.44 (81.34, 99.32) percent, respectively. Articulation errors were the most common speech and language defects in children with clefts, followed by abnormal understandability, resonance abnormality, and voice disturbance. These results should be of critical concern. Protocol reviewing and early intervention programs are needed for improved speech outcomes. Copyright © 2016 European Association for Cranio-Maxillo-Facial Surgery. Published by Elsevier Ltd. All rights reserved.

  17. Speech-recognition interfaces for music information retrieval

    NASA Astrophysics Data System (ADS)

    Goto, Masataka

    2005-09-01

    This paper describes two hands-free music information retrieval (MIR) systems that enable a user to retrieve and play back a musical piece by saying its title or the artist's name. Although various interfaces for MIR have been proposed, speech-recognition interfaces suitable for retrieving musical pieces have not been studied. Our MIR-based jukebox systems employ two different speech-recognition interfaces for MIR, speech completion and speech spotter, which exploit intentionally controlled nonverbal speech information in original ways. The first is a music retrieval system with the speech-completion interface that is suitable for music stores and car-driving situations. When a user only remembers part of the name of a musical piece or an artist and utters only a remembered fragment, the system helps the user recall and enter the name by completing the fragment. The second is a background-music playback system with the speech-spotter interface that can enrich human-human conversation. When a user is talking to another person, the system allows the user to enter voice commands for music playback control by spotting a special voice-command utterance in face-to-face or telephone conversations. Experimental results from use of these systems have demonstrated the effectiveness of the speech-completion and speech-spotter interfaces. (Video clips: http://staff.aist.go.jp/m.goto/MIR/speech-if.html)

  18. Corollary discharge provides the sensory content of inner speech.

    PubMed

    Scott, Mark

    2013-09-01

    Inner speech is one of the most common, but least investigated, mental activities humans perform. It is an internal copy of one's external voice and so is similar to a well-established component of motor control: corollary discharge. Corollary discharge is a prediction of the sound of one's voice generated by the motor system. This prediction is normally used to filter self-caused sounds from perception, which segregates them from externally caused sounds and prevents the sensory confusion that would otherwise result. The similarity between inner speech and corollary discharge motivates the theory, tested here, that corollary discharge provides the sensory content of inner speech. The results reported here show that inner speech attenuates the impact of external sounds. This attenuation was measured using a context effect (an influence of contextual speech sounds on the perception of subsequent speech sounds), which weakens in the presence of speech imagery that matches the context sound. Results from a control experiment demonstrated this weakening in external speech as well. Such sensory attenuation is a hallmark of corollary discharge.

  19. McGurk Effect in Gender Identification: Vision Trumps Audition in Voice Judgments.

    PubMed

    Peynircioğlu, Zehra F; Brent, William; Tatz, Joshua R; Wyatt, Jordan

    2017-01-01

    Demonstrations of non-speech McGurk effects are rare, mostly limited to emotion identification, and sometimes not considered true analogues. We presented videos of males and females singing a single syllable on the same pitch and asked participants to indicate the true range of the voice: soprano, alto, tenor, or bass. For one group of participants, the gender shown on the video matched the gender of the voice heard, and for the other group they were mismatched. Soprano or alto responses were interpreted as "female voice" decisions and tenor or bass responses as "male voice" decisions. Identification of the voice gender was 100% correct in the preceding audio-only condition. However, whereas performance was also 100% correct in the matched video/audio condition, it was only 31% correct in the mismatched video/audio condition. Thus, the visual gender information overrode the voice gender identification, showing a robust non-speech McGurk effect.

  20. Common cues to emotion in the dynamic facial expressions of speech and song.

    PubMed

    Livingstone, Steven R; Thompson, William F; Wanderley, Marcelo M; Palmer, Caroline

    2015-01-01

    Speech and song are universal forms of vocalization that may share aspects of emotional expression. Research has focused on parallels in acoustic features, overlooking facial cues to emotion. In three experiments, we compared moving facial expressions in speech and song. In Experiment 1, vocalists spoke and sang statements each with five emotions. Vocalists exhibited emotion-dependent movements of the eyebrows and lip corners that transcended speech-song differences. Vocalists' jaw movements were coupled to their acoustic intensity, exhibiting differences across emotion and speech-song. Vocalists' emotional movements extended beyond vocal sound to include large sustained expressions, suggesting a communicative function. In Experiment 2, viewers judged silent videos of vocalists' facial expressions prior to, during, and following vocalization. Emotional intentions were identified accurately for movements during and after vocalization, suggesting that these movements support the acoustic message. Experiment 3 compared emotional identification in voice-only, face-only, and face-and-voice recordings. Emotion judgements for voice-only singing were poorly identified, yet were accurate for all other conditions, confirming that facial expressions conveyed emotion more accurately than the voice in song, yet were equivalent in speech. Collectively, these findings highlight broad commonalities in the facial cues to emotion in speech and song, yet highlight differences in perception and acoustic-motor production.

  1. Automatic initial and final segmentation in cleft palate speech of Mandarin speakers

    PubMed Central

    Liu, Yin; Yin, Heng; Zhang, Junpeng; Zhang, Jing; Zhang, Jiang

    2017-01-01

    Speech unit segmentation is an important pre-processing step in the analysis of cleft palate speech. In Mandarin, each syllable is composed of two parts: an initial and a final. In cleft palate speech, resonance disorders occur at the finals and the voiced initials, while articulation disorders occur at the unvoiced initials. Thus, initials and finals are the minimum speech units that can reflect the characteristics of cleft palate speech disorders. In this work, an automatic initial/final segmentation method is proposed as a pre-processing step for cleft palate speech signal processing. The tested cleft palate speech utterances were collected from the Cleft Palate Speech Treatment Center in the Hospital of Stomatology, Sichuan University, which treats the largest number of cleft palate patients in China. The cleft palate speech data include 824 speech segments, and the control samples contain 228 speech segments. First, syllables are extracted from the speech utterances. The proposed syllable extraction method avoids a training stage and performs well for both voiced and unvoiced speech. The syllables are then classified into those with “quasi-unvoiced” initials and those with “quasi-voiced” initials, and separate initial/final segmentation methods are proposed for the two types. Moreover, a two-step segmentation method is proposed, in which the rough locations of syllable and initial/final boundaries are refined in the second step to improve the robustness of the segmentation accuracy. The experiments show that initial/final segmentation accuracies are higher for syllables with quasi-unvoiced initials than for those with quasi-voiced initials. For the cleft palate speech, the mean time error is 4.4 ms for syllables with quasi-unvoiced initials and 25.7 ms for syllables with quasi-voiced initials, and the correct segmentation accuracy P30 for all the syllables is 91.69%. For the control samples, P30 for all the syllables is 91…

  2. Relationship between perceived politeness and spectral characteristics of voice

    NASA Astrophysics Data System (ADS)

    Ito, Mika

    2005-04-01

    This study investigates the role of voice quality in perceiving politeness under conditions of varying relative social status among Japanese male speakers. The work focuses on four important methodological issues: experimental control of sociolinguistic aspects, eliciting natural spontaneous speech, obtaining recording quality suitable for voice quality analysis, and assessment of glottal characteristics through the use of non-invasive direct measurements of the speech spectrum. To obtain natural, unscripted utterances, the speech data were collected with a Map Task. This methodology allowed us to study the effect of manipulating relative social status among participants in the same community. We then computed the relative amplitudes of harmonics and formant peaks in spectra obtained from the Map Task recordings. Finally, an experiment was conducted to observe the alignment between acoustic measures and the perceived politeness of the voice samples. The results suggest that listeners' perceptions of politeness are determined by spectral characteristics of speakers, in particular, spectral tilts obtained by computing the difference in amplitude between the first harmonic and the third formant.
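
    The spectral-tilt measure mentioned above (amplitude of the first harmonic minus the amplitude at the third formant, often written H1-A3) can be sketched as follows. F0 and F3 are assumed to be known from a separate analysis, and the search bandwidth is an arbitrary choice for the example:

      import numpy as np

      def h1_minus_a3(frame, fs, f0, f3, bw=50.0):
          """Spectral tilt H1-A3 in dB: level of the first harmonic minus the
          level of the strongest component near the third formant."""
          windowed = frame * np.hanning(len(frame))
          spectrum = np.abs(np.fft.rfft(windowed))
          freqs = np.fft.rfftfreq(len(frame), 1.0 / fs)

          def peak_db(center):
              band = (freqs >= center - bw) & (freqs <= center + bw)
              return 20 * np.log10(np.max(spectrum[band]) + 1e-12)

          return peak_db(f0) - peak_db(f3)

      # Toy check: a 120 Hz harmonic 20 dB above a component near a 2500 Hz "F3".
      fs = 16000
      t = np.arange(4096) / fs
      frame = np.sin(2 * np.pi * 120 * t) + 0.1 * np.sin(2 * np.pi * 2500 * t)
      print(h1_minus_a3(frame, fs, f0=120.0, f3=2500.0))   # ~20 dB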

  3. Perceptual, auditory and acoustic vocal analysis of speech and singing in choir conductors.

    PubMed

    Rehder, Maria Inês Beltrati Cornacchioni; Behlau, Mara

    2008-01-01

    Background: The voice of choir conductors. Aim: To evaluate the vocal quality of choir conductors based on the production of a sustained vowel during singing and when speaking, in order to observe auditory and acoustic differences. Method: Participants of this study were 100 choir conductors, with an equal distribution between genders. Participants were asked to produce the sustained vowel "é" using a singing and a speaking voice. Speech samples were analyzed based on auditory-perceptive and acoustic parameters. The auditory-perceptive analysis was carried out by two speech-language pathologists, specialists in this field of knowledge. The acoustic analysis was carried out with the support of the computer software Doctor Speech (Tiger Electronics, SRD, USA, version 4.0), using the Real Analysis module. Results: The auditory-perceptive analysis of the vocal quality indicated that most conductors have adapted voices, presenting more alterations in their speaking voice. The acoustic analysis indicated different values between genders and between the different production modalities. The fundamental frequency was higher in the singing voice, as were the values for the first formant; the second formant presented lower values in the singing voice, with statistically significant results only for women. Conclusion: The voice of choir conductors is adapted, presenting fewer deviations in the singing voice when compared to the speaking voice.

  4. A Resource Manual for Speech and Hearing Programs in Oklahoma.

    ERIC Educational Resources Information Center

    Oklahoma State Dept. of Education, Oklahoma City.

    Administrative aspects of the Oklahoma speech and hearing program are described, including state requirements, school administrator role, and organizational and operational procedures. Information on speech and language development and remediation covers language, articulation, stuttering, voice disorders, cleft palate, speech improvement,…

  5. Voice interactive electronic warning systems (VIEWS) - An applied approach to voice technology in the helicopter cockpit

    NASA Technical Reports Server (NTRS)

    Voorhees, J. W.; Bucher, N. M.

    1983-01-01

    The cockpit has been one of the most rapidly changing areas of new aircraft design over the past thirty years. In connection with these developments, a pilot can now be considered a decision maker and system manager as well as a vehicle controller. There is, however, a trend toward information overload in the cockpit, and information processing problems begin to occur for the rotorcraft pilot. One approach to overcoming these difficulties is the use of voice technology to improve the information transfer rate in the cockpit, with respect to both input and output. Attention is given to the background of speech technology, the application of speech technology within the cockpit, voice interactive electronic warning system (VIEWS) simulation, and methodology. Information subsystems are considered, along with a dynamic simulation study and data collection.

  6. Speech waveform perturbation analysis: a perceptual-acoustical comparison of seven measures.

    PubMed

    Askenfelt, A G; Hammarberg, B

    1986-03-01

    The performance of seven acoustic measures of cycle-to-cycle variations (perturbations) in the speech waveform was compared. All measures were calculated automatically and applied to running speech. Three of the measures refer to the frequency of occurrence and severity of waveform perturbations in specially selected parts of the speech, identified by means of the rate of change in the fundamental frequency. Three other measures refer to statistical properties of the distribution of the relative frequency differences between adjacent pitch periods. One perturbation measure refers to the percentage of consecutive pitch period differences with alternating signs. The acoustic measures were tested on tape-recorded speech samples from 41 voice patients, before and after successful therapy. Scattergrams of acoustic waveform perturbation data versus an average of perceived deviant voice qualities, as rated by voice clinicians, are presented. The perturbation measures were compared with regard to the acoustic-perceptual correlation and their ability to discriminate between normal and pathological voice status. The standard deviation of the distribution of the relative frequency differences was suggested as the most useful acoustic measure of waveform perturbations for clinical applications.
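
    Two of the measures described above translate directly into code: the standard deviation of the relative frequency differences between adjacent pitch periods, and the percentage of consecutive period differences with alternating signs. A minimal numpy sketch, assuming pitch periods have already been extracted:

      import numpy as np

      def perturbation_measures(periods_ms):
          """periods_ms: durations of successive pitch periods in milliseconds."""
          f0 = 1000.0 / np.asarray(periods_ms)       # per-period frequency, Hz
          rel_diff = np.diff(f0) / f0[:-1]           # relative frequency differences
          sd_rel_diff = float(np.std(rel_diff))

          diffs = np.diff(periods_ms)
          signs = np.sign(diffs)
          alternating = signs[1:] * signs[:-1] < 0   # consecutive diffs of opposite sign
          pct_alternating = 100.0 * float(np.mean(alternating))
          return sd_rel_diff, pct_alternating

      # Toy example: slightly jittered 10 ms (100 Hz) pitch periods.
      rng = np.random.default_rng(1)
      periods = 10.0 + rng.normal(0.0, 0.05, 200)
      print(perturbation_measures(periods))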

  7. Design of an efficient music-speech discriminator.

    PubMed

    Tardón, Lorenzo J; Sammartino, Simone; Barbancho, Isabel

    2010-01-01

    In this paper, the problem of designing a simple and efficient music-speech discriminator for large audio data sets in which advanced music-playing techniques are taught and voice and music are intrinsically interleaved is addressed. In the process, a number of features used in speech-music discrimination are defined and evaluated over the available data set. Specifically, the data set contains pieces of classical music played with different and unspecified instruments (or even lyrics) and the voice of a teacher (a top music performer), or even the overlapped voices of the translator and other persons. After an initial test of the performance of the features implemented, a selection process is started, which takes into account the type of classifier selected beforehand, to achieve good discrimination performance and computational efficiency, as shown in the experiments. The discrimination application has been defined and tested on a large data set supplied by Fundación Albéniz, containing a large variety of classical music pieces played on different instruments, together with comments and speeches by famous performers.
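
    A toy version of such a discriminator can be assembled from generic frame-level features and a linear classifier. The features below are common stand-ins, not the exact set evaluated in the paper, and the file names are placeholders:

      import numpy as np
      import librosa                                   # pip install librosa
      from sklearn.linear_model import LogisticRegression

      def clip_features(y, sr):
          """A few generic descriptors often used in speech/music discrimination."""
          return np.array([
              librosa.feature.zero_crossing_rate(y).mean(),
              librosa.feature.spectral_centroid(y=y, sr=sr).mean(),
              librosa.feature.spectral_flatness(y=y).mean(),
              librosa.feature.rms(y=y).std(),          # energy modulation
          ])

      # Hypothetical labeled clips: (path, label) with 0 = speech, 1 = music.
      dataset = [("clip_speech_01.wav", 0), ("clip_music_01.wav", 1)]
      X, t = [], []
      for path, label in dataset:
          y, sr = librosa.load(path, sr=None)
          X.append(clip_features(y, sr))
          t.append(label)

      clf = LogisticRegression().fit(np.array(X), np.array(t))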

  8. Voice Acoustical Measurement of the Severity of Major Depression

    ERIC Educational Resources Information Center

    Cannizzaro, Michael; Harel, Brian; Reilly, Nicole; Chappell, Phillip; Snyder, Peter J.

    2004-01-01

    A number of empirical studies have documented the relationship between quantifiable and objective acoustical measures of voice and speech, and clinical subjective ratings of severity of Major Depression. To further explore this relationship, speech samples were extracted from videotape recordings of structured interviews made during the…

  9. Voice Technologies in Libraries: A Look into the Future.

    ERIC Educational Resources Information Center

    Lange, Holley R., Ed.; And Others

    1991-01-01

    Discussion of synthesized speech and voice recognition focuses on a forum that addressed the potential for speech technologies in libraries. Topics discussed by three contributors include possible library applications in technical processing, book receipt, circulation control, and database access; use by disabled and illiterate users; and problems…

  10. Electrocorticographic representations of segmental features in continuous speech

    PubMed Central

    Lotte, Fabien; Brumberg, Jonathan S.; Brunner, Peter; Gunduz, Aysegul; Ritaccio, Anthony L.; Guan, Cuntai; Schalk, Gerwin

    2015-01-01

    Acoustic speech output results from coordinated articulation of dozens of muscles, bones and cartilages of the vocal mechanism. While we commonly take the fluency and speed of our speech productions for granted, the neural mechanisms facilitating the requisite muscular control are not completely understood. Previous neuroimaging and electrophysiology studies of speech sensorimotor control have typically concentrated on speech sounds (i.e., phonemes, syllables and words) in isolation; sentence-length investigations have largely been used to inform coincident linguistic processing. In this study, we examined the neural representations of segmental features (place and manner of articulation, and voicing status) in the context of fluent, continuous speech production. We used recordings from the cortical surface [electrocorticography (ECoG)] to simultaneously evaluate the spatial topography and temporal dynamics of the neural correlates of speech articulation that may mediate the generation of hypothesized gestural or articulatory scores. We found that the representation of place of articulation involved broad networks of brain regions during all phases of speech production: preparation, execution and monitoring. In contrast, manner of articulation and voicing status were dominated by auditory cortical responses after speech had been initiated. These results provide new insight into the articulatory and auditory processes underlying speech production in terms of their motor requirements and acoustic correlates. PMID:25759647

  11. [Nature of speech disorders in Parkinson disease].

    PubMed

    Pawlukowska, W; Honczarenko, K; Gołąb-Janowska, M

    2013-01-01

    The aim of the study was to discuss the physiology and pathology of speech and to review the literature on speech disorders in Parkinson disease. Additionally, the most effective methods to diagnose speech disorders in Parkinson disease were stressed. Afterward, articulatory, respiratory, acoustic and pragmatic factors contributing to the exacerbation of the speech disorders were discussed. Furthermore, the study dealt with the most important types of speech treatment techniques available (pharmacological and behavioral), and the significance of Lee Silverman Voice Treatment was highlighted.

  12. Speech rate reduction and "nasality" in normal speakers.

    PubMed

    Brancewicz, T M; Reich, A R

    1989-12-01

    This study explored the effects of reduced speech rate on nasal/voice accelerometric measures and nasality ratings. Nasal/voice accelerometric measures were obtained from normal adults for various speech stimuli and speaking rates. Stimuli included three sentences (one obstruent-loaded, one semivowel-loaded, and one containing a single nasal) and /pv/ syllable trains. Speakers read the stimuli at their normal rate, at half their normal rate, and as slowly as possible. In addition, a computer program paced each speaker at rates of 1, 2, and 3 syllables per second. The nasal/voice accelerometric values revealed significant stimulus effects but no rate effects. The nasality ratings of experienced listeners, evaluated as a function of stimulus and speaking rate, were compared to the accelerometric measures. The nasality scale values demonstrated small, but statistically significant, stimulus and rate effects. However, the nasality percepts were poorly correlated with the nasal/voice accelerometric measures.
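
    The nasal/voice accelerometric measure is essentially an amplitude ratio between a nasal accelerometer channel and a voice channel. A minimal sketch under that assumption; the dB framing is a common convention, not necessarily the paper's exact computation:

      import numpy as np

      def nasal_voice_ratio_db(nasal, voice, eps=1e-12):
          """20*log10 of the RMS of the nasal accelerometer signal over the
          RMS of the simultaneously recorded voice signal."""
          rms = lambda x: np.sqrt(np.mean(np.square(x)))
          return 20.0 * np.log10((rms(nasal) + eps) / (rms(voice) + eps))

      # Toy example: nasal channel at one-tenth the voice amplitude -> about -20 dB.
      rng = np.random.default_rng(2)
      voice = rng.normal(0.0, 1.0, 16000)
      nasal = 0.1 * rng.normal(0.0, 1.0, 16000)
      print(nasal_voice_ratio_db(nasal, voice))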

  13. Separation of Singing Voice from Music Accompaniment for Monaural Recordings

    DTIC Science & Technology

    Li, Yipeng

    2005-09-01

    Separating singing voice from music accompaniment is very useful in many applications, such as lyrics recognition and alignment, singer identification, and music information retrieval. Although speech separation has been extensively studied for decades, singing voice separation has been little…

  14. How do teachers with self-reported voice problems differ from their peers with self-reported voice health?

    PubMed

    Lyberg Åhlander, Viveka; Rydell, Roland; Löfqvist, Anders

    2012-07-01

    This randomized case-control study compares teachers with self-reported voice problems to age-, gender-, and school-matched colleagues with self-reported voice health. The self-assessed voice function is related to factors known to influence the voice (laryngeal findings, voice quality, personality, and psychosocial and coping aspects) in a search for causative factors of voice problems in teachers. Subjects and controls, recruited from a teacher group in an earlier questionnaire study, underwent examinations of the larynx by high-speed imaging and kymograms; voice recordings; voice range profile; audiometry; self-assessment of voice handicap and voice function; teaching and environmental aspects; personality; coping; burnout; and work-related issues. The laryngeal and voice recordings were assessed by experienced phoniatricians and speech pathologists. The subjects with self-assessed voice problems differed from their peers with self-assessed voice health by significantly longer recovery time from voice problems, and they scored higher on all subscales of the Voice Handicap Index-Throat. The results show that the cause of voice dysfunction in this group of teachers with self-reported voice problems is not found in the vocal apparatus or within the individual. The individual's perception of a voice problem seems to be based on a combination of the number of symptoms, how often the symptoms occur, and the recovery time. The results also underline the importance of using self-assessed reports of voice dysfunction. Copyright © 2012 The Voice Foundation. Published by Mosby, Inc. All rights reserved.

  15. Speech Motor Development during Acquisition of the Voicing Contrast

    ERIC Educational Resources Information Center

    Grigos, Maria I.; Saxman, John H.; Gordon, Andrew M.

    2005-01-01

    Lip and jaw movements were studied longitudinally in 19-month-old children as they acquired the voicing contrast for /p/ and /b/. A movement tracking system obtained lip and jaw kinematics as participants produced the target utterances /papa/ and /baba/. Laryngeal adjustments were also tracked through acoustically recorded voice onset time (VOT)…

  16. NWR (National Weather Service) voice synthesis project, phase 1

    NASA Astrophysics Data System (ADS)

    Sampson, G. W.

    1986-01-01

    The purpose of the NOAA Weather Radio (NWR) Voice Synthesis Project is to provide a demonstration of current voice synthesis technology. Phase 1 of this project, presented here, provides complete automation of an hourly surface aviation observation for broadcast over NWR. After examining the products currently available on the market, it was decided that synthetic voice technology does not offer the high-quality speech required for broadcast over the NWR. The system presented therefore uses phrase-concatenation technology, yielding a very high quality, versatile voice synthesis system.
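
    Phrase concatenation of this kind is straightforward to sketch: prerecorded phrases are spliced end to end. A minimal Python example using the standard-library wave module follows, with placeholder file names; real systems also smooth the joins, which this sketch omits:

      import wave

      def concatenate_phrases(phrase_paths, out_path):
          """Splice same-format WAV phrases end to end into one message."""
          params, frames = None, []
          for path in phrase_paths:
              with wave.open(path, "rb") as w:
                  if params is None:
                      params = w.getparams()
                  frames.append(w.readframes(w.getnframes()))
          with wave.open(out_path, "wb") as out:
              out.setparams(params)
              for chunk in frames:
                  out.writeframes(chunk)

      # Hypothetical fragments of an hourly observation broadcast.
      concatenate_phrases(
          ["wind.wav", "two.wav", "five.wav", "zero.wav", "at.wav",
           "one.wav", "five.wav", "knots.wav"],
          "observation.wav",
      )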

  17. Postlingual adult performance in noise with HiRes 120 and ClearVoice Low, Medium, and High.

    PubMed

    Holden, Laura K; Brenner, Christine; Reeder, Ruth M; Firszt, Jill B

    2013-11-01

    The study's objectives were to evaluate speech recognition in multiple listening conditions using several noise types with HiRes 120 and ClearVoice (Low, Medium, High) and to determine which ClearVoice program was most beneficial for everyday use. Fifteen postlingual adults attended four sessions; speech recognition was assessed at sessions 1 and 3 with HiRes 120 and at sessions 2 and 4 with all ClearVoice programs. Test measures included sentences presented in restaurant noise (R-SPACE), in speech-spectrum noise, in four- and eight-talker babble, and connected discourse presented in 12-talker babble. Participants completed a questionnaire comparing ClearVoice programs. Significant group differences in performance between HiRes 120 and ClearVoice were present only in the R-SPACE; performance was better with ClearVoice High than HiRes 120. Among ClearVoice programs, no significant group differences were present for any measure. Individual results revealed most participants performed better in the R-SPACE with ClearVoice than HiRes 120. For other measures, significant individual differences between HiRes 120 and ClearVoice were not prevalent. Individual results among ClearVoice programs differed and overall preferences varied. Questionnaire data indicated increased understanding with High and Medium in certain environments. R-SPACE and questionnaire results indicated an advantage for ClearVoice High and Medium. Individual test and preference data showed mixed results between ClearVoice programs making global recommendations difficult; however, results suggest providing ClearVoice High and Medium and HiRes 120 as processor options for adults willing to change settings. For adults unwilling or unable to change settings, ClearVoice Medium is a practical choice for daily listening.

  18. Behavioral treatments for speech in Parkinson's disease: meta-analyses and review of the literature.

    PubMed

    Atkinson-Clement, Cyril; Sadat, Jasmin; Pinto, Serge

    2015-01-01

    Parkinson's disease (PD) results from neurodegenerative processes leading to alteration of motor functions. Most motor symptoms respond well to pharmacological and neurosurgical treatments, except some axial symptoms such as speech impairment, known as dysarthria. However, speech therapy is rarely offered to PD patients. This review aims to evaluate previous research on the effects of behavioral speech therapies in patients with PD. We also performed two meta-analyses focusing on speech loudness and voice pitch. We show that intensive therapies in PD are the most effective for hypophonia and can lead to some improvement of voice pitch. Although speech therapy is effective in managing PD dysarthria, behavioral speech rehabilitation in PD still needs further validation.

  19. Taste transductions in taste receptor cells: basic tastes and moreover.

    PubMed

    Iwata, Shusuke; Yoshida, Ryusuke; Ninomiya, Yuzo

    2014-01-01

    In the oral cavity, taste receptor cells are dedicated to detecting chemical compounds in foodstuffs and transmitting their signals to gustatory nerve fibers. To date, five taste qualities (sweet, umami, bitter, salty, and sour) are generally accepted as basic tastes. Each of these may have a specific role in the detection of nutritious and poisonous substances: sweet for carbohydrate sources of calories, umami for protein and amino acid content, bitter for harmful compounds, salty for minerals, and sour for the ripeness of fruits and for spoiled foods. Recent studies have revealed molecular mechanisms for the reception and transduction of these five basic tastes. Sweet, umami, and bitter tastes are mediated by G-protein coupled receptors (GPCRs) and second-messenger signaling cascades. Salty and sour tastes are mediated by channel-type receptors. In addition to the five basic tastes, taste receptor cells may have the ability to detect fat taste, which is elicited by fatty acids, and calcium taste, which is elicited by calcium. Compounds eliciting either fat taste or calcium taste may be detected by specific GPCRs expressed in taste receptor cells. This review will focus on the transduction mechanisms and cellular characteristics responsible for each of the basic tastes, fat taste, and calcium taste.

  20. STS-41 Voice Command System Flight Experiment Report

    NASA Technical Reports Server (NTRS)

    Salazar, George A.

    1981-01-01

    This report presents the results of the Voice Command System (VCS) flight experiment on the five-day STS-41 mission. Two mission specialists, Bill Shepherd and Bruce Melnick, used the speaker-dependent system to evaluate the operational effectiveness of using voice to control a spacecraft system. In addition, data were gathered to analyze the effects of microgravity on speech recognition performance.

  1. Acoustic voice analysis of prelingually deaf adults before and after cochlear implantation.

    PubMed

    Evans, Maegan K; Deliyski, Dimitar D

    2007-11-01

    It is widely accepted that many severe to profoundly deaf adults have benefited from cochlear implants (CIs). However, limited research has investigated changes in the voice and speech of prelingually deaf adults who receive CIs, a population well known for presenting with a variety of voice and speech abnormalities. The purpose of this study was to use acoustic analysis to explore changes in voice and speech for three prelingually deaf males pre- and postimplantation over 6 months. The following measurements, some measured in varying contexts, were obtained: fundamental frequency (F0), jitter, shimmer, noise-to-harmonic ratio, voice turbulence index, soft phonation index, amplitude- and F0-variation, F0-range, speech rate, nasalance, and vowel production. Characteristics of vowel production were measured by determining the first formant (F1) and second formant (F2) of vowels in various contexts, the magnitude of F2-variation, and the rate of F2-variation. Perceptual measurements of pitch, pitch variability, loudness variability, speech rate, and intonation were obtained for comparison. Results are reported using descriptive statistics. The results showed patterns of change for some of the parameters, though there was considerable variation across subjects. All participants demonstrated a decrease in F0 in at least one context and a change in nasalance toward the norm as compared with their normal-hearing control. The two participants who were oral-language communicators were judged to produce vowels with an average of 97.2% accuracy, and the sign-language user demonstrated low percent accuracy for vowel production.
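
    For readers unfamiliar with the perturbation measures listed above, jitter and shimmer have simple local definitions: the mean absolute difference between consecutive glottal periods (or cycle peak amplitudes), normalized by the mean. A minimal sketch, assuming the period and amplitude sequences come from an upstream pitch tracker:

      import numpy as np

      def local_jitter(periods):
          """Local jitter (%): mean absolute difference between consecutive
          glottal periods, divided by the mean period."""
          periods = np.asarray(periods, dtype=float)
          return 100.0 * np.mean(np.abs(np.diff(periods))) / np.mean(periods)

      def local_shimmer(amplitudes):
          """Local shimmer (%): same computation over cycle peak amplitudes."""
          amps = np.asarray(amplitudes, dtype=float)
          return 100.0 * np.mean(np.abs(np.diff(amps))) / np.mean(amps)

      # periods (s) assumed to come from a pitch tracker; toy values shown
      periods = [0.0081, 0.0083, 0.0080, 0.0082, 0.0081]
      print(local_jitter(periods))   # ~2.5% for this toy sequence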

  2. Arguments against the Aggressive Pursuit of Voice Therapy for Children.

    ERIC Educational Resources Information Center

    Sander, Eric K.

    1989-01-01

    A less aggressive treatment strategy is proposed in the area of children's voice disorders. Speech clinicians are urged not to be overzealous in imposing their own voice standards. The potential threat that vocal pathologies pose to children's larynges is felt to be largely overrated. (Author/JDD)

  3. Military and Government Applications of Human-Machine Communication by Voice

    NASA Astrophysics Data System (ADS)

    Weinstein, Clifford J.

    1995-10-01

    This paper describes a range of opportunities for military and government applications of human-machine communication by voice, based on visits and contacts with numerous user organizations in the United States. The applications include some that appear to be feasible by careful integration of current state-of-the-art technology and others that will require a varying mix of advances in speech technology and in integration of the technology into applications environments. Applications that are described include (1) speech recognition and synthesis for mobile command and control; (2) speech processing for a portable multifunction soldier's computer; (3) speech- and language-based technology for naval combat team tactical training; (4) speech technology for command and control on a carrier flight deck; (5) control of auxiliary systems, and alert and warning generation, in fighter aircraft and helicopters; and (6) voice check-in, report entry, and communication for law enforcement agents or special forces. A phased approach for transfer of the technology into applications is advocated, where integration of applications systems is pursued in parallel with advanced research to meet future needs.

  4. Unilateral Vocal Fold Paralysis: A Systematic Review of Speech-Language Pathology Management.

    PubMed

    Walton, Chloe; Conway, Erin; Blackshaw, Helen; Carding, Paul

    2017-07-01

    Dysphonia due to unilateral vocal fold paralysis (UVFP) can be characterized by hoarseness and weakness, resulting in a significant impact on patients' activity and participation. Voice therapy provided by a speech-language pathologist is designed to maximize vocal function and improve quality of life. The purpose of this paper is to systematically review the literature on the effectiveness of speech-language pathology intervention for the management of UVFP in adults. This is a systematic review. Electronic databases were searched using a range of key terms including dysphonia, vocal fold paralysis, and speech-language pathology. Eligible articles were extracted and reviewed by the authors for risk of bias, methodology, treatment efficacy, and clinical outcomes. Of the 3311 articles identified, 12 met the inclusion criteria: seven case series and five comparative studies. All 12 studies subjectively reported positive effects following the implementation of voice therapy for UVFP; however, the heterogeneity of participant characteristics, voice therapy, and voice outcomes resulted in a low level of evidence. There is presently a lack of methodological rigor and clinical efficacy in the speech-language pathology management of dysphonia arising from UVFP in adults. This reduced efficacy can be attributed to the following: (1) no standardized speech-language pathology intervention; (2) no consistency of assessment battery; (3) the variable etiology and clinical presentation of UVFP; and (4) inconsistent timing, frequency, and intensity of treatment. Further research is required to develop the evidence for the management of UVFP incorporating controlled treatment protocols and more rigorous clinical methodology. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.

  5. Speech Perception and Production by Sequential Bilingual Children: A Longitudinal Study of Voice Onset Time Acquisition

    PubMed Central

    McCarthy, Kathleen M; Mahon, Merle; Rosen, Stuart; Evans, Bronwen G

    2014-01-01

    The majority of bilingual speech research has focused on simultaneous bilinguals. Yet, in immigrant communities, children are often initially exposed to their family language (L1), before becoming gradually immersed in the host country's language (L2). This is typically referred to as sequential bilingualism. Using a longitudinal design, this study explored the perception and production of the English voicing contrast in 55 children (40 Sylheti-English sequential bilinguals and 15 English monolinguals). Children were tested twice: when they were in nursery (52-month-olds) and 1 year later. Sequential bilinguals' perception and production of English plosives were initially driven by their experience with their L1, but after starting school, changed to match that of their monolingual peers. PMID:25123987

  6. Automatic intelligibility classification of sentence-level pathological speech

    PubMed Central

    Kim, Jangwon; Kumar, Naveen; Tsiartas, Andreas; Li, Ming; Narayanan, Shrikanth S.

    2014-01-01

    Pathological speech usually refers to the condition of speech distortion resulting from atypicalities in voice and/or in the articulatory mechanisms owing to disease, illness or other physical or biological insult to the production system. Although automatic evaluation of speech intelligibility and quality could come in handy in these scenarios to assist experts in diagnosis and treatment design, the many sources and types of variability often make it a very challenging computational processing problem. In this work we propose novel sentence-level features to capture abnormal variation in the prosodic, voice quality and pronunciation aspects in pathological speech. In addition, we propose a post-classification posterior smoothing scheme which refines the posterior of a test sample based on the posteriors of other test samples. Finally, we perform feature-level fusions and subsystem decision fusion for arriving at a final intelligibility decision. The performances are tested on two pathological speech datasets, the NKI CCRT Speech Corpus (advanced head and neck cancer) and the TORGO database (cerebral palsy or amyotrophic lateral sclerosis), by evaluating classification accuracy without overlapping subjects’ data among training and test partitions. Results show that the feature sets of each of the voice quality subsystem, prosodic subsystem, and pronunciation subsystem, offer significant discriminating power for binary intelligibility classification. We observe that the proposed posterior smoothing in the acoustic space can further reduce classification errors. The smoothed posterior score fusion of subsystems shows the best classification performance (73.5% for unweighted, and 72.8% for weighted, average recalls of the binary classes). PMID:25414544
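
    The proposed posterior smoothing refines each test sample's posterior using the posteriors of other test samples. A minimal sketch of the idea, using a plain k-nearest-neighbor average in acoustic feature space (the paper's exact weighting scheme may differ):

      import numpy as np

      def smooth_posteriors(features, posteriors, k=5):
          """Refine each test sample's posterior with the posteriors of its
          k nearest neighbors in the acoustic feature space (simple average).
          features: (n, d) array; posteriors: (n,) array of P(intelligible)."""
          features = np.asarray(features, dtype=float)
          posteriors = np.asarray(posteriors, dtype=float)
          smoothed = np.empty_like(posteriors)
          for i, x in enumerate(features):
              dists = np.linalg.norm(features - x, axis=1)
              nearest = np.argsort(dists)[: k + 1]  # includes the sample itself
              smoothed[i] = posteriors[nearest].mean()
          return smoothed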

  7. Speech-Message Extraction from Interference Introduced by External Distributed Sources

    NASA Astrophysics Data System (ADS)

    Kanakov, V. A.; Mironov, N. A.

    2017-08-01

    This study addresses the extraction of a speech signal originating from a given spatial point and the calculation of the intelligibility of the extracted voice message. The problem is solved by reducing the influence of interfering speech sources on the extracted signal. The method is based on applying time delays, which depend on the spatial coordinates, to the recording channels. Audio recordings of the voices of eight different people were used as test material. It is shown that increasing the number of microphones improves the intelligibility of the extracted speech message.
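
    The coordinate-dependent time delays described here amount to delay-and-sum beamforming: each channel is shifted to compensate the propagation delay from the chosen spatial point, and the channels are then averaged. A minimal sketch, assuming known microphone positions, free-field propagation, and a speed of sound of 343 m/s:

      import numpy as np

      SPEED_OF_SOUND = 343.0  # m/s, free-field assumption

      def delay_and_sum(channels, mic_positions, source_point, fs):
          """Steer the array toward a spatial point by compensating, per
          channel, the propagation delay from that point, then averaging.
          channels: (n_mics, n_samples); mic_positions: (n_mics, 3) in meters."""
          delays = np.linalg.norm(mic_positions - source_point, axis=1) / SPEED_OF_SOUND
          delays -= delays.min()           # keep all shifts non-negative
          shifts = np.round(delays * fs).astype(int)
          n = channels.shape[1]
          out = np.zeros(n)
          for ch, s in zip(channels, shifts):
              out[: n - s] += ch[s:]       # advance channels that receive later
          return out / len(channels)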

  8. VOT and the perception of voicing

    NASA Astrophysics Data System (ADS)

    Remez, Robert E.

    2004-05-01

    In explaining the ability to distinguish phonemes, linguists have described the dimension of voicing. Acoustic analyses have identified many correlates of the voicing contrast in initial, medial, and final consonants within syllables, and these in turn have motivated studies of the perceptual resolution of voicing. The framing conceptualization articulated by Lisker and Abramson 40 years ago in physiological, phonetic, and perceptual studies has been widely influential, and research on voicing now adopts their perspective without reservation. Their original survey included languages with two voicing categories (Dutch, Puerto Rican Spanish, Hungarian, Tamil, Cantonese, English), three voicing categories (Eastern Armenian, Thai, Korean), and four voicing categories (Hindi, Marathi). Perceptual studies inspired by this work have also ranged widely, including tests with different languages and with listeners of several species. The profound value of the analyses of Lisker and Abramson is evident in the empirical traction provided by the concept of VOT in research on every important perceptual question about speech and language in our era. Some of these classic perceptual investigations will be reviewed. [Research supported by NIH (DC00308).]

  9. Vocal Age Disguise: The Role of Fundamental Frequency and Speech Rate and Its Perceived Effects.

    PubMed

    Skoog Waller, Sara; Eriksson, Mårten

    2016-01-01

    The relationship between vocal characteristics and perceived age is of interest in various contexts, as is the possibility to affect age perception through vocal manipulation. A few examples of such situations are when age is staged by actors, when ear witnesses make age assessments based on vocal cues only or when offenders (e.g., online groomers) disguise their voice to appear younger or older. This paper investigates how speakers spontaneously manipulate two age related vocal characteristics (f0 and speech rate) in an attempt to sound younger versus older than their true age, and if the manipulations correspond to actual age related changes in f0 and speech rate (Study 1). Further aims of the paper are to determine how successful vocal age disguise is by asking listeners to estimate the age of generated speech samples (Study 2) and to examine whether or not listeners use f0 and speech rate as cues to perceived age. In Study 1, participants from three age groups (20-25, 40-45, and 60-65 years) agreed to read a short text under three voice conditions. There were 12 speakers in each age group (six women and six men). They used their natural voice in one condition, attempted to sound 20 years younger in another and 20 years older in a third condition. In Study 2, 60 participants (listeners) listened to speech samples from the three voice conditions in Study 1 and estimated the speakers' age. Each listener was exposed to all three voice conditions. The results from Study 1 indicated that the speakers increased fundamental frequency (f0) and speech rate when attempting to sound younger and decreased f0 and speech rate when attempting to sound older. Study 2 showed that the voice manipulations had an effect in the sought-after direction, although the achieved mean effect was only 3 years, which is far less than the intended effect of 20 years. Moreover, listeners used speech rate, but not f0, as a cue to speaker age. It was concluded that age disguise by voice

  10. Vocal Age Disguise: The Role of Fundamental Frequency and Speech Rate and Its Perceived Effects

    PubMed Central

    Skoog Waller, Sara; Eriksson, Mårten

    2016-01-01

    The relationship between vocal characteristics and perceived age is of interest in various contexts, as is the possibility to affect age perception through vocal manipulation. A few examples of such situations are when age is staged by actors, when ear witnesses make age assessments based on vocal cues only or when offenders (e.g., online groomers) disguise their voice to appear younger or older. This paper investigates how speakers spontaneously manipulate two age related vocal characteristics (f0 and speech rate) in an attempt to sound younger versus older than their true age, and if the manipulations correspond to actual age related changes in f0 and speech rate (Study 1). Further aims of the paper are to determine how successful vocal age disguise is by asking listeners to estimate the age of generated speech samples (Study 2) and to examine whether or not listeners use f0 and speech rate as cues to perceived age. In Study 1, participants from three age groups (20–25, 40–45, and 60–65 years) agreed to read a short text under three voice conditions. There were 12 speakers in each age group (six women and six men). They used their natural voice in one condition, attempted to sound 20 years younger in another and 20 years older in a third condition. In Study 2, 60 participants (listeners) listened to speech samples from the three voice conditions in Study 1 and estimated the speakers’ age. Each listener was exposed to all three voice conditions. The results from Study 1 indicated that the speakers increased fundamental frequency (f0) and speech rate when attempting to sound younger and decreased f0 and speech rate when attempting to sound older. Study 2 showed that the voice manipulations had an effect in the sought-after direction, although the achieved mean effect was only 3 years, which is far less than the intended effect of 20 years. Moreover, listeners used speech rate, but not f0, as a cue to speaker age. It was concluded that age disguise by voice can

  11. Voice stress analysis

    NASA Technical Reports Server (NTRS)

    Brenner, Malcolm; Shipp, Thomas

    1988-01-01

    In a study of the validity of eight candidate voice measures (fundamental frequency, amplitude, speech rate, frequency jitter, amplitude shimmer, Psychological Stress Evaluator scores, energy distribution, and a measure derived from the above) for determining psychological stress, 17 males aged 21 to 35 were subjected to a tracking task on a microcomputer CRT while parameters of vocal production as well as heart rate were measured. Findings confirm those of earlier studies that increases in fundamental frequency, amplitude, and speech rate are found in speakers under extreme levels of stress. In addition, the same changes appear to occur in a regular fashion at a more subtle level of stress that may be characteristic, for example, of routine flying situations. None of the individual speech measures performed as robustly as did heart rate.

  12. Speech-based Class Attendance

    NASA Astrophysics Data System (ADS)

    Faizel Amri, Umar; Nur Wahidah Nik Hashim, Nik; Hazrin Hany Mohamad Hanif, Noor

    2017-11-01

    In the department of engineering, students are required to fulfil at least 80 percent of class attendance. The conventional method requires each student to sign his or her initials on an attendance sheet. However, this method is prone to cheating, with one student signing for an absent classmate. We develop our hypothesis according to a verse in the Holy Qur’an (95:4), “We have created men in the best of mould”. Based on this verse, we believe each psychological characteristic of a human being is unique, and thus each person's speech characteristics should be unique. In this paper we present the development of a speech biometric-based attendance system. The system requires the user's voice to be enrolled as training data, which is saved in the system to register the user. Subsequent recordings of the user's voice serve as test data to be verified against the stored training data. The system uses PSD (Power Spectral Density) and Transition Parameter as the methods for feature extraction of the voices. Euclidean and Mahalanobis distances are used to verify the user's voice. For this research, ten subjects (five female, five male) were chosen to test the performance of the system. The system performance, in terms of recognition rate, was found to be 60% correct identification of individuals.
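
    A minimal sketch of the verification step, assuming Welch-method PSD feature vectors per utterance and several enrollment recordings per student; the Transition Parameter feature is omitted and all names are illustrative, not the authors' implementation:

      import numpy as np
      from scipy.signal import welch

      def psd_features(audio, fs, n_bands=32):
          """Welch PSD, log-compressed and averaged into coarse bands."""
          _, pxx = welch(audio, fs=fs, nperseg=1024)
          bands = np.array_split(np.log(pxx + 1e-12), n_bands)
          return np.array([b.mean() for b in bands])

      def verify(test_vec, enrolled_vecs, threshold):
          """Accept if the Mahalanobis distance to the student's enrolled
          feature distribution falls below the threshold."""
          mu = enrolled_vecs.mean(axis=0)
          # small ridge keeps the covariance invertible with few enrollments
          cov = np.cov(enrolled_vecs, rowvar=False) + 1e-6 * np.eye(len(mu))
          d = test_vec - mu
          dist = np.sqrt(d @ np.linalg.solve(cov, d))
          return dist < threshold, dist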

  13. Measures of voiced frication for automatic classification

    NASA Astrophysics Data System (ADS)

    Jackson, Philip J. B.; Jesus, Luis M. T.; Shadle, Christine H.; Pincas, Jonathan

    2004-05-01

    As an approach to understanding the characteristics of the acoustic sources in voiced fricatives, it seems apt to draw on knowledge of vowels and voiceless fricatives, which have been relatively well studied. However, the presence of both phonation and frication in these mixed-source sounds offers the possibility of mutual interaction effects, with variations across place of articulation. This paper examines the acoustic and articulatory consequences of these interactions and explores automatic techniques for finding parametric and statistical descriptions of these phenomena. A reliable and consistent set of such acoustic cues could be used for phonetic classification or speech recognition. Following work on devoicing of European Portuguese voiced fricatives [Jesus and Shadle, in Mamede et al. (eds.) (Springer-Verlag, Berlin, 2003), pp. 1-8] and the modulating effect of voicing on frication [Jackson and Shadle, J. Acoust. Soc. Am. 108, 1421-1434 (2000)], the present study focuses on three types of information: (i) sequences and durations of acoustic events in VC transitions, (ii) temporal, spectral and modulation measures from the periodic and aperiodic components of the acoustic signal, and (iii) voicing activity derived from simultaneous EGG data. Analysis of interactions observed in British/American English and European Portuguese speech corpora will be compared, and the principal findings discussed.
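
    One practical stand-in for separating the periodic and aperiodic components in (ii) is harmonic/percussive median filtering of the spectrogram; the sketch below uses librosa's HPSS, which is not the authors' decomposition but illustrates the kind of measure involved (the file name is hypothetical):

      import librosa
      import numpy as np

      # Stand-in periodic/aperiodic decomposition via harmonic-percussive
      # source separation (not the decomposition used in the paper).
      y, sr = librosa.load("voiced_fricative.wav", sr=None)  # hypothetical file
      harmonic, aperiodic_like = librosa.effects.hpss(y)

      # Simple per-frame measure of how much energy the periodic part carries
      frame = 1024
      h_energy = np.array([np.sum(harmonic[i:i + frame] ** 2)
                           for i in range(0, len(y) - frame, frame)])
      a_energy = np.array([np.sum(aperiodic_like[i:i + frame] ** 2)
                           for i in range(0, len(y) - frame, frame)])
      periodicity_ratio = h_energy / (h_energy + a_energy + 1e-12)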

  14. Evaluation of synthesized voice approach callouts /SYNCALL/

    NASA Technical Reports Server (NTRS)

    Simpson, C. A.

    1981-01-01

    The two basic approaches to the generation of 'synthesized' speech are the replay of recorded human speech and the construction of speech entirely from algorithms applied to constants describing speech sounds. Given the availability of synthesized speech displays for man-machine systems, research is needed on suggested applications for speech and on design principles for speech displays. The present investigation developed new performance measures for such a study. A number of air carrier approach and landing accidents during low or impaired visibility have been associated with the absence of approach callouts. The purpose of the study was to compare a pilot-not-flying (PNF) approach callout system with a system composed of PNF callouts augmented by an automatic synthesized voice callout system (SYNCALL). Pilots were found to favor the use of a SYNCALL system containing certain modifications.

  15. Exploring the anatomical encoding of voice with a mathematical model of the vocal system.

    PubMed

    Assaneo, M Florencia; Sitt, Jacobo; Varoquaux, Gael; Sigman, Mariano; Cohen, Laurent; Trevisan, Marcos A

    2016-11-01

    The faculty of language depends on the interplay between the production and perception of speech sounds. A relevant open question is whether the dimensions that organize voice perception in the brain are acoustical or depend on properties of the vocal system that produced the voice. One of the main empirical difficulties in answering this question is generating sounds that vary along a continuum according to the anatomical properties of the vocal apparatus that produced them. Here we use a mathematical model that offers the unique possibility of synthesizing vocal sounds by controlling a small set of anatomically based parameters. In a first stage, the quality of the synthetic voice was evaluated. Using specific time traces for subglottal pressure and tension of the vocal folds, the synthetic voices generated perceptual responses that are indistinguishable from those of real speech. The synthesizer was then used to investigate how the auditory cortex responds to the perception of voice depending on the anatomy of the vocal apparatus. Our fMRI results show that sounds are perceived as human vocalizations when produced by a vocal system that follows a simple relationship between the size of the vocal folds and the vocal tract. We found that these anatomical parameters encode perceptual vocal identity (male, female, child) and show that the brain areas that respond to human speech also encode vocal identity. On the basis of these results, we propose that this low-dimensional model of the vocal system is capable of generating realistic voices and represents a novel tool to explore voice perception with precise control of the anatomical variables that generate speech. Furthermore, the model provides an explanation of how auditory cortices encode voices in terms of the anatomical parameters of the vocal system. Copyright © 2016 Elsevier Inc. All rights reserved.

  16. Speech Spectrum's Correlation with Speakers' Eysenck Personality Traits

    PubMed Central

    Hu, Chao; Wang, Qiandong; Short, Lindsey A.; Fu, Genyue

    2012-01-01

    The current study explored the correlation between speakers' Eysenck personality traits and speech spectrum parameters. Forty-six subjects completed the Eysenck Personality Questionnaire. They were instructed to verbally answer the questions shown on a computer screen, and their responses were recorded by the computer. Spectrum parameters of /sh/ and /i/ were analyzed with the Praat voice software. Formant frequencies of the consonant /sh/ in lying responses were significantly lower than those in truthful responses, whereas no difference existed in the vowel /i/ spectrum. The second formant bandwidth of the consonant /sh/ spectrum was significantly correlated with the personality traits of Psychoticism, Extraversion, and Neuroticism, and the correlation differed between truthful and lying responses, whereas the first formant frequency of the vowel /i/ spectrum was negatively correlated with Neuroticism in both response types. The results suggest that personality characteristics may be conveyed through the human voice, although the extent to which these effects are due to physiological differences in the organs associated with speech or to a general Pygmalion effect is yet unknown. PMID:22439014

  17. Mistaking minds and machines: How speech affects dehumanization and anthropomorphism.

    PubMed

    Schroeder, Juliana; Epley, Nicholas

    2016-11-01

    Treating a human mind like a machine is an essential component of dehumanization, whereas attributing a humanlike mind to a machine is an essential component of anthropomorphism. Here we tested how a cue closely connected to a person's actual mental experience, a humanlike voice, affects the likelihood of mistaking a person for a machine, or a machine for a person. We predicted that paralinguistic cues in speech are particularly likely to convey the presence of a humanlike mind, such that removing voice from communication (leaving only text) would increase the likelihood of mistaking the text's creator for a machine. Conversely, adding voice to a computer-generated script (resulting in speech) would increase the likelihood of mistaking the text's creator for a human. Four experiments confirmed these hypotheses, demonstrating that people are more likely to infer a human (vs. computer) creator when they hear a voice expressing thoughts than when they read the same thoughts in text. Adding human visual cues to text (i.e., seeing a person perform a script in a subtitled video clip) did not increase the likelihood of inferring a human creator compared with only reading text, suggesting that defining features of personhood may be conveyed more clearly in speech (Experiments 1 and 2). Removing the naturalistic paralinguistic cues that convey humanlike capacity for thinking and feeling, such as varied pace and intonation, eliminates the humanizing effect of speech (Experiment 4). We discuss implications for dehumanizing others through text-based media, and for anthropomorphizing machines through speech-based media. (PsycINFO Database Record (c) 2016 APA, all rights reserved).

  18. Speech-Like Rhythm in a Voiced and Voiceless Orangutan Call

    PubMed Central

    Lameira, Adriano R.; Hardus, Madeleine E.; Bartlett, Adrian M.; Shumaker, Robert W.; Wich, Serge A.; Menken, Steph B. J.

    2015-01-01

    The evolutionary origins of speech remain obscure. Recently, it was proposed that speech derived from monkey facial signals which exhibit a speech-like rhythm of ∼5 open-close lip cycles per second. In monkeys, these signals may also be vocalized, offering a plausible evolutionary stepping stone towards speech. Three essential predictions remain, however, to be tested to assess this hypothesis' validity: (i) great apes, our closest relatives, should likewise produce 5Hz-rhythm signals; (ii) speech-like rhythm should involve calls articulatorily similar to consonants and vowels, given that speech rhythm is the direct product of stringing together these two basic elements; and (iii) speech-like rhythm should be experience-based. Via cinematic analyses we demonstrate that an ex-entertainment orangutan produces two calls at a speech-like rhythm, coined “clicks” and “faux-speech.” Like voiceless consonants, clicks required no vocal fold action, but did involve independent manoeuvring over lips and tongue. In parallel to vowels, faux-speech showed harmonic and formant modulations, implying vocal fold and supralaryngeal action. This rhythm was several times faster than orangutan chewing rates, as observed in monkeys and humans. Critically, this rhythm was seven-fold faster, and contextually distinct, than any other known rhythmic calls described to date in the largest database of the orangutan repertoire ever assembled. The first two predictions advanced by this study are validated and, based on parsimony and exclusion of potential alternative explanations, initial support is given to the third prediction. Irrespective of the putative origins of these calls and underlying mechanisms, our findings demonstrate irrevocably that great apes are not respiratorily, articulatorily, or neurologically constrained for the production of consonant- and vowel-like calls at speech rhythm. Orangutan clicks and faux-speech confirm the importance of rhythmic speech antecedents

  19. Influence of Smartphones and Software on Acoustic Voice Measures

    PubMed Central

    GRILLO, ELIZABETH U.; BROSIOUS, JENNA N.; SORRELL, STACI L.; ANAND, SUPRAJA

    2016-01-01

    This study assessed the within-subject variability of voice measures captured using different recording devices (i.e., smartphones and head mounted microphone) and software programs (i.e., Analysis of Dysphonia in Speech and Voice (ADSV), Multi-dimensional Voice Program (MDVP), and Praat). Correlations between the software programs that calculated the voice measures were also analyzed. Results demonstrated no significant within-subject variability across devices and software and that some of the measures were highly correlated across software programs. The study suggests that certain smartphones may be appropriate to record daily voice measures representing the effects of vocal loading within individuals. In addition, even though different algorithms are used to compute voice measures across software programs, some of the programs and measures share a similar relationship. PMID:28775797
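
    The cross-software comparison reported here reduces to rank correlations between paired measurements of the same voice measure. A minimal sketch with synthetic, illustrative values (not the study's data):

      import numpy as np
      from scipy.stats import spearmanr

      # Hypothetical paired measurements of one voice measure, computed by
      # two different software programs over the same six recordings
      measure_program_a = np.array([4.1, 3.8, 5.0, 4.6, 3.9, 4.4])
      measure_program_b = np.array([4.3, 3.7, 5.2, 4.5, 4.0, 4.6])

      rho, p = spearmanr(measure_program_a, measure_program_b)
      print(f"Spearman rho = {rho:.2f}, p = {p:.3f}")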

  20. Design of digital voice storage and playback system

    NASA Astrophysics Data System (ADS)

    Tang, Chao

    2018-03-01

    Based on the STC89C52 chip, this paper presents a minimal single-chip microcomputer system that implements the logic control of a digital voice storage and playback system. Compared with traditional tape-based voice recording systems, this system offers small size and low power consumption, and it addresses the limitations of traditional recording systems in electronic and information processing.

  1. Emotional self-other voice processing in schizophrenia and its relationship with hallucinations: ERP evidence.

    PubMed

    Pinheiro, Ana P; Rezaii, Neguine; Rauber, Andréia; Nestor, Paul G; Spencer, Kevin M; Niznikiewicz, Margaret

    2017-09-01

    Abnormalities in self-other voice processing have been observed in schizophrenia, and may underlie the experience of hallucinations. More recent studies demonstrated that these impairments are enhanced for speech stimuli with negative content. Nonetheless, few studies probed the temporal dynamics of self versus nonself speech processing in schizophrenia and, particularly, the impact of semantic valence on self-other voice discrimination. In the current study, we examined these questions, and additionally probed whether impairments in these processes are associated with the experience of hallucinations. Fifteen schizophrenia patients and 16 healthy controls listened to 420 prerecorded adjectives differing in voice identity (self-generated [SGS] versus nonself speech [NSS]) and semantic valence (neutral, positive, and negative), while EEG data were recorded. The N1, P2, and late positive potential (LPP) ERP components were analyzed. ERP results revealed group differences in the interaction between voice identity and valence in the P2 and LPP components. Specifically, LPP amplitude was reduced in patients compared with healthy subjects for SGS and NSS with negative content. Further, auditory hallucinations severity was significantly predicted by LPP amplitude: the higher the SAPS "voices conversing" score, the larger the difference in LPP amplitude between negative and positive NSS. The absence of group differences in the N1 suggests that self-other voice processing abnormalities in schizophrenia are not primarily driven by disrupted sensory processing of voice acoustic information. The association between LPP amplitude and hallucination severity suggests that auditory hallucinations are associated with enhanced sustained attention to negative cues conveyed by a nonself voice. © 2017 Society for Psychophysiological Research.

  2. How do voice restoration methods affect the psychological status of patients after total laryngectomy?

    PubMed

    Saltürk, Z; Arslanoğlu, A; Özdemir, E; Yıldırım, G; Aydoğdu, İ; Kumral, T L; Berkiten, G; Atar, Y; Uyar, Y

    2016-03-01

    This study investigated the relationship between psychological well-being and different voice rehabilitation methods in total laryngectomy patients. The study enrolled 96 patients who underwent total laryngectomy. The patients were divided into three groups according to the voice rehabilitation method used: esophageal speech (24 patients); a tracheoesophageal fistula and Provox 2 voice prosthesis (57 patients); or an electrolarynx (15 patients). The participants were asked to complete the Turkish version of the Voice Handicap Index-10 (VHI-10) to assess voice problems. They were also asked to complete the Turkish versions of the Perceived Stress Scale (PSS) and the Hospital Anxiety and Depression Scale (HADS). The test scores of the three groups were compared statistically. Patients who used esophageal speech had a mean VHI-10 score of 10.25 ± 3.22, versus 19.42 ± 5.56 and 17.60 ± 1.92 for the tracheoesophageal fistula and Provox 2 and electrolarynx groups, respectively, reflecting better perception of their voice. They also had a PSS score of 11.38 ± 3.92, indicating that they felt less stressed than the tracheoesophageal fistula and Provox 2 and electrolarynx groups, which scored 18.84 ± 5.50 and 16.20 ± 3.49, respectively. The HADS scores of the groups did not differ, indicating that the patients' anxiety and depression status did not vary. Patients who used esophageal speech perceived less stress and were less handicapped by their voice.

  3. Comparing live to recorded speech in training the perception of spectrally shifted noise-vocoded speech.

    PubMed

    Faulkner, Andrew; Rosen, Stuart; Green, Tim

    2012-10-01

    Two experimental groups were trained for 2 h with live or recorded speech that was noise-vocoded and spectrally shifted and was from the same text and talker. These two groups showed equivalent improvements in performance for vocoded and shifted sentences, and the group trained with recorded speech showed consistently greater improvements than untrained controls. Another group trained with unshifted noise-vocoded speech improved no more than untrained controls. Computer-based training thus appears at least as effective as labor-intensive live-voice training for improving the perception of spectrally shifted noise-vocoded speech, and by implication, for training of users of cochlear implants.

  4. A Joint Time-Frequency and Matrix Decomposition Feature Extraction Methodology for Pathological Voice Classification

    NASA Astrophysics Data System (ADS)

    Ghoraani, Behnaz; Krishnan, Sridhar

    2009-12-01

    The number of people affected by speech problems is increasing as the modern world places increasing demands on the human voice via mobile telephones, voice recognition software, and interpersonal verbal communication. In this paper, we propose a novel methodology for automatic pattern classification of pathological voices. The main contribution of this paper is the extraction of meaningful and unique features using an adaptive time-frequency distribution (TFD) and nonnegative matrix factorization (NMF). We construct the adaptive TFD as an effective signal analysis domain to dynamically track nonstationarity in the speech and utilize NMF as a matrix decomposition (MD) technique to quantify the constructed TFD. The proposed method extracts meaningful and unique features from the joint TFD of the speech and automatically identifies and measures the abnormality of the signal. Depending on the abnormality measure of each signal, we classify the signal as normal or pathological. The proposed method is applied to the Massachusetts Eye and Ear Infirmary (MEEI) voice disorders database, which consists of 161 pathological and 51 normal speakers, and an overall classification accuracy of 98.6% was achieved.
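
    The pipeline of quantifying a time-frequency distribution with NMF can be sketched with an ordinary magnitude spectrogram standing in for the adaptive TFD; the paper's adaptive distribution and abnormality measure are not reproduced here, and the file name is hypothetical:

      import numpy as np
      import librosa
      from sklearn.decomposition import NMF

      y, sr = librosa.load("voice_sample.wav", sr=None)   # hypothetical file
      tfd = np.abs(librosa.stft(y, n_fft=1024))           # stand-in for adaptive TFD

      # Decompose the time-frequency matrix into spectral bases W and activations H
      model = NMF(n_components=8, init="nndsvd", max_iter=400)
      W = model.fit_transform(tfd)   # (freq_bins, components)
      H = model.components_          # (components, frames)

      # Feature vector per recording: summary statistics of bases and activations,
      # which a downstream classifier would map to normal vs. pathological
      features = np.concatenate([W.mean(axis=0), H.mean(axis=1)])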

  5. Measurement errors in voice-key naming latency for Hiragana.

    PubMed

    Yamada, Jun; Tamaoka, Katsuo

    2003-12-01

    This study makes explicit the limitations and possibilities of voice-key naming latency research on single hiragana symbols (a Japanese syllabic script) by examining three sets of voice-key naming data against Sakuma, Fushimi, and Tatsumi's 1997 speech-analyzer voice-waveform data. Analysis showed that voice-key measurement errors can be substantial in standard procedures, as they may conceal the true effects of significant variables involved in hiragana-naming behavior. While one can avoid voice-key measurement errors to some extent by applying Sakuma et al.'s deltas and by excluding initial phonemes that induce measurement errors, such errors may be ignored when test items are words and other higher-level linguistic materials.
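
    The measurement error at issue is the gap between a hardware voice key's threshold crossing and the true acoustic onset visible in the waveform. A simple amplitude-threshold emulation makes the effect concrete; the threshold values are illustrative, not calibrated to any device:

      import numpy as np

      def threshold_onset(audio, fs, threshold):
          """Return the first time (ms) the absolute amplitude crosses the
          threshold, emulating a hardware voice key."""
          idx = np.flatnonzero(np.abs(audio) >= threshold)
          return None if idx.size == 0 else 1000.0 * idx[0] / fs

      # A weak initial phoneme (e.g., a soft fricative) crosses a high
      # voice-key threshold later than the onset visible in the waveform,
      # inflating latency for some items more than others:
      # onset_voice_key = threshold_onset(y, fs, threshold=0.10)
      # onset_waveform  = threshold_onset(y, fs, threshold=0.01)
      # delta_ms = onset_voice_key - onset_waveform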

  6. Muscle Weakness and Speech in Oculopharyngeal Muscular Dystrophy

    ERIC Educational Resources Information Center

    Neel, Amy T.; Palmer, Phyllis M.; Sprouls, Gwyneth; Morrison, Leslie

    2015-01-01

    Purpose: We documented speech and voice characteristics associated with oculopharyngeal muscular dystrophy (OPMD). Although it is a rare disease, OPMD offers the opportunity to study the impact of myopathic weakness on speech production in the absence of neurologic deficits in a relatively homogeneous group of speakers. Methods: Twelve individuals…

  7. When infants talk, infants listen: pre-babbling infants prefer listening to speech with infant vocal properties.

    PubMed

    Masapollo, Matthew; Polka, Linda; Ménard, Lucie

    2016-03-01

    To learn to produce speech, infants must effectively monitor and assess their own speech output. Yet very little is known about how infants perceive speech produced by an infant, which has higher voice pitch and formant frequencies compared to adult or child speech. Here, we tested whether pre-babbling infants (at 4-6 months) prefer listening to vowel sounds with infant vocal properties over vowel sounds with adult vocal properties. A listening preference favoring infant vowels may derive from their higher voice pitch, which has been shown to attract infant attention in infant-directed speech (IDS). In addition, infants' nascent articulatory abilities may induce a bias favoring infant speech given that 4- to 6-month-olds are beginning to produce vowel sounds. We created infant and adult /i/ ('ee') vowels using a production-based synthesizer that simulates the act of speaking in talkers at different ages and then tested infants across four experiments using a sequential preferential listening task. The findings provide the first evidence that infants preferentially attend to vowel sounds with infant voice pitch and/or formants over vowel sounds with no infant-like vocal properties, supporting the view that infants' production abilities influence how they process infant speech. The findings with respect to voice pitch also reveal parallels between IDS and infant speech, raising new questions about the role of this speech register in infant development. Research exploring the underpinnings and impact of this perceptual bias can expand our understanding of infant language development. © 2015 John Wiley & Sons Ltd.

  8. Group climate in the voice therapy of patients with Parkinson's Disease.

    PubMed

    Diaféria, Giovana; Madazio, Glaucya; Pacheco, Claudia; Takaki, Patricia Barbarini; Behlau, Mara

    2017-09-04

    To verify the impact of group dynamics and coaching strategies on PD patients' voice, speech, and communication, as well as on group climate. Sixteen individuals with mild to moderate dysarthria due to PD were divided into two groups: the CG (8 patients), submitted to traditional therapy with 12 regular therapy sessions plus 4 additional support sessions; and the EG (8 patients), submitted to traditional therapy with 12 regular therapy sessions plus 4 sessions with group dynamics and coaching strategies. The Living with Dysarthria questionnaire (LwD), self-evaluations of voice, speech, and communication, and the auditory-perceptual analysis of vocal quality were assessed at three moments: pre-traditional therapy (pre), post-traditional therapy (post 1), and post support sessions/coaching strategies (post 2); at the post 1 and post 2 moments, the Group Climate Questionnaire (GCQ) was also applied. CG and EG showed improvement in the LwD from pre to post 1 and post 2. Voice self-evaluation was better for the EG (when pre was compared with post 2 and when post 1 was compared with post 2), ranging from regular to very good; both groups improved in the communication self-evaluation. The auditory-perceptual evaluation of vocal quality was better for the EG at post 1. No difference was found for the GCQ; however, the EG presented lower avoidance scores at post 2. All patients improved in the voice, speech, and communication self-evaluations; the EG showed lower avoidance scores, creating a more collaborative environment, propitious for speech therapy.

  9. Network Speech Systems Technology Program

    NASA Astrophysics Data System (ADS)

    Weinstein, C. J.

    1980-09-01

    This report documents work performed during FY 1980 on the DCA-sponsored Network Speech Systems Technology Program. The areas of work reported are: (1) communication systems studies in Demand-Assignment Multiple Access (DAMA), voice/data integration, and adaptive routing, in support of the evolving Defense Communications System (DCS) and Defense Switched Network (DSN); (2) a satellite/terrestrial integration design study including the functional design of voice and data interfaces to interconnect terrestrial and satellite network subsystems; and (3) voice-conferencing efforts dealing with support of the Secure Voice and Graphics Conferencing (SVGC) Test and Evaluation Program. Progress in definition and planning of experiments for the Experimental Integrated Switched Network (EISN) is detailed separately in an FY 80 Experiment Plan Supplement.

  10. Voice Controlled Wheelchair

    NASA Technical Reports Server (NTRS)

    1977-01-01

    Michael Condon, a quadriplegic from Pasadena, California, demonstrates the NASA-developed voice-controlled wheelchair and its manipulator, which can pick up packages, open doors, turn a TV knob, and perform a variety of other functions. A possible boon to paralyzed and other severely handicapped persons, the chair-manipulator system responds to 35 one-word voice commands, such as "go," "stop," "up," "down," "right," "left," "forward," "backward." The heart of the system is a voice-command analyzer which utilizes a minicomputer. Commands are taught to the computer by the patient's repeating them a number of times; thereafter the analyzer recognizes commands only in the patient's particular speech pattern. The computer translates commands into electrical signals which activate appropriate motors and cause the desired motion of chair or manipulator. Based on teleoperator and robot technology for space-related programs, the voice-controlled system was developed by Jet Propulsion Laboratory under the joint sponsorship of NASA and the Veterans Administration. The wheelchair-manipulator has been tested at Rancho Los Amigos Hospital, Downey, California, and is being evaluated at the VA Prosthetics Center in New York City.

  11. External Validation of the Acoustic Voice Quality Index Version 03.01 With Extended Representativity.

    PubMed

    Barsties, Ben; Maryn, Youri

    2016-07-01

    The Acoustic Voice Quality Index (AVQI) is an objective method to quantify the severity of overall voice quality in concatenated continuous speech and sustained phonation segments. Recently, the AVQI was successfully modified to be more representative and ecologically valid, balancing its internal consistency through an equal proportion of the two speech types. The present investigation aims to explore its external validation in a large data set. An expert panel of 12 speech-language therapists rated the voice quality of 1058 concatenated voice samples varying from normophonia to severe dysphonia. Spearman rank-order correlation coefficients (r) were used to measure concurrent validity. The AVQI's diagnostic accuracy was evaluated with several estimates of its receiver operating characteristics (ROC). On the basis of reliability criteria, the ratings of 8 of the 12 experts were retained. A strong correlation was identified between AVQI and the auditory-perceptual ratings (r = 0.815, P = .000), indicating that 66.4% of the variation in auditory-perceptual ratings was explained by the AVQI. Additionally, the ROC results again showed the best diagnostic outcome at a threshold of AVQI = 2.43. This study highlights the external validation and diagnostic precision of AVQI version 03.01 as a robust and ecologically valid measurement to objectify voice quality. © The Author(s) 2016.
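
    The reported cutoff comes from standard ROC analysis: compute the curve over expert labels and AVQI scores, then select the threshold maximizing sensitivity plus specificity (Youden's index). A minimal sketch with synthetic scores, not the study's data:

      import numpy as np
      from sklearn.metrics import roc_curve, roc_auc_score

      rng = np.random.default_rng(0)
      # Synthetic data: AVQI tends to be higher for dysphonic samples
      labels = np.r_[np.zeros(100), np.ones(100)]   # 0 = normophonic, 1 = dysphonic
      scores = np.r_[rng.normal(1.8, 0.6, 100), rng.normal(3.4, 0.9, 100)]

      fpr, tpr, thresholds = roc_curve(labels, scores)
      best = np.argmax(tpr - fpr)                    # Youden's J statistic
      print("AUC:", roc_auc_score(labels, scores))
      print("Best threshold:", thresholds[best])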

  12. Using visible speech to train perception and production of speech for individuals with hearing loss.

    PubMed

    Massaro, Dominic W; Light, Joanna

    2004-04-01

    The main goal of this study was to implement a computer-animated talking head, Baldi, as a language tutor for speech perception and production for individuals with hearing loss. Baldi can speak slowly; illustrate articulation by making the skin transparent to reveal the tongue, teeth, and palate; and show supplementary articulatory features, such as vibration of the neck to show voicing and turbulent airflow to show frication. Seven students with hearing loss between the ages of 8 and 13 were trained for 6 hours across 21 weeks on 8 categories of segments (4 voiced vs. voiceless distinctions, 3 consonant cluster distinctions, and 1 fricative vs. affricate distinction). Training included practice at the segment and the word level. Perception and production improved for each of the 7 children. Speech production also generalized to new words not included in the training lessons. Finally, speech production deteriorated somewhat after 6 weeks without training, indicating that the training method rather than some other experience was responsible for the improvement that was found.

  13. Ultrasound applicability in Speech Language Pathology and Audiology.

    PubMed

    Barberena, Luciana da Silva; Brasil, Brunah de Castro; Melo, Roberta Michelon; Mezzomo, Carolina Lisbôa; Mota, Helena Bolli; Keske-Soares, Márcia

    2014-01-01

    To present recent studies that used ultrasound in the fields of Speech Language Pathology and Audiology, evidencing possibilities for the applicability of this technique in different subareas. A bibliographic search was carried out in the PubMed database, using the keywords "ultrasonic," "speech," "phonetics," "Speech, Language and Hearing Sciences," "voice," "deglutition," and "myofunctional therapy," comprising some areas of Speech Language Pathology and Audiology Sciences. The keywords "ultrasound," "ultrasonography," "swallow," "orofacial myofunctional therapy," and "orofacial myology" were also used in the search. Studies in humans from the past 5 years were selected. In the preselection, duplicated studies, articles not fully available, and those that did not present a direct relation between ultrasound and Speech Language Pathology and Audiology Sciences were discarded. The data were analyzed descriptively and classified into subareas of Speech Language Pathology and Audiology Sciences. The following items were considered: purposes, participants, procedures, and results. We selected 12 articles for the ultrasound versus speech/phonetics subarea, 5 for ultrasound versus voice, 1 for ultrasound versus muscles of mastication, and 10 for ultrasound versus swallowing. No studies relating "ultrasound" and "Speech Language Pathology and Audiology Sciences" as a whole were found for the past 5 years. Different studies on the use of ultrasound in Speech Language Pathology and Audiology Sciences were found. Each of them, according to its purpose, confirms new possibilities for the use of this instrument in the several subareas, aiming at a more accurate diagnosis and new evaluative and therapeutic possibilities.

  14. Is there an effect of dysphonic teachers' voices on children's processing of spoken language?

    PubMed

    Rogerson, Jemma; Dodd, Barbara

    2005-03-01

    There is a vast body of literature on the causes, prevalence, implications, and issues of vocal dysfunction in teachers. However, the educational effect of teacher vocal impairment is largely unknown. The purpose of this study was to investigate the effect of impaired voice quality on children's processing of spoken language. One hundred and seven children (age range, 9.2 to 10.6 years; mean, 9.8 years; SD, 3.76 months) listened to three video passages: one read in a control voice, one in a mildly dysphonic voice, and one in a severely dysphonic voice. After each video passage, children were asked to answer six questions with multiple-choice answers. The results indicated that children's perception of speech across the three voice qualities differed, regardless of gender, IQ, and school attended. Performance in the control voice passages was better than performance in the mild and severe dysphonic voice passages. No difference was found between performance in the mild and severe dysphonic voice passages, highlighting that any form of vocal impairment is detrimental to children's speech processing and is therefore likely to have a negative educational effect. These findings, in light of the high rate of vocal dysfunction in teachers, further support the implementation of specific voice care education for those in the teaching profession.

  15. Mouth and Voice: A Relationship between Visual and Auditory Preference in the Human Superior Temporal Sulcus

    PubMed Central

    2017-01-01

    Cortex in and around the human posterior superior temporal sulcus (pSTS) is known to be critical for speech perception. The pSTS responds to both the visual modality (especially biological motion) and the auditory modality (especially human voices). Using fMRI in single subjects with no spatial smoothing, we show that visual and auditory selectivity are linked. Regions of the pSTS were identified that preferred visually presented moving mouths (presented in isolation or as part of a whole face) or moving eyes. Mouth-preferring regions responded strongly to voices and showed a significant preference for vocal compared with nonvocal sounds. In contrast, eye-preferring regions did not respond to either vocal or nonvocal sounds. The converse was also true: regions of the pSTS that showed a significant response to speech or preferred vocal to nonvocal sounds responded more strongly to visually presented mouths than eyes. These findings can be explained by environmental statistics. In natural environments, humans see visual mouth movements at the same time as they hear voices, while there is no auditory accompaniment to visual eye movements. The strength of a voxel's preference for visual mouth movements was strongly correlated with the magnitude of its auditory speech response and its preference for vocal sounds, suggesting that visual and auditory speech features are coded together in small populations of neurons within the pSTS. SIGNIFICANCE STATEMENT Humans interacting face to face make use of auditory cues from the talker's voice and visual cues from the talker's mouth to understand speech. The human posterior superior temporal sulcus (pSTS), a brain region known to be important for speech perception, is complex, with some regions responding to specific visual stimuli and others to specific auditory stimuli. Using BOLD fMRI, we show that the natural statistics of human speech, in which voices co-occur with mouth movements, are reflected in the neural architecture of

  16. Mouth and Voice: A Relationship between Visual and Auditory Preference in the Human Superior Temporal Sulcus.

    PubMed

    Zhu, Lin L; Beauchamp, Michael S

    2017-03-08

    Cortex in and around the human posterior superior temporal sulcus (pSTS) is known to be critical for speech perception. The pSTS responds to both the visual modality (especially biological motion) and the auditory modality (especially human voices). Using fMRI in single subjects with no spatial smoothing, we show that visual and auditory selectivity are linked. Regions of the pSTS were identified that preferred visually presented moving mouths (presented in isolation or as part of a whole face) or moving eyes. Mouth-preferring regions responded strongly to voices and showed a significant preference for vocal compared with nonvocal sounds. In contrast, eye-preferring regions did not respond to either vocal or nonvocal sounds. The converse was also true: regions of the pSTS that showed a significant response to speech or preferred vocal to nonvocal sounds responded more strongly to visually presented mouths than eyes. These findings can be explained by environmental statistics. In natural environments, humans see visual mouth movements at the same time as they hear voices, while there is no auditory accompaniment to visual eye movements. The strength of a voxel's preference for visual mouth movements was strongly correlated with the magnitude of its auditory speech response and its preference for vocal sounds, suggesting that visual and auditory speech features are coded together in small populations of neurons within the pSTS. SIGNIFICANCE STATEMENT Humans interacting face to face make use of auditory cues from the talker's voice and visual cues from the talker's mouth to understand speech. The human posterior superior temporal sulcus (pSTS), a brain region known to be important for speech perception, is complex, with some regions responding to specific visual stimuli and others to specific auditory stimuli. Using BOLD fMRI, we show that the natural statistics of human speech, in which voices co-occur with mouth movements, are reflected in the neural architecture of

  17. Military and government applications of human-machine communication by voice.

    PubMed Central

    Weinstein, C J

    1995-01-01

    This paper describes a range of opportunities for military and government applications of human-machine communication by voice, based on visits and contacts with numerous user organizations in the United States. The applications include some that appear to be feasible by careful integration of current state-of-the-art technology and others that will require a varying mix of advances in speech technology and in integration of the technology into applications environments. Applications that are described include (1) speech recognition and synthesis for mobile command and control; (2) speech processing for a portable multifunction soldier's computer; (3) speech- and language-based technology for naval combat team tactical training; (4) speech technology for command and control on a carrier flight deck; (5) control of auxiliary systems, and alert and warning generation, in fighter aircraft and helicopters; and (6) voice check-in, report entry, and communication for law enforcement agents or special forces. A phased approach for transfer of the technology into applications is advocated, where integration of applications systems is pursued in parallel with advanced research to meet future needs. PMID:7479718

  18. Effects of voice-sparing cricotracheal resection on phonation in women.

    PubMed

    Tanner, Kristine; Dromey, Christopher; Berardi, Mark L; Mattei, Lisa M; Pierce, Jenny L; Wisco, Jonathan J; Hunter, Eric J; Smith, Marshall E

    2017-09-01

    Individuals with idiopathic subglottic stenosis (SGS) are at risk for voice disorders prior to and following surgical management. This study examined the nature and severity of voice disorders in patients with SGS before and after a revised cricotracheal resection (CTR) procedure designed to minimize adverse effects on voice function. Eleven women with idiopathic SGS provided presurgical and postsurgical audio recordings. Voice Handicap Index (VHI) scores were also collected. Cepstral, signal-to-noise, periodicity, and fundamental frequency (F0) analyses were undertaken for connected speech and sustained vowel samples. Listeners made auditory-perceptual ratings of overall quality and monotonicity. Paired samples statistical analyses revealed that mean F0 decreased from 215 Hz (standard deviation [SD] = 40 Hz) to 201 Hz (SD = 65 Hz) following surgery. In general, VHI scores decreased after surgery. Voice disorder severity based on the Cepstral Spectral Index of Dysphonia (KayPentax, Montvale, NJ) for sustained vowels decreased (improved) from 41 (SD = 41) to 25 (SD = 21) points; no change was observed for connected speech. Semitone SD (2.2 semitones) did not change from pre- to posttreatment. Auditory-perceptual ratings demonstrated similar results. These preliminary results indicate that this revised CTR procedure is promising in minimizing adverse voice effects while offering a longer-term surgical outcome for SGS. Further research is needed to determine causal factors for pretreatment voice disorders, as well as to optimize treatments in this population. Level of Evidence: 4. Laryngoscope, 127:2085-2092, 2017. © 2016 The American Laryngological, Rhinological and Otological Society, Inc.
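
    For readers who want to reproduce this kind of measurement, the following Python sketch estimates mean F0 and semitone SD from a recording. It is a minimal illustration, not the study's protocol: the file name is a placeholder, and librosa's YIN tracker stands in for whatever pitch analysis the authors used.

        import numpy as np
        import librosa

        # Placeholder file; the study used presurgical/postsurgical recordings.
        y, sr = librosa.load("voice_sample.wav", sr=None)

        # Frame-wise F0 within a plausible range for adult female voices.
        f0 = librosa.yin(y, fmin=75, fmax=500, sr=sr)
        f0 = f0[(f0 > 75) & (f0 < 500)]  # drop frames pinned to the search bounds

        mean_f0 = f0.mean()
        # Semitone SD: variability relative to the speaker's median F0,
        # so the measure is independent of absolute pitch.
        st_sd = (12 * np.log2(f0 / np.median(f0))).std()
        print(f"mean F0: {mean_f0:.0f} Hz, semitone SD: {st_sd:.1f} ST")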

  19. Voice Therapy Techniques Adapted to Treatment of Habit Cough: A Pilot Study.

    ERIC Educational Resources Information Center

    Blager, Florence B.; And Others

    1988-01-01

    Individuals with long-standing habit cough having no organic basis can be successfully treated with a combination of psychotherapy and speech therapy. Techniques for speech therapy are adapted from those used with hyperfunctional voice disorders to fit this debilitating laryngeal disorder. (Author)

  20. Interactions between voice clinics and singing teachers: a report on the British Voice Association questionnaire to voice clinics in the UK.

    PubMed

    Davies, J; Anderson, S; Huchison, L; Stewart, G

    2007-01-01

    Singers with vocal problems are among patients who present at multidisciplinary voice clinics led by Ear Nose and Throat consultants and laryngologists or speech and language therapists. However, the development and care of the singing voice are also important responsibilities of singing teachers. We report here on the current extent and nature of interactions between voice clinics and singing teachers, based on data from a recent survey undertaken on behalf of the British Voice Association. A questionnaire was sent to all 103 voice clinics at National Health Service (NHS) hospitals in the UK. Responses were received and analysed from 42 currently active clinics. Eight (19%) clinics reported having a singing teacher as an active member of the team. They were all satisfied with the singing teacher's knowledge and expertise, which had been acquired by several different means. Of 32 clinics without a singing teacher regularly associated with the team, funding and difficulty of finding an appropriate singing voice expert (81% and 50%, respectively) were among the main reasons for their absence. There was an expressed requirement for more interaction between voice clinics and singing teachers, and 86% replied that they would find it useful to have a list of singing teachers in their area. On the matter of gaining expertise and training, 74% of the clinics replying would enable singing teachers to observe clinic sessions for experience and 21% were willing to assist in training them for clinic-associated work.

  1. How Our Own Speech Rate Influences Our Perception of Others

    ERIC Educational Resources Information Center

    Bosker, Hans Rutger

    2017-01-01

    In conversation, our own speech and that of others follow each other in rapid succession. Effects of the surrounding context on speech perception are well documented but, despite the ubiquity of the sound of our own voice, it is unknown whether our own speech also influences our perception of other talkers. This study investigated context effects…

  2. Changes of some functional speech disorders after surgical correction of skeletal anterior open bite.

    PubMed

    Knez Ambrožič, Mojca; Hočevar Boltežar, Irena; Ihan Hren, Nataša

    2015-09-01

    Skeletal anterior open bite (AOB) or apertognathism is characterized by the absence of contact of the anterior teeth and affects articulation parameters, chewing, biting and voice quality. The treatment of AOB consists of orthognathic surgical procedures. The aim of this study was to evaluate the effects of treatment on voice quality, articulation and nasality in speech with respect to skeletal changes. The study was prospective; 15 patients with AOB were evaluated before and after surgery. Lateral cephalometric x-ray parameters (facial angle, interincisal distance, Wits appraisal) were measured to determine skeletal changes. Before surgery, nine patients still had articulation disorders despite speech therapy during childhood. The voice quality parameters were determined by acoustic analysis of the vowel sound /a/ (fundamental frequency [F0], jitter, shimmer). Spectral analysis of vowels /a/, /e/, /i/, /o/, /u/ was carried out by determining the mean frequency of the first (F1) and second (F2) formants. Nasality in speech was expressed as the ratio between the nasal and the oral sound energies during speech samples. After surgery, normalizations of facial skeletal parameters were observed in all patients, but no statistically significant changes in articulation and voice quality parameters occurred despite subjective observations of easier articulation. No deterioration toward velopharyngeal insufficiency was observed in any of the patients. In conclusion, the surgical treatment of skeletal AOB does not lead to deterioration in voice, resonance and articulation qualities. Despite surgical correction of the unfavourable skeletal situation of the speech apparatus, the pre-existing articulation disorder cannot improve without professional intervention.

  3. Hearing Lips and Seeing Voices: How Cortical Areas Supporting Speech Production Mediate Audiovisual Speech Perception

    PubMed Central

    Skipper, Jeremy I.; van Wassenhove, Virginie; Nusbaum, Howard C.; Small, Steven L.

    2009-01-01

    Observing a speaker’s mouth profoundly influences speech perception. For example, listeners perceive an “illusory” “ta” when the video of a face producing /ka/ is dubbed onto an audio /pa/. Here, we show how cortical areas supporting speech production mediate this illusory percept and audiovisual (AV) speech perception more generally. Specifically, cortical activity during AV speech perception occurs in many of the same areas that are active during speech production. We find that different perceptions of the same syllable and the perception of different syllables are associated with different distributions of activity in frontal motor areas involved in speech production. Activity patterns in these frontal motor areas resulting from the illusory “ta” percept are more similar to the activity patterns evoked by AV/ta/ than they are to patterns evoked by AV/pa/ or AV/ka/. In contrast to the activity in frontal motor areas, stimulus-evoked activity for the illusory “ta” in auditory and somatosensory areas and visual areas initially resembles activity evoked by AV/pa/ and AV/ka/, respectively. Ultimately, though, activity in these regions comes to resemble activity evoked by AV/ta/. Together, these results suggest that AV speech elicits in the listener a motor plan for the production of the phoneme that the speaker might have been attempting to produce, and that feedback in the form of efference copy from the motor system ultimately influences the phonetic interpretation. PMID:17218482

  4. Research and development of a versatile portable speech prosthesis

    NASA Technical Reports Server (NTRS)

    1981-01-01

    The Versatile Portable Speech Prosthesis (VPSP), a synthetic speech output communication aid for non-speaking people, is described. It was intended initially for severely physically limited people with cerebral palsy who are in electric wheelchairs. Hence, it was designed to be placed on a wheelchair and powered from a wheelchair battery. It can easily be separated from the wheelchair. The VPSP is versatile because it is designed to accept any means of single switch, multiple switch, or keyboard control which physically limited people have the ability to use. It is portable because it is mounted on and can go with the electric wheelchair. It is a speech prosthesis, obviously, because it speaks with a synthetic voice for people unable to speak with their own voices. Both hardware and software are described.

  5. Voice - How humans communicate?

    PubMed

    Tiwari, Manjul; Tiwari, Maneesha

    2012-01-01

    Voices are important things for humans. They are the medium through which we do a lot of communicating with the outside world: our ideas, of course, and also our emotions and our personality. The voice is the very emblem of the speaker, indelibly woven into the fabric of speech. In this sense, each of our utterances of spoken language carries not only its own message but is also, through accent, tone of voice and habitual voice quality, an audible declaration of our membership of particular social and regional groups, of our individual physical and psychological identity, and of our momentary mood. Voices are also one of the media through which we (successfully, most of the time) recognize other humans who are important to us: members of our family, media personalities, our friends, and enemies. Although evidence from DNA analysis is potentially vastly more eloquent in its power than evidence from voices, DNA cannot talk. It cannot be recorded planning, carrying out or confessing to a crime. It cannot be so apparently directly incriminating. As will quickly become evident, voices are extremely complex things, and some of the inherent limitations of the forensic-phonetic method are in part a consequence of the interaction between their complexity and the real world in which they are used. It is one of the aims of this article to explain how this comes about. This subject still has unsolved questions, and there is no direct way to present all the information necessary to understand how voices can be related, or not, to their owners.

  6. Transcribing nonsense words: The effect of numbers of voices and repetitions.

    PubMed

    Knight, Rachael-Anne

    2010-06-01

    Transcription skills are crucially important to all phoneticians, and particularly for speech and language therapists who may use transcriptions to make decisions about diagnosis and intervention. Whilst interest in factors affecting transcription accuracy is increasing, there are still a number of issues yet to be investigated. The present paper considers how the number of voices and the number of repetitions affect the transcription of nonsense words. Thirty-two students in their second year of study for a BSc in Speech and Language Therapy participated in an experiment. They heard two nonsense words presented 10 times in either one or two voices. Results show that the number of voices did not affect accuracy, but that accuracy increased between six and ten repetitions. The reasons behind these findings, implications for teaching and learning, and directions for further research are discussed.

  7. Common cues to emotion in the dynamic facial expressions of speech and song

    PubMed Central

    Livingstone, Steven R.; Thompson, William F.; Wanderley, Marcelo M.; Palmer, Caroline

    2015-01-01

    Speech and song are universal forms of vocalization that may share aspects of emotional expression. Research has focused on parallels in acoustic features, overlooking facial cues to emotion. In three experiments, we compared moving facial expressions in speech and song. In Experiment 1, vocalists spoke and sang statements each with five emotions. Vocalists exhibited emotion-dependent movements of the eyebrows and lip corners that transcended speech–song differences. Vocalists’ jaw movements were coupled to their acoustic intensity, exhibiting differences across emotion and speech–song. Vocalists’ emotional movements extended beyond vocal sound to include large sustained expressions, suggesting a communicative function. In Experiment 2, viewers judged silent videos of vocalists’ facial expressions prior to, during, and following vocalization. Emotional intentions were identified accurately for movements during and after vocalization, suggesting that these movements support the acoustic message. Experiment 3 compared emotional identification in voice-only, face-only, and face-and-voice recordings. Emotions in voice-only singing were poorly identified, yet were identified accurately in all other conditions, confirming that facial expressions conveyed emotion more accurately than the voice in song, but equivalently in speech. Collectively, these findings highlight broad commonalities in the facial cues to emotion in speech and song, yet highlight differences in perception and acoustic-motor production. PMID:25424388

  8. Objective and subjective assessment of tracheoesophageal prosthesis voice outcome.

    PubMed

    D'Alatri, Lucia; Bussu, Francesco; Scarano, Emanuele; Paludetti, Gaetano; Marchese, Maria Raffaella

    2012-09-01

    To investigate the relationships between objective measures and the results of subjective assessment of voice quality and speech intelligibility in patients who underwent total laryngectomy and tracheoesophageal (TE) puncture. Retrospective. Twenty patients implanted with voice prosthesis were studied. After surgery, the entire sample performed speech rehabilitation. The assessment protocol included maximum phonation time (MPT), number of syllables per deep breath, acoustic analysis of the sustained vowel /a/ and of a bisyllabic word, perceptual evaluation (pleasantness and intelligibility%), and self-assessment. The correlation between pleasantness and intelligibility% was statistically significant. Both the latter were significantly correlated with the acoustic signal type, the number of formant peaks, and the F2-F1 difference. The intelligibility% and number of formant peaks were significantly correlated with the MPT and number of syllables per deep breath. Moreover, significant correlations were found between the number of formant peaks and both intelligibility% and pleasantness. The higher the number of syllables per deep breath and the longer the MPT, the higher the number of formant peaks and the intelligibility%. The study failed to show significant correlation between patient's self-assessment of voice quality and both pleasantness and communication effectiveness. The multidimensional assessment seems to be a reliable tool to evaluate the TE functional outcome. Particularly, the results showed that both pleasantness and intelligibility of TE speech are correlated to the availability of expired air and the function of the vocal tract. Copyright © 2012 The Voice Foundation. Published by Mosby, Inc. All rights reserved.
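
    As a rough illustration of how formant peaks and the F2-F1 difference can be extracted from a vowel sample, the sketch below uses LPC root-solving. The file name, LPC order, and thresholds are assumptions for demonstration, not the study's analysis method.

        import numpy as np
        import librosa

        y, sr = librosa.load("vowel_a.wav", sr=16000)   # placeholder recording
        a = librosa.lpc(y, order=int(2 + sr / 1000))    # rule-of-thumb LPC order

        roots = np.roots(a)
        roots = roots[np.imag(roots) > 0]               # keep one of each conjugate pair
        freqs = np.angle(roots) * sr / (2 * np.pi)      # pole angles -> frequencies (Hz)

        formants = np.sort(freqs[freqs > 90])           # discard near-DC poles
        f1, f2 = formants[0], formants[1]
        print(f"F1 ~ {f1:.0f} Hz, F2 ~ {f2:.0f} Hz, F2-F1 = {f2 - f1:.0f} Hz")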

  9. Monkeys and Humans Share a Common Computation for Face/Voice Integration

    PubMed Central

    Chandrasekaran, Chandramouli; Lemus, Luis; Trubanova, Andrea; Gondan, Matthias; Ghazanfar, Asif A.

    2011-01-01

    Speech production involves the movement of the mouth and other regions of the face resulting in visual motion cues. These visual cues enhance intelligibility and detection of auditory speech. As such, face-to-face speech is fundamentally a multisensory phenomenon. If speech is fundamentally multisensory, it should be reflected in the evolution of vocal communication: similar behavioral effects should be observed in other primates. Old World monkeys share with humans vocal production biomechanics and communicate face-to-face with vocalizations. It is unknown, however, if they, too, combine faces and voices to enhance their perception of vocalizations. We show that they do: monkeys combine faces and voices in noisy environments to enhance their detection of vocalizations. Their behavior parallels that of humans performing an identical task. We explored what common computational mechanism(s) could explain the pattern of results we observed across species. Standard explanations or models such as the principle of inverse effectiveness and a “race” model failed to account for their behavior patterns. Conversely, a “superposition model”, positing the linear summation of activity patterns in response to visual and auditory components of vocalizations, served as a straightforward but powerful explanatory mechanism for the observed behaviors in both species. As such, it represents a putative homologous mechanism for integrating faces and voices across primates. PMID:21998576

  10. Effects of syllable-initial voicing and speaking rate on the temporal characteristics of monosyllabic words.

    PubMed

    Allen, J S; Miller, J L

    1999-10-01

    Two speech production experiments tested the validity of the traditional method of creating voice-onset-time (VOT) continua for perceptual studies in which the systematic increase in VOT across the continuum is accompanied by a concomitant decrease in the duration of the following vowel. In experiment 1, segmental durations were measured for matched monosyllabic words beginning with either a voiced stop (e.g., big, duck, gap) or a voiceless stop (e.g., pig, tuck, cap). Results from four talkers showed that the change from voiced to voiceless stop produced not only an increase in VOT, but also a decrease in vowel duration. However, the decrease in vowel duration was consistently less than the increase in VOT. In experiment 2, results from four new talkers replicated these findings at two rates of speech, as well as highlighted the contrasting temporal effects on vowel duration of an increase in VOT due to a change in syllable-initial voicing versus a change in speaking rate. It was concluded that the traditional method of creating VOT continua for perceptual experiments, although not perfect, approximates natural speech by capturing the basic trade-off between VOT and vowel duration in syllable-initial voiced versus voiceless stop consonants.
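
    The timing trade-off these experiments document can be made concrete with a small worked example: in the schedule below, each 5-ms increase in VOT is paired with a smaller (3-ms) decrease in vowel duration, mimicking the finding that the vowel decrease is consistently less than the VOT increase. The step sizes and base durations are illustrative, not the paper's stimulus values.

        BASE_VOT_MS = 10      # step 0: clearly voiced stop
        BASE_VOWEL_MS = 250   # vowel duration at step 0
        VOT_STEP_MS = 5
        VOWEL_STEP_MS = 3     # |vowel change| < |VOT change|, as measured

        for step in range(9):
            vot = BASE_VOT_MS + step * VOT_STEP_MS
            vowel = BASE_VOWEL_MS - step * VOWEL_STEP_MS
            print(f"step {step}: VOT {vot:3d} ms, vowel {vowel:3d} ms, "
                  f"syllable {vot + vowel:3d} ms")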

  11. Voice synthesis application

    NASA Astrophysics Data System (ADS)

    Lightstone, P. C.; Davidson, W. M.

    1982-04-01

    The military detection assessment laboratory houses an experimental field system which assesses different alarm indicators such as fence disturbance sensors, MILES cables, and microwave Racons. A speech synthesis board was purchased that could be interfaced, by means of a computer, to an alarm logger, making verbal acknowledgement of alarms possible. Different products and different types of voice synthesis were analyzed before a linear predictive coding device produced by Telesensory Speech Systems of Palo Alto, California was chosen. This device is called the Speech 1000 Board and has a dedicated 8085 processor. A multiplexer card was designed and the Sp 1000 interfaced through the card into a TMS 990/100M Texas Instruments microcomputer. It was also necessary to design the software with the capability of recognizing and flagging an alarm on any 1 of 32 possible lines. The experimental field system was then packaged with a dc power supply, LED indicators, speakers, and switches, and deployed in the field, where it performed reliably.

  12. Perceptual and Acoustic Reliability Estimates for the Speech Disorders Classification System (SDCS)

    ERIC Educational Resources Information Center

    Shriberg, Lawrence D.; Fourakis, Marios; Hall, Sheryl D.; Karlsson, Heather B.; Lohmeier, Heather L.; McSweeny, Jane L.; Potter, Nancy L.; Scheer-Cohen, Alison R.; Strand, Edythe A.; Tilkens, Christie M.; Wilson, David L.

    2010-01-01

    A companion paper describes three extensions to a classification system for paediatric speech sound disorders termed the Speech Disorders Classification System (SDCS). The SDCS uses perceptual and acoustic data reduction methods to obtain information on a speaker's speech, prosody, and voice. The present paper provides reliability estimates for…

  13. Objective measurement of motor speech characteristics in the healthy pediatric population.

    PubMed

    Wong, A W; Allegro, J; Tirado, Y; Chadha, N; Campisi, P

    2011-12-01

    To obtain objective measurements of motor speech characteristics in normal children, using a computer-based motor speech software program. Cross-sectional, observational design in a university-based ambulatory pediatric otolaryngology clinic. Participants included 112 subjects (54 females and 58 males) aged 4-18 years. Participants with previously diagnosed hearing loss, voice and motor disorders, and children unable to repeat a passage in English were excluded. Voice samples were recorded and analysed using the Motor Speech Profile (MSP) software (KayPENTAX, Lincoln Park, NJ). The MSP produced measures of diadochokinetics, second formant transition, intonation, and syllabic rates. Demographic data, including sex, age, and cigarette smoke exposure were obtained. Normative data for several motor speech characteristics were derived for children ranging from age 4 to 18 years. A number of age-dependent changes were identified, including an increase in average diadochokinetic rate (p<0.001) and standard syllabic duration (p<0.001) with age. There were no identified differences in motor speech characteristics between males and females across the measured age range. Variations in fundamental frequency (F0) during speech did not change significantly with age for both males and females. To our knowledge, this is the first pediatric normative database for the MSP program. The MSP is suitable for testing children and can be used to study developmental changes in motor speech. The analysis demonstrated that males and females behave similarly and show the same relationship with age for the motor speech characteristics studied. This normative database will provide essential comparative data for future studies exploring alterations in motor speech that may occur with hearing, voice, and motor disorders and to assess the results of targeted therapies. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
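
    One MSP measure, the diadochokinetic (DDK) rate, can be crudely approximated by counting syllable onsets in a /pataka/ repetition recording and dividing by its duration. The sketch below is an assumption-laden illustration (a generic onset detector and a placeholder file name), not the MSP's proprietary analysis.

        import librosa

        y, sr = librosa.load("pataka.wav", sr=None)     # placeholder recording
        onsets = librosa.onset.onset_detect(y=y, sr=sr, units="time")

        duration = len(y) / sr
        ddk_rate = len(onsets) / duration               # syllables per second
        print(f"DDK rate ~ {ddk_rate:.1f} syll/s over {duration:.1f} s")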

  14. Cognitive processing load during listening is reduced more by decreasing voice similarity than by increasing spatial separation between target and masker speech.

    PubMed

    Zekveld, Adriana A; Rudner, Mary; Kramer, Sophia E; Lyzenga, Johannes; Rönnberg, Jerker

    2014-01-01

    We investigated changes in speech recognition and cognitive processing load due to the masking release attributable to decreasing similarity between target and masker speech. This was achieved by using masker voices with either the same (female) gender as the target speech or different gender (male) and/or by spatially separating the target and masker speech using HRTFs. We assessed the relation between the signal-to-noise ratio required for 50% sentence intelligibility, the pupil response and cognitive abilities. We hypothesized that the pupil response, a measure of cognitive processing load, would be larger for co-located maskers and for same-gender compared to different-gender maskers. We further expected that better cognitive abilities would be associated with better speech perception and larger pupil responses as the allocation of larger capacity may result in more intense mental processing. In line with previous studies, the performance benefit from different-gender compared to same-gender maskers was larger for co-located masker signals. The performance benefit of spatially-separated maskers was larger for same-gender maskers. The pupil response was larger for same-gender than for different-gender maskers, but was not reduced by spatial separation. We observed associations between better perception performance and better working memory, better information updating, and better executive abilities when applying no corrections for multiple comparisons. The pupil response was not associated with cognitive abilities. Thus, although both gender and location differences between target and masker facilitate speech perception, only gender differences lower cognitive processing load. Presenting a more dissimilar masker may facilitate target-masker separation at a later (cognitive) processing stage than increasing the spatial separation between the target and masker. The pupil response provides information about speech perception that complements intelligibility data.
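
    The SNR for 50% sentence intelligibility is typically found with an adaptive up-down track. The toy sketch below shows that logic with a simulated listener; it is a schematic illustration only, not the procedure or parameters used in this study.

        import random

        random.seed(0)

        def listener_correct(snr_db, srt_true=-6.0, slope=0.2):
            """Toy psychometric listener: P(correct) rises with SNR around srt_true."""
            p = 1.0 / (1.0 + 10 ** (-slope * (snr_db - srt_true)))
            return random.random() < p

        snr, step = 0.0, 2.0            # start at 0 dB SNR, 2-dB steps
        track = []
        for _ in range(25):
            track.append(snr)
            snr += -step if listener_correct(snr) else step  # 1-down/1-up -> 50% point

        srt = sum(track[-10:]) / 10     # average the late trials as the SRT estimate
        print(f"estimated 50% SRT ~ {srt:.1f} dB SNR")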

  15. Cognitive processing load during listening is reduced more by decreasing voice similarity than by increasing spatial separation between target and masker speech

    PubMed Central

    Zekveld, Adriana A.; Rudner, Mary; Kramer, Sophia E.; Lyzenga, Johannes; Rönnberg, Jerker

    2014-01-01

    We investigated changes in speech recognition and cognitive processing load due to the masking release attributable to decreasing similarity between target and masker speech. This was achieved by using masker voices with either the same (female) gender as the target speech or different gender (male) and/or by spatially separating the target and masker speech using HRTFs. We assessed the relation between the signal-to-noise ratio required for 50% sentence intelligibility, the pupil response and cognitive abilities. We hypothesized that the pupil response, a measure of cognitive processing load, would be larger for co-located maskers and for same-gender compared to different-gender maskers. We further expected that better cognitive abilities would be associated with better speech perception and larger pupil responses as the allocation of larger capacity may result in more intense mental processing. In line with previous studies, the performance benefit from different-gender compared to same-gender maskers was larger for co-located masker signals. The performance benefit of spatially-separated maskers was larger for same-gender maskers. The pupil response was larger for same-gender than for different-gender maskers, but was not reduced by spatial separation. We observed associations between better perception performance and better working memory, better information updating, and better executive abilities when applying no corrections for multiple comparisons. The pupil response was not associated with cognitive abilities. Thus, although both gender and location differences between target and masker facilitate speech perception, only gender differences lower cognitive processing load. Presenting a more dissimilar masker may facilitate target-masker separation at a later (cognitive) processing stage than increasing the spatial separation between the target and masker. The pupil response provides information about speech perception that complements intelligibility data.

  16. Changes in objective acoustic measurements and subjective voice complaints in call center customer-service advisors during one working day.

    PubMed

    Lehto, Laura; Laaksonen, Laura; Vilkman, Erkki; Alku, Paavo

    2008-03-01

    The aim of this study was to investigate how different acoustic parameters, extracted both from speech pressure waveforms and glottal flows, can be used in measuring vocal loading in modern working environments and how these parameters reflect the possible changes in the vocal function during a working day. In addition, correlations between objective acoustic parameters and subjective voice symptoms were addressed. The subjects were 24 female and 8 male customer-service advisors, who mainly use telephone during their working hours. Speech samples were recorded from continuous speech four times during a working day and voice symptom questionnaires were completed simultaneously. Among the various objective parameters, only F0 resulted in a statistically significant increase for both genders. No correlations between the changes in objective and subjective parameters appeared. However, the results encourage researchers within the field of occupational voice use to apply versatile measurement techniques in studying occupational voice loading.

  17. Alexa, Siri, Cortana, and More: An Introduction to Voice Assistants.

    PubMed

    Hoy, Matthew B

    2018-01-01

    Voice assistants are software agents that can interpret human speech and respond via synthesized voices. Apple's Siri, Amazon's Alexa, Microsoft's Cortana, and Google's Assistant are the most popular voice assistants and are embedded in smartphones or dedicated home speakers. Users can ask their assistants questions, control home automation devices and media playback via voice, and manage other basic tasks such as email, to-do lists, and calendars with verbal commands. This column will explore the basic workings and common features of today's voice assistants. It will also discuss some of the privacy and security issues inherent to voice assistants and some potential future uses for these devices. As voice assistants become more widely used, librarians will want to be familiar with their operation and perhaps consider them as a means to deliver library services and materials.

  18. Acoustic-Perceptual Correlates of Voice in Indian Hindu Purohits.

    PubMed

    Balasubramanium, Radish Kumar; Karuppali, Sudhin; Bajaj, Gagan; Shastry, Anuradha; Bhat, Jayashree

    2018-05-16

    Purohit, in the Indian religious context (Hindu), means priest. Purohits are professional voice users who use their voice while performing regular worships and rituals in temples and homes. Any deviations in their voice can have an impact on their profession. Hence, there is a need to investigate the voice characteristics of purohits using perceptual and acoustic analyses. A total of 44 men in the age range of 18-30 years were divided into two groups. Group 1 consisted of purohits who were trained since childhood (n = 22) in the traditional gurukul system. Group 2 (n = 22) consisted of normal controls. Phonation and spontaneous speech samples were obtained from all the participants at a comfortable pitch and loudness. The Praat software (Version 5.3.31) and the Speech tool were used to analyze the traditional acoustic and cepstral parameters, respectively, whereas GRBAS was used to perceptually evaluate the voice. Results of the independent t test revealed no significant differences across the groups for perceptual and traditional acoustic measures except for intensity, which was significantly higher in purohits' voices at P < 0.05. However, the cepstral values (cepstral peak prominence and smoothened cepstral peak prominence) were much higher in purohits than in controls at P < 0.05. Results revealed that purohits did not exhibit vocal deviations as analyzed through perceptual and acoustic parameters. In contrast, cepstral measures were higher in Indian Hindu purohits in comparison with normal controls, suggestive of a higher degree of harmonic organization in purohits. Further studies are required to analyze the physiological correlates of increased cepstral measures in purohits' voices. Copyright © 2018 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
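
    The cepstral peak prominence (CPP) measures reported here can be approximated in a few lines: take the cepstrum of the log spectrum, find the peak in the quefrency range of plausible F0 values, and measure its height above a regression baseline. The sketch below is a simplified, assumption-marked version of that idea, not the Praat/Speech tool computation; the file name is a placeholder.

        import numpy as np
        import librosa

        y, sr = librosa.load("vowel_a.wav", sr=None)          # placeholder file

        spectrum_db = 20 * np.log10(np.abs(np.fft.rfft(y)) + 1e-12)
        cepstrum = np.abs(np.fft.irfft(spectrum_db))

        quefrency = np.arange(len(cepstrum)) / sr             # seconds
        lo, hi = int(sr / 330), int(sr / 60)                  # F0 search range 60-330 Hz
        peak = lo + np.argmax(cepstrum[lo:hi])

        # Height of the cepstral peak above a linear regression baseline.
        coeffs = np.polyfit(quefrency[lo:hi], cepstrum[lo:hi], 1)
        cpp = cepstrum[peak] - np.polyval(coeffs, quefrency[peak])
        print(f"CPP-like prominence ~ {cpp:.2f}")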

  19. Auditory-Perceptual and Acoustic Methods in Measuring Dysphonia Severity of Korean Speech.

    PubMed

    Maryn, Youri; Kim, Hyung-Tae; Kim, Jaeock

    2016-09-01

    The purpose of this study was to explore the criterion-related concurrent validity of two standardized auditory-perceptual rating protocols and the Acoustic Voice Quality Index (AVQI) for measuring dysphonia severity in Korean speech. Sixty native Korean subjects with various voice disorders were asked to sustain the vowel [a:] and to read aloud the Korean text "Walk." A 3-second midvowel portion of the sustained vowel and two sentences (with 25 syllables) were edited, concatenated, and analyzed according to methods described elsewhere. From 56 participants, both continuous speech and sustained vowel recordings had sufficiently high signal-to-noise ratios (35.5 dB and 37 dB on average, respectively) and were therefore subjected to further dysphonia severity analysis with (1) "G" or Grade from the GRBAS protocol, (2) "OS" or Overall Severity from the Consensus Auditory-Perceptual Evaluation of Voice protocol, and (3) AVQI. First, high correlations were found between G and OS (Spearman r = 0.955 for sustained vowels; 0.965 for continuous speech). Second, the AVQI showed a strong correlation with G (Spearman r = 0.911) as well as OS (Pearson r = 0.924). These findings are in agreement with similar studies dealing with continuous speech in other languages. The present study highlights the criterion-related concurrent validity of these methods in Korean speech. Furthermore, it supports the cross-linguistic robustness of the AVQI as a valid and objective marker of overall dysphonia severity. Copyright © 2016 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
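
    The sample-preparation step described above (a 3-second mid-vowel portion concatenated with read speech) is easy to script. The sketch below assumes placeholder file names and leaves the AVQI computation itself to dedicated tools.

        import numpy as np
        import librosa
        import soundfile as sf

        vowel, sr = librosa.load("sustained_a.wav", sr=44100)
        speech, _ = librosa.load("walk_sentences.wav", sr=44100)

        mid, half = len(vowel) // 2, int(1.5 * sr)   # 3 s from the vowel's middle
        concatenated = np.concatenate([vowel[mid - half:mid + half], speech])
        sf.write("avqi_input.wav", concatenated, sr)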

  20. Effects of Familiarity and Feeding on Newborn Speech-Voice Recognition

    ERIC Educational Resources Information Center

    Valiante, A. Grace; Barr, Ronald G.; Zelazo, Philip R.; Brant, Rollin; Young, Simon N.

    2013-01-01

    Newborn infants preferentially orient to familiar over unfamiliar speech sounds. They are also better at remembering unfamiliar speech sounds for short periods of time if learning and retention occur after a feed than before. It is unknown whether short-term memory for speech is enhanced when the sound is familiar (versus unfamiliar) and, if so,…

  1. Childhood apraxia of speech: A survey of praxis and typical speech characteristics.

    PubMed

    Malmenholt, Ann; Lohmander, Anette; McAllister, Anita

    2017-07-01

    The purpose of this study was to investigate current knowledge of the diagnosis childhood apraxia of speech (CAS) in Sweden and compare speech characteristics and symptoms to those of earlier survey findings among mainly English speakers. In a web-based questionnaire, 178 Swedish speech-language pathologists (SLPs) anonymously answered questions about their perception of typical speech characteristics for CAS. They graded their own assessment skills and estimated clinical occurrence. The seven top speech characteristics reported as typical for children with CAS were: inconsistent speech production (85%), sequencing difficulties (71%), oro-motor deficits (63%), vowel errors (62%), voicing errors (61%), consonant cluster deletions (54%), and prosodic disturbance (53%). Motor-programming deficits, described as a lack of automatization of speech movements, were reported by 82%. All listed characteristics were consistent with the American Speech-Language-Hearing Association (ASHA) consensus-based features, Strand's 10-point checklist, and the diagnostic model proposed by Ozanne. The mode for estimated clinical occurrence was 5%. The number of suspected cases of CAS in the clinical caseload was approximately one new patient per year per SLP. The results support and add to findings from studies of CAS in English-speaking children, with similar speech characteristics regarded as typical. Possibly, these findings could contribute to cross-linguistic consensus on CAS characteristics.

  2. Voice and persuasion in a banking telemarketing context.

    PubMed

    Chebat, Jean-Charles; El Hedhli, Kamel; Gélinas-Chebat, Claire; Boivin, Robert

    2007-04-01

    Voice has been neglected in research on advertising and persuasion. The present study examined the influence of voice and sex on the credibility of the voice source in a banking telemarketing context, as well as on the attitude toward the advertisement and subjects' behavioral intention. An experiment using the voices of a man and a woman was conducted. A recorded mock-telemarketing message consisted of an advertisement for an ATM card offered by a Canadian bank. Subjects were undergraduate students (N=399; 71.6% women, 28.4% men; M age=26.5 yr., SD = 7.4). They completed a questionnaire after hearing the message in telemarketing conditions. Analysis indicated that a moderate intensity, an unmarked intonation, and a fast speech rate are associated with a more credible source than the other combinations. Sex was not a significant moderator in the relationship between voice characteristics and source credibility. Voice characteristics significantly affected attitudes toward the advertisement and behavioral intention.

  3. Acoustic analysis of voice in children with cleft palate and velopharyngeal insufficiency.

    PubMed

    Villafuerte-Gonzalez, Rocio; Valadez-Jimenez, Victor M; Hernandez-Lopez, Xochiquetzal; Ysunza, Pablo Antonio

    2015-07-01

    Acoustic analysis of voice can provide instrumental data concerning vocal abnormalities. These findings can be used for monitoring clinical course in cases of voice disorders. Cleft palate severely affects the structure of the vocal tract; hence, voice quality can also be affected. The aim was to study whether the main acoustic parameters of voice, including fundamental frequency, shimmer and jitter, are significantly different in patients with a repaired cleft palate, as compared with normal children without speech, language and voice disorders. Fourteen patients with repaired unilateral cleft lip and palate and persistent or residual velopharyngeal insufficiency (VPI) were studied. A control group was assembled from healthy volunteer subjects matched by age and gender. Hypernasality and nasal emission were perceptually assessed in patients with VPI. The size of the velopharyngeal gap, as assessed by videonasopharyngoscopy, was classified in patients with VPI. Acoustic parameters of voice, including fundamental frequency (F0), shimmer and jitter, were compared between patients with VPI and control subjects. F0 was significantly higher in male patients as compared with male controls. Shimmer was significantly higher in patients with VPI regardless of gender. Moreover, patients with moderate VPI showed a significantly higher shimmer perturbation, regardless of gender. Although future research regarding voice disorders in patients with VPI is needed, at the present time it seems reasonable to include strategies for voice therapy in the speech and language pathology intervention plan for patients with VPI. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  4. Performance of wavelet analysis and neural networks for pathological voices identification

    NASA Astrophysics Data System (ADS)

    Salhi, Lotfi; Talbi, Mourad; Abid, Sabeur; Cherif, Adnane

    2011-09-01

    Within the medical environment, diverse techniques exist to assess the state of the voice of the patient. The inspection technique is inconvenient for a number of reasons, such as its high cost, the duration of the inspection, and above all, the fact that it is an invasive technique. This study focuses on a robust, rapid and accurate system for automatic identification of pathological voices. This system employs a non-invasive, inexpensive and fully automated method based on a hybrid approach: wavelet transform analysis and a neural network classifier. First, we present the results obtained in our previous study using classic feature parameters. These results allow visual identification of pathological voices. Second, quantified parameters derived from the wavelet analysis are proposed to characterise the speech sample. In addition, a system of multilayer neural networks (MNNs) has been developed which carries out the automatic detection of pathological voices. The developed method was evaluated using a voice database composed of recorded voice samples (continuous speech) from normophonic or dysphonic speakers. The dysphonic speakers were patients of the 'RABTA' National Hospital of Tunis, Tunisia, and a University Hospital in Brussels, Belgium. Experimental results indicate a success rate ranging between 75% and 98.61% for discrimination of normal and pathological voices using the proposed parameters and neural network classifier. We also compared the average classification rate based on the MNN, a Gaussian mixture model and support vector machines.
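
    A minimal sketch of the hybrid idea, wavelet features feeding a multilayer neural network, is given below. The feature set (subband log-energies of a 5-level db4 decomposition), the file lists, and the classifier settings are illustrative assumptions, not the system evaluated in the paper.

        import numpy as np
        import pywt
        import librosa
        from sklearn.neural_network import MLPClassifier
        from sklearn.model_selection import train_test_split

        def wavelet_features(path):
            y, _ = librosa.load(path, sr=16000)
            coeffs = pywt.wavedec(y, "db4", level=5)   # approximation + 5 detail bands
            return np.array([np.log(np.sum(c ** 2) + 1e-12) for c in coeffs])

        paths = ["normal_01.wav", "normal_02.wav", "path_01.wav", "path_02.wav"]
        labels = [0, 0, 1, 1]                          # 0 = normophonic, 1 = dysphonic

        X = np.stack([wavelet_features(p) for p in paths])
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, labels, test_size=0.5, random_state=0, stratify=labels)

        clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
        clf.fit(X_tr, y_tr)
        print("held-out accuracy:", clf.score(X_te, y_te))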

  5. A voice region in the monkey brain.

    PubMed

    Petkov, Christopher I; Kayser, Christoph; Steudel, Thomas; Whittingstall, Kevin; Augath, Mark; Logothetis, Nikos K

    2008-03-01

    For vocal animals, recognizing species-specific vocalizations is important for survival and social interactions. In humans, a voice region has been identified that is sensitive to human voices and vocalizations. As this region also strongly responds to speech, it is unclear whether it is tightly associated with linguistic processing and is thus unique to humans. Using functional magnetic resonance imaging of macaque monkeys (Old World primates, Macaca mulatta) we discovered a high-level auditory region that prefers species-specific vocalizations over other vocalizations and sounds. This region not only showed sensitivity to the 'voice' of the species, but also to the vocal identity of conspecific individuals. The monkey voice region is located on the superior temporal plane and belongs to an anterior auditory 'what' pathway. These results establish functional relationships with the human voice region and support the notion that, for different primate species, the anterior temporal regions of the brain are adapted for recognizing communication signals from conspecifics.

  6. Speech & Language Impairments. NICHCY Disability Fact Sheet #11

    ERIC Educational Resources Information Center

    National Dissemination Center for Children with Disabilities, 2011

    2011-01-01

    There are many kinds of speech and language disorders that can affect children. This fact sheet will present four major areas in which these impairments occur. These are the areas of: (1) Articulation; (2) Fluency; (3) Voice; and (4) Language. Following a brief narrative on a day in the life of a Speech Language Pathologist, this fact sheet…

  7. English Voicing in Dimensional Theory

    PubMed Central

    Iverson, Gregory K.; Ahn, Sang-Cheol

    2007-01-01

    Assuming a framework of privative features, this paper interprets two apparently disparate phenomena in English phonology as structurally related: the lexically specific voicing of fricatives in plural nouns like wives or thieves and the prosodically governed “flapping” of medial /t/ (and /d/) in North American varieties, which we claim is itself not a rule per se, but rather a consequence of the laryngeal weakening of fortis /t/ in interaction with speech-rate determined segmental abbreviation. Taking as our point of departure the Dimensional Theory of laryngeal representation developed by Avery & Idsardi (2001), along with their assumption that English marks voiceless obstruents but not voiced ones (Iverson & Salmons 1995), we find that an unexpected connection between fricative voicing and coronal flapping emerges from the interplay of familiar phonemic and phonetic factors in the phonological system. PMID:18496590

  8. Speech impairment in Down syndrome: a review.

    PubMed

    Kent, Ray D; Vorperian, Houri K

    2013-02-01

    This review summarizes research on disorders of speech production in Down syndrome (DS) for the purposes of informing clinical services and guiding future research. Review of the literature was based on searches using MEDLINE, Google Scholar, PsycINFO, and HighWire Press, as well as consideration of reference lists in retrieved documents (including online sources). Search terms emphasized functions related to voice, articulation, phonology, prosody, fluency, and intelligibility. The following conclusions pertain to four major areas of review: voice, speech sounds, fluency and prosody, and intelligibility. The first major area is voice. Although a number of studies have reported on vocal abnormalities in DS, major questions remain about the nature and frequency of the phonatory disorder. Results of perceptual and acoustic studies have been mixed, making it difficult to draw firm conclusions or even to identify sensitive measures for future study. The second major area is speech sounds. Articulatory and phonological studies show that speech patterns in DS are a combination of delayed development and errors not seen in typical development. Delayed (i.e., developmental) and disordered (i.e., nondevelopmental) patterns are evident by the age of about 3 years, although DS-related abnormalities possibly appear earlier, even in infant babbling. The third major area is fluency and prosody. Stuttering and/or cluttering occur in DS at rates of 10%-45%, compared with about 1% in the general population. Research also points to significant disturbances in prosody. The fourth major area is intelligibility. Studies consistently show marked limitations in this area, but only recently has the research gone beyond simple rating scales.

  9. Voice recognition products - an occupational risk for users with ULDs?

    PubMed

    Williams, N R

    2003-10-01

    Voice recognition systems (VRS) allow speech both to be converted directly into text, which appears on the screen of a computer, and to direct equipment to perform specific functions. Suggested applications are many and varied, including increasing efficiency in the reporting of radiographs, allowing directed surgery and enabling individuals with upper limb disorders (ULDs) who cannot use other input devices, such as keyboards and mice, to carry out word processing and other activities. This paper describes four cases of vocal dysfunction related to the use of such software, which have been identified from the database of the Voice and Speech Laboratory of the Massachusetts Eye and Ear Infirmary (MEEI). The database was searched using key words 'voice recognition' and four cases were identified from a total of 4800. In all cases, the VRS was supplied to assist individuals with ULDs who could not use conventional input devices. Case reports illustrate time of onset and symptoms experienced. The cases illustrate the need for risk assessment and consideration of the ergonomic aspects of voice use prior to such adaptations being used, particularly in those who already experience work-related ULDs.

  10. The prevalence of speech disorder in primary school students in Yazd-Iran.

    PubMed

    Karbasi, Sedighah Akhavan; Fallah, Razieh; Golestan, Motaharah

    2011-01-01

    Communication disorders are widespread, disabling problems associated with adverse long-term outcomes that impact individuals, families and the academic achievement of children in the school years, and affect vocational choices later in adulthood. The aim of this study was to determine the prevalence of speech disorders, specifically stuttering, voice disorders, and speech-sound disorders, in primary school students in Yazd, Iran. In a descriptive study, 7881 primary school students in Yazd were evaluated for speech disorders in 2005, using a direct, face-to-face assessment technique. The prevalence of total speech disorders was 14.8%, among whom 13.8% had a speech-sound disorder, 1.2% stuttering and 0.47% a voice disorder. The prevalence of speech disorders was higher in males (16.7%) than in females (12.7%). The pattern of prevalence of the three speech disorders differed significantly according to gender, parental education and number of family members. There was no significant difference across speech disorders by birth order, religion or paternal consanguinity. These prevalence figures are higher than those of most studies that used parent or teacher reports.

  11. An integrated tool for the diagnosis of voice disorders.

    PubMed

    Godino-Llorente, Juan I; Sáenz-Lechón, Nicolás; Osma-Ruiz, Víctor; Aguilera-Navarro, Santiago; Gómez-Vilda, Pedro

    2006-04-01

    A PC-based integrated aid tool has been developed for the analysis and screening of pathological voices. With it the user can simultaneously record speech, electroglottographic (EGG), and videoendoscopic signals, and synchronously edit them to select the most significant segments. These multimedia data are stored on a relational database, together with a patient's personal information, anamnesis, diagnosis, visits, explorations and any other comment the specialist may wish to include. The speech and EGG waveforms are analysed by means of temporal representations and the quantitative measurements of parameters such as spectrograms, frequency and amplitude perturbation measurements, harmonic energy, noise, etc. are calculated using digital signal processing techniques, giving an idea of the degree of hoarseness and quality of the voice register. Within this framework, the system uses a standard protocol to evaluate and build complete databases of voice disorders. The target users of this system are speech and language therapists and ear nose and throat (ENT) clinicians. The application can be easily configured to cover the needs of both groups of professionals. The software has a user-friendly Windows style interface. The PC should be equipped with standard sound and video capture cards. Signals are captured using common transducers: a microphone, an electroglottograph and a fiberscope or telelaryngoscope. The clinical usefulness of the system is addressed in a comprehensive evaluation section.

  12. A perspective on early commercial applications of voice-processing technology for telecommunications and aids for the handicapped.

    PubMed Central

    Seelbach, C

    1995-01-01

    The Colloquium on Human-Machine Communication by Voice highlighted the global technical community's focus on the problems and promise of voice-processing technology, particularly, speech recognition and speech synthesis. Clearly, there are many areas in both the research and development of these technologies that can be advanced significantly. However, it is also true that there are many applications of these technologies that are capable of commercialization now. Early successful commercialization of new technology is vital to ensure continuing interest in its development. This paper addresses efforts to commercialize speech technologies in two markets: telecommunications and aids for the handicapped. PMID:7479814

  13. Intelligibility and Acceptability Testing for Speech Technology

    DTIC Science & Technology

    1992-05-22

    information in memory (Luce, Feustel, and Pisoni, 1983). In high workload or multiple task situations, the added effort of listening to degraded speech can lead ... the DRT provides diagnostic feature scores on six phonemic features: voicing, nasality, sustention, sibilation, graveness, and compactness, and on a ... of other speech materials (e.g., polysyllabic words, paragraphs) and methods (memory, comprehension, reaction time) have been used to evaluate the

  14. Underconnectivity between voice-selective cortex and reward circuitry in children with autism.

    PubMed

    Abrams, Daniel A; Lynch, Charles J; Cheng, Katherine M; Phillips, Jennifer; Supekar, Kaustubh; Ryali, Srikanth; Uddin, Lucina Q; Menon, Vinod

    2013-07-16

    Individuals with autism spectrum disorders (ASDs) often show insensitivity to the human voice, a deficit that is thought to play a key role in communication deficits in this population. The social motivation theory of ASD predicts that impaired function of reward and emotional systems impedes children with ASD from actively engaging with speech. Here we explore this theory by investigating distributed brain systems underlying human voice perception in children with ASD. Using resting-state functional MRI data acquired from 20 children with ASD and 19 age- and intelligence quotient-matched typically developing children, we examined intrinsic functional connectivity of voice-selective bilateral posterior superior temporal sulcus (pSTS). Children with ASD showed a striking pattern of underconnectivity between left-hemisphere pSTS and distributed nodes of the dopaminergic reward pathway, including bilateral ventral tegmental areas and nucleus accumbens, left-hemisphere insula, orbitofrontal cortex, and ventromedial prefrontal cortex. Children with ASD also showed underconnectivity between right-hemisphere pSTS, a region known for processing speech prosody, and the orbitofrontal cortex and amygdala, brain regions critical for emotion-related associative learning. The degree of underconnectivity between voice-selective cortex and reward pathways predicted symptom severity for communication deficits in children with ASD. Our results suggest that weak connectivity of voice-selective cortex and brain structures involved in reward and emotion may impair the ability of children with ASD to experience speech as a pleasurable stimulus, thereby impacting language and social skill development in this population. Our study provides support for the social motivation theory of ASD.
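
    Seed-based intrinsic functional connectivity of the kind reported here reduces to correlating a seed time series with target time series and Fisher-transforming the result. The sketch below uses synthetic data as a stand-in for preprocessed resting-state fMRI; the region labels are illustrative.

        import numpy as np

        rng = np.random.default_rng(0)
        n_timepoints, n_regions = 180, 6
        ts = rng.standard_normal((n_timepoints, n_regions))  # fake ROI time series

        seed = ts[:, 0]                                      # e.g., a left pSTS seed
        r = np.array([np.corrcoef(seed, ts[:, j])[0, 1]      # seed-to-target r
                      for j in range(1, n_regions)])

        z = np.arctanh(r)   # Fisher z, the usual step before group statistics
        print("seed-to-target connectivity (z):", np.round(z, 3))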

  15. Remote voice training: A case study on space shuttle applications, appendix C

    NASA Technical Reports Server (NTRS)

    Mollakarimi, Cindy; Hamid, Tamin

    1990-01-01

    The Tile Automation System includes applications of automation and robotics technology to all aspects of the Shuttle tile processing and inspection system. An integrated set of rapid prototyping testbeds was developed which include speech recognition and synthesis, laser imaging systems, distributed Ada programming environments, distributed relational data base architectures, distributed computer network architectures, multi-media workbenches, and human factors considerations. Remote voice training in the Tile Automation System is discussed. The user is prompted over a headset by synthesized speech for the training sequences. The voice recognition units and the voice output units are remote from the user and are connected by Ethernet to the main computer system. A supervisory channel is used to monitor the training sequences. Discussions include the training approaches as well as the human factors problems and solutions for this system utilizing remote training techniques.

  16. Automatic Speech Acquisition and Recognition for Spacesuit Audio Systems

    NASA Technical Reports Server (NTRS)

    Ye, Sherry

    2015-01-01

    NASA has a widely recognized but unmet need for novel human-machine interface technologies that can facilitate communication during astronaut extravehicular activities (EVAs), when loud noises and strong reverberations inside spacesuits make communication challenging. WeVoice, Inc., has developed a multichannel signal-processing method for speech acquisition in noisy and reverberant environments that enables automatic speech recognition (ASR) technology inside spacesuits. The technology reduces noise by exploiting differences between the statistical nature of signals (i.e., speech) and noise that exists in the spatial and temporal domains. As a result, ASR accuracy can be improved to the level at which crewmembers will find the speech interface useful. System components and features include beam forming/multichannel noise reduction, single-channel noise reduction, speech feature extraction, feature transformation and normalization, feature compression, and ASR decoding. Arithmetic complexity models were developed and will help designers of real-time ASR systems select proper tasks when confronted with constraints in computational resources. In Phase I of the project, WeVoice validated the technology. The company further refined the technology in Phase II and developed a prototype for testing and use by suited astronauts.
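
    One standard building block behind the multichannel noise reduction named above is delay-and-sum beamforming; the toy sketch below illustrates it on a simulated four-microphone capture. The array geometry, sample rate, and steering angle are assumptions for illustration, not WeVoice's design.

        import numpy as np

        SR, N_MICS = 16000, 4
        SPACING_M, C = 0.04, 343.0        # 4-cm linear array; speed of sound (m/s)

        def delay_and_sum(channels, angle_deg):
            """Steer a linear array toward angle_deg (0 = broadside) and average."""
            d_s = np.arange(N_MICS) * SPACING_M * np.sin(np.deg2rad(angle_deg)) / C
            d = np.round(d_s * SR).astype(int)
            n = channels.shape[1] - d.max()
            aligned = np.stack([ch[k:k + n] for ch, k in zip(channels, d)])
            return aligned.mean(axis=0)   # coherent sum attenuates diffuse noise

        rng = np.random.default_rng(1)
        t = np.arange(SR) / SR
        speech = np.sin(2 * np.pi * 220 * t)              # stand-in for a speech signal
        mics = np.stack([speech + 0.5 * rng.standard_normal(SR)
                         for _ in range(N_MICS)])

        enhanced = delay_and_sum(mics, angle_deg=0.0)
        print("single-mic variance:", round(mics[0].var(), 3),
              "-> beamformed:", round(enhanced.var(), 3))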

  17. Maximal Ambient Noise Levels and Type of Voice Material Required for Valid Use of Smartphones in Clinical Voice Research.

    PubMed

    Lebacq, Jean; Schoentgen, Jean; Cantarella, Giovanna; Bruss, Franz Thomas; Manfredi, Claudia; DeJonckere, Philippe

    2017-09-01

    Smartphone technology provides new opportunities for recording standardized voice samples of patients and transmitting the audio files to the voice laboratory. This greatly facilitates the baseline designs used in research on the efficacy of voice treatments. However, the basic requirement is the suitability of smartphones for recording and digitizing pathologic voices (mainly characterized by period perturbations and noise) without significant distortion. In a previous article, this was tested using realistic synthesized deviant voice samples (/a:/) with three precisely known levels of jitter and of noise in all combinations. High correlations were found between jitter and noise-to-harmonics ratio measured in (1) recordings via smartphones, (2) direct microphone recordings, and (3) sound files generated by the synthesizer. In the present work, similar experiments were performed (1) in the presence of increasing levels of ambient noise and (2) using synthetic deviant voice samples (/a:/) as well as synthetic voice material simulating a deviant short voiced utterance (/aiuaiuaiu/). Ambient noise levels up to 50 dB(A) are acceptable. However, signal processing occurs in some smartphones, and this significantly affects estimates of jitter and noise-to-harmonics ratio when formant changes are introduced in analogy with running speech. The conclusion is that voice material must provisionally be limited to a sustained /a/. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
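
    As a toy illustration of the two deviance measures at stake here, the sketch below computes jitter and an autocorrelation-based harmonics-to-noise ratio from a sustained vowel. The frame sizes, F0 search range, and synthetic /a:/ are assumptions for the sketch; clinical analysis packages use far more robust pitch trackers.

```python
import numpy as np

def autocorr_peak(frame, fs, fmin=75.0, fmax=400.0):
    """Pitch period (s) and normalized autocorrelation peak of one frame."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    ac = ac / (ac[0] + 1e-12)
    lo, hi = int(fs / fmax), int(fs / fmin)      # plausible pitch-lag range
    lag = lo + int(np.argmax(ac[lo:hi]))
    return lag / fs, ac[lag]

def jitter_and_hnr(x, fs, frame_ms=40, hop_ms=20):
    n, h = int(fs * frame_ms / 1000), int(fs * hop_ms / 1000)
    periods, peaks = [], []
    for start in range(0, len(x) - n, h):
        T, r = autocorr_peak(x[start:start + n], fs)
        periods.append(T)
        peaks.append(r)
    periods = np.array(periods)
    peaks = np.clip(np.array(peaks), 1e-3, 0.999)
    # local jitter: mean absolute period-to-period change, relative to mean period
    jitter_pct = 100 * np.mean(np.abs(np.diff(periods))) / np.mean(periods)
    # autocorrelation HNR: harmonic vs non-harmonic energy per frame, averaged
    hnr_db = np.mean(10 * np.log10(peaks / (1 - peaks)))
    return jitter_pct, hnr_db

fs = 16000
t = np.arange(fs) / fs
vowel = np.sin(2 * np.pi * 120 * t) + 0.05 * np.random.randn(fs)  # crude /a:/ stand-in
print(jitter_and_hnr(vowel, fs))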

  18. Transitioning from analog to digital audio recording in childhood speech sound disorders.

    PubMed

    Shriberg, Lawrence D; McSweeny, Jane L; Anderson, Bruce E; Campbell, Thomas F; Chial, Michael R; Green, Jordan R; Hauner, Katherina K; Moore, Christopher A; Rusiewicz, Heather L; Wilson, David L

    2005-06-01

    Few empirical findings or technical guidelines are available on the current transition from analog to digital audio recording in childhood speech sound disorders. Of particular concern in the present context was whether a transition from analog- to digital-based transcription and coding of prosody and voice features might require re-standardizing a reference database for research in childhood speech sound disorders. Two research transcribers with different levels of experience glossed, transcribed, and prosody-voice coded conversational speech samples from eight children with mild to severe speech disorders of unknown origin. The samples were recorded, stored, and played back using representative analog and digital audio systems. Effect sizes calculated for an array of analog versus digital comparisons ranged from negligible to medium, with a trend for participants' speech competency scores to be slightly lower for samples obtained and transcribed using the digital system. We discuss the implications of these and other findings for research and clinical practice.

  19. Transitioning from analog to digital audio recording in childhood speech sound disorders

    PubMed Central

    Shriberg, Lawrence D.; McSweeny, Jane L.; Anderson, Bruce E.; Campbell, Thomas F.; Chial, Michael R.; Green, Jordan R.; Hauner, Katherina K.; Moore, Christopher A.; Rusiewicz, Heather L.; Wilson, David L.

    2014-01-01

    Few empirical findings or technical guidelines are available on the current transition from analog to digital audio recording in childhood speech sound disorders. Of particular concern in the present context was whether a transition from analog- to digital-based transcription and coding of prosody and voice features might require re-standardizing a reference database for research in childhood speech sound disorders. Two research transcribers with different levels of experience glossed, transcribed, and prosody-voice coded conversational speech samples from eight children with mild to severe speech disorders of unknown origin. The samples were recorded, stored, and played back using representative analog and digital audio systems. Effect sizes calculated for an array of analog versus digital comparisons ranged from negligible to medium, with a trend for participants’ speech competency scores to be slightly lower for samples obtained and transcribed using the digital system. We discuss the implications of these and other findings for research and clinical practice. PMID:16019779

  20. Statistical Analysis of Spectral Properties and Prosodic Parameters of Emotional Speech

    NASA Astrophysics Data System (ADS)

    Přibil, J.; Přibilová, A.

    2009-01-01

    The paper addresses the reflection of microintonation and spectral properties in male and female acted emotional speech. The microintonation component of speech melody is analyzed with regard to its spectral and statistical parameters. According to psychological research on emotional speech, different emotions are accompanied by different amounts of spectral noise. We control this amount by spectral flatness, according to which high-frequency noise is mixed into voiced frames during cepstral speech synthesis. Our experiments are aimed at statistical analysis of cepstral coefficient values and ranges of spectral flatness in three emotions (joy, sadness, anger) and a neutral state for comparison. Calculated histograms of the spectral flatness distribution are visually compared and modelled by a Gamma probability distribution. Histograms of the cepstral coefficient distribution are evaluated and compared using skewness and kurtosis. The statistical results show good agreement between male and female voices for all emotional states portrayed by several Czech and Slovak professional actors.
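
    To make the measured quantities concrete, here is a small sketch of the statistics named above: per-frame spectral flatness, a Gamma fit to its distribution, and skewness/kurtosis of a cepstral coefficient. The framing, FFT size, and synthetic signal are my assumptions, not the paper's setup.

```python
import numpy as np
from scipy import stats

def spectral_flatness(frame):
    """Geometric / arithmetic mean of the magnitude spectrum (0..1)."""
    mag = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) + 1e-12
    return np.exp(np.mean(np.log(mag))) / np.mean(mag)

def frame_signal(x, n=512, hop=256):
    return np.stack([x[i:i + n] for i in range(0, len(x) - n, hop)])

fs = 16000
x = np.sin(2 * np.pi * 140 * np.arange(fs) / fs) + 0.1 * np.random.randn(fs)
frames = frame_signal(x)

sfm = np.array([spectral_flatness(f) for f in frames])
shape, loc, scale = stats.gamma.fit(sfm, floc=0)   # model the flatness histogram

# first real-cepstrum coefficient per frame, then its distribution descriptors
c1 = np.array([np.fft.irfft(np.log(np.abs(np.fft.rfft(f)) + 1e-12))[1]
               for f in frames])
print(shape, scale, stats.skew(c1), stats.kurtosis(c1))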

  1. Exploring the feasibility of smart phone microphone for measurement of acoustic voice parameters and voice pathology screening.

    PubMed

    Uloza, Virgilijus; Padervinskis, Evaldas; Vegiene, Aurelija; Pribuisiene, Ruta; Saferis, Viktoras; Vaiciukynas, Evaldas; Gelzinis, Adas; Verikas, Antanas

    2015-11-01

    The objective of this study is to evaluate the reliability of acoustic voice parameters obtained using smartphone (SP) microphones and to investigate the utility of SP voice recordings for voice screening. Voice samples of the sustained vowel /a/ obtained from 118 subjects (34 normal and 84 pathological voices) were recorded simultaneously through two microphones: an oral AKG Perception 220 microphone and a Samsung Galaxy Note3 SP microphone. Acoustic voice signal data were measured for fundamental frequency, jitter and shimmer, normalized noise energy (NNE), signal-to-noise ratio, and harmonic-to-noise ratio using Dr. Speech software. Discriminant analysis-based correct classification rate (CCR) and random forest classifier (RFC)-based equal error rate (EER) were used to evaluate the feasibility of acoustic voice parameters in classifying normal and pathological voice classes. The Lithuanian version of the Glottal Function Index (LT_GFI) questionnaire was used for self-assessment of the severity of voice disorder. The correlations between acoustic voice parameters obtained with the two types of microphones were statistically significant and strong (r = 0.73-1.0) across all measurements. When classifying into normal/pathological voice classes, the oral-NNE provided a CCR of 73.7% and the pair of SP-NNE and SP-shimmer parameters provided a CCR of 79.5%. However, fusion of the results obtained from SP voice recordings and GFI data provided a CCR of 84.60%, and the RFC yielded an EER of 7.9%. In conclusion, measurements of acoustic voice parameters using the SP microphone were shown to be reliable in clinical settings, demonstrating a high CCR and low EER when distinguishing normal and pathological voice classes, and validating the suitability of the SP microphone signal for automatic voice analysis and screening.
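
    The two figures of merit in this record can be reproduced on any per-recording feature table. A hedged sketch with synthetic stand-ins for the parameters (jitter, shimmer, NNE, etc.); the 34/84 class split mirrors the record, everything else is invented.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict, cross_val_score
from sklearn.metrics import roc_curve

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (34, 3)), rng.normal(1, 1, (84, 3))])
y = np.array([0] * 34 + [1] * 84)   # 0 = normal, 1 = pathological

# CCR: cross-validated accuracy of a linear discriminant classifier
ccr = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5).mean()

# EER: operating point of the random forest where FPR equals FNR
proba = cross_val_predict(RandomForestClassifier(random_state=0), X, y,
                          cv=5, method="predict_proba")[:, 1]
fpr, tpr, _ = roc_curve(y, proba)
eer = fpr[np.nanargmin(np.abs(fpr - (1 - tpr)))]
print(f"CCR = {ccr:.1%}, EER = {eer:.1%}")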

  2. Combined Use of Standard and Throat Microphones for Measurement of Acoustic Voice Parameters and Voice Categorization.

    PubMed

    Uloza, Virgilijus; Padervinskis, Evaldas; Uloziene, Ingrida; Saferis, Viktoras; Verikas, Antanas

    2015-09-01

    The aim of the present study was to evaluate the reliability of measurements of acoustic voice parameters obtained simultaneously using oral and contact (throat) microphones and to investigate the utility of using these microphones in combination for voice categorization. Voice samples of the sustained vowel /a/ obtained from 157 subjects (105 healthy and 52 pathological voices) were recorded in a soundproof booth simultaneously through two microphones: an oral AKG Perception 220 microphone (AKG Acoustics, Vienna, Austria) and a contact (throat) Triumph PC microphone (Clearer Communications, Inc, Burnaby, Canada) placed on the lamina of the thyroid cartilage. Acoustic voice signal data were measured for fundamental frequency, percent of jitter and shimmer, normalized noise energy, signal-to-noise ratio, and harmonic-to-noise ratio using Dr. Speech software (Tiger Electronics, Seattle, WA). The correlations of acoustic voice parameters in vocal performance were statistically significant and strong (r = 0.71-1.0) across all functional measurements obtained with the two microphones. When classifying into healthy-pathological voice classes, the oral-shimmer yielded a correct classification rate (CCR) of 75.2% and the throat-jitter yielded a CCR of 70.7%. However, combining the throat and oral microphones allowed identifying a set of three voice parameters: throat-signal-to-noise ratio, oral-shimmer, and oral-normalized noise energy, which provided a CCR of 80.3%. The measurements of acoustic voice parameters using a combination of oral and throat microphones proved to be reliable in clinical settings and demonstrated high CCRs when distinguishing the healthy and pathological voice groups. Our study validates the suitability of the throat microphone signal for the task of automatic voice analysis for the purpose of voice screening. Copyright © 2015 The Voice Foundation. Published by Elsevier Inc. All rights reserved.

  3. Effects of vocal training and phonatory task on voice onset time.

    PubMed

    McCrea, Christopher R; Morris, Richard J

    2007-01-01

    The purpose of this study was to examine the temporal-acoustic differences between trained singers and nonsingers during speech and singing tasks. Thirty male participants were separated into two groups of 15 according to level of vocal training (ie, trained or untrained). The participants spoke and sang carrier phrases containing English voiced and voiceless bilabial stops, and voice onset time (VOT) was measured for the stop consonant productions. Mixed analyses of variance revealed a significant main effect between speech and singing for /p/ and /b/, with VOT durations longer during speech than singing for /p/, and the opposite true for /b/. Furthermore, a significant phonatory task by vocal training interaction was observed for /p/ productions. The results indicated that the type of phonatory task influences VOT and that these influences are most obvious in trained singers secondary to the articulatory and phonatory adjustments learned during vocal training.
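
    VOT is the interval between the stop release and the onset of voicing. The sketch below estimates it automatically on a synthetic stop-plus-vowel token; the burst detector (maximum high-frequency energy) and voicing detector (autocorrelation periodicity) are simplifications of what phonetic studies typically do by hand from waveforms and spectrograms.

```python
import numpy as np

fs = 16000
rng = np.random.default_rng(1)
x = np.concatenate([
    np.zeros(int(0.05 * fs)),                                  # closure (silence)
    0.5 * rng.standard_normal(int(0.005 * fs)),                # release burst
    0.1 * rng.standard_normal(int(0.060 * fs)),                # voiceless aspiration
    np.sin(2 * np.pi * 130 * np.arange(int(0.2 * fs)) / fs),   # voiced vowel
])

n, hop = 320, 80   # 20 ms frames, 5 ms hop

def frame_energy(sig):
    return np.array([np.sum(sig[i:i + n] ** 2) for i in range(0, len(sig) - n, hop)])

# first-differencing emphasizes high frequencies, so the burst dominates
burst_idx = int(np.argmax(frame_energy(np.diff(x))))

def is_voiced(frame):
    f = frame - frame.mean()
    ac = np.correlate(f, f, "full")[len(f) - 1:]
    return ac[0] > 1e-8 and ac[40:200].max() / ac[0] > 0.5   # strong periodicity

voice_idx = next(i for i in range(burst_idx + 1, (len(x) - n) // hop)
                 if is_voiced(x[i * hop:i * hop + n]))
print(f"VOT ~ {(voice_idx - burst_idx) * hop / fs * 1000:.0f} ms")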

  4. The Hypothesis of Apraxia of Speech in Children with Autism Spectrum Disorder

    ERIC Educational Resources Information Center

    Shriberg, Lawrence D.; Paul, Rhea; Black, Lois M.; van Santen, Jan P.

    2011-01-01

    In a sample of 46 children aged 4-7 years with Autism Spectrum Disorder (ASD) and intelligible speech, there was no statistical support for the hypothesis of concomitant Childhood Apraxia of Speech (CAS). Perceptual and acoustic measures of participants' speech, prosody, and voice were compared with data from 40 typically-developing children, 13…

  5. Common Speech Problems Encountered in General Practice

    PubMed Central

    Godfrey, Charles M.; Ward, Jean F.

    1962-01-01

    The authors consider speech and communication in the light of whole patient care and point out that defects may be signs and symptoms of underlying organic disease. They describe the four classifications of speech disorders—articulation, rhythm, voice and language, with an indication of the speech therapy required and duration of treatment. Special emphasis has been given to those speech problems which are seen by the family physician; these are usually of the articulation group. A short discussion of stuttering and aphasia is given. Emphasis is put on the direction of treatment by the physician and the use of well-qualified personnel as members of the rehabilitation team. PMID:13963265

  6. HEARING, LANGUAGE, AND SPEECH DISORDERS. NINDB RESEARCH PROFILE NUMBER 4.

    ERIC Educational Resources Information Center

    National Inst. of Neurological Diseases and Blindness (NIH), Bethesda, MD.

    As part of his annual statement to Congress, the Director of the National Institute of Neurological Diseases and Blindness describes research activities in speech and hearing disorders. This report summarizes information concerning the prevalence and causes of communicative disorders (hearing, speech, language, voice, and reading) in children and…

  7. Measuring voice outcomes: state of the science review.

    PubMed

    Carding, Paul N; Wilson, J A; MacKenzie, K; Deary, I J

    2009-08-01

    Researchers evaluating voice disorder interventions currently have a plethora of voice outcome measurement tools from which to choose. Faced with such a wide choice, it would be beneficial to establish a clear rationale to guide selection. This article reviews the published literature on the three main areas of voice outcome assessment: (1) perceptual rating of voice quality, (2) acoustic measurement of the speech signal, and (3) patient self-reporting of voice problems. We analysed the published reliability, validity, sensitivity to change and utility of the common outcome measurement tools in each area. From the data, we suggest that routine voice outcome measurement should include (1) an expert rating of voice quality (using the Grade-Roughness-Breathiness-Asthenia-Strain rating scale) and (2) a short self-reporting tool (either the Vocal Performance Questionnaire or the Voice Handicap Index-10). These measures have high validity, the best reported reliability to date, good sensitivity-to-change data and excellent utility ratings. However, their application and administration require attention to detail. Acoustic measurement has arguable validity and poor reliability data at the present time. Other areas of voice outcome measurement (e.g. stroboscopy and aerodynamic phonatory measurements) require similarly detailed research and analysis.

  8. A new VOX technique for reducing noise in voice communication systems. [voice operated keying

    NASA Technical Reports Server (NTRS)

    Morris, C. F.; Morgan, W. C.; Shack, P. E.

    1974-01-01

    A VOX technique for reducing noise in voice communication systems is described which is based on the separation of voice signals into contiguous frequency-band components with the aid of an adaptive VOX in each band. It is shown that this processing scheme can effectively reduce both wideband and narrowband quasi-periodic noise since the threshold levels readjust themselves to suppress noise that exceeds speech components in each band. Results are reported for tests of the adaptive VOX, and it is noted that improvements can still be made in such areas as the elimination of noise pulses, phoneme reproduction at high-noise levels, and the elimination of distortion introduced by phase delay.

  9. Voice gender identification by cochlear implant users: The role of spectral and temporal resolution

    NASA Astrophysics Data System (ADS)

    Fu, Qian-Jie; Chinchilla, Sherol; Nogaki, Geraldine; Galvin, John J.

    2005-09-01

    The present study explored the relative contributions of spectral and temporal information to voice gender identification by cochlear implant users and normal-hearing subjects. Cochlear implant listeners were tested using their everyday speech processors, while normal-hearing subjects were tested under speech processing conditions that simulated various degrees of spectral resolution, temporal resolution, and spectral mismatch. Voice gender identification was tested for two talker sets. In Talker Set 1, the mean fundamental frequency values of the male and female talkers differed by 100 Hz while in Talker Set 2, the mean values differed by 10 Hz. Cochlear implant listeners achieved higher levels of performance with Talker Set 1, while performance was significantly reduced for Talker Set 2. For normal-hearing listeners, performance was significantly affected by the spectral resolution, for both Talker Sets. With matched speech, temporal cues contributed to voice gender identification only for Talker Set 1 while spectral mismatch significantly reduced performance for both Talker Sets. The performance of cochlear implant listeners was similar to that of normal-hearing subjects listening to 4-8 spectral channels. The results suggest that, because of the reduced spectral resolution, cochlear implant patients may attend strongly to periodicity cues to distinguish voice gender.

  10. The relative impact of generic head-related transfer functions on auditory speech thresholds: implications for the design of three-dimensional audio displays.

    PubMed

    Arrabito, G R; McFadden, S M; Crabtree, R B

    2001-07-01

    Auditory speech thresholds were measured in this study. Subjects were required to discriminate a female voice recording of three-digit numbers in the presence of diotic speech babble. The voice stimulus was spatialized at 11 static azimuth positions on the horizontal plane using three different head-related transfer functions (HRTFs) measured on individuals who did not participate in this study. The diotic presentation of the voice stimulus served as the control condition. The results showed that two of the HRTFs performed similarly and had significantly lower auditory speech thresholds than the third HRTF. All three HRTFs yielded significantly lower auditory speech thresholds compared with the diotic presentation of the voice stimulus, with the largest difference at 60 degrees azimuth. The practical implications of these results suggest that lower headphone levels of the communication system in military aircraft can be achieved without sacrificing intelligibility, thereby lessening the risk of hearing loss.
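
    Spatializing a voice with HRTFs amounts to convolving the mono signal with a left/right head-related impulse response pair. In this sketch the measured HRIRs are replaced by toy impulse responses encoding only an interaural time and level difference, so it is illustrative rather than faithful.

```python
import numpy as np

fs = 44100
voice = np.random.randn(fs)          # stand-in for the three-digit utterance

# toy HRIRs: source on the right -> left ear delayed (~0.5 ms) and attenuated
itd_samples = int(0.0005 * fs)
hrir_l = np.zeros(256); hrir_l[itd_samples] = 0.6
hrir_r = np.zeros(256); hrir_r[0] = 1.0

left = np.convolve(voice, hrir_l)[:len(voice)]
right = np.convolve(voice, hrir_r)[:len(voice)]
binaural = np.stack([left, right], axis=1)   # two-channel signal for headphones
print(binaural.shape)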

  11. Status Report on Speech Research. A Report on the Status and Progress of Studies on the Nature of Speech, Instrumentation for Its Investigation, and Practical Applications.

    DTIC Science & Technology

    1985-10-01

    [OCR fragments of the scanned report; no coherent abstract survives. Recoverable items include a reference to Anderson, V. A. (1942), Training the Speaking Voice (New York: Oxford University Press), a passage contrasting theories of speech perception with accounts of other perceptual processes (e.g., Berkeley, 1709; Festinger, Burnham, Ono), and the running head "Liberman & Mattingly: The Motor Theory of Speech Perception Revised".]

  12. Perception and analysis of Spanish accents in English speech

    NASA Astrophysics Data System (ADS)

    Chism, Cori; Lass, Norman

    2002-05-01

    The purpose of the present study was to determine what relates most closely to the degree of perceived foreign accent in the English speech of native Spanish speakers: intonation, vowel length, stress, voice onset time (VOT), or segmental accuracy. Nineteen native English-speaking listeners rated speech samples from 7 native English speakers and 15 native Spanish speakers for comprehensibility and degree of foreign accent. The speech samples were analyzed spectrographically and perceptually to obtain numerical values for each variable. Correlation coefficients were computed to determine the relationship between these values and the average foreign accent scores. Results showed that the average foreign accent scores were statistically significantly correlated with three variables: the length of stressed vowels (r = -0.48, p = 0.05), voice onset time (r = -0.62, p = 0.01), and segmental accuracy (r = 0.92, p = 0.001). Implications of these findings and suggestions for future research are discussed.

  13. Sex Differences in the Older Voice.

    ERIC Educational Resources Information Center

    Benjamin, Barbaranne J.

    A study investigated differences between older adult male and female voice patterns. In addition, the study examined whether certain differences between male and female speech characteristics were lifelong and not associated with the aging process. Subjects were 10 young (average age 30) and 10 old (average age 75) males and 10 young (average age…

  14. Alerting prefixes for speech warning messages. [in helicopters

    NASA Technical Reports Server (NTRS)

    Bucher, N. M.; Voorhees, J. W.; Karl, R. L.; Werner, E.

    1984-01-01

    A major question posed by the design of an integrated voice information display/warning system for next-generation helicopter cockpits is whether an alerting prefix should precede voice warning messages; if so, the characteristics desirable in such a cue must also be addressed. Attention is presently given to the results of a study which ascertained pilot response time and response accuracy to messages preceded by either neutral cues or the cognitively appropriate semantic cues. Both verbal cues and messages were spoken in direct, phoneme-synthesized speech, and a training manipulation was included to determine the extent to which previous exposure to speech thus produced facilitates these messages' comprehension. Results are discussed in terms of the importance of human factors research in cockpit display design.

  15. Effects of Voice Coding and Speech Rate on a Synthetic Speech Display in a Telephone Information System

    DTIC Science & Technology

    1988-05-01

    [OCR fragments of the scanned report; mostly figure-list residue. Recoverable items include "Figure 2. Original limited-capacity channel model (From Broadbent, 1958)", the start of a "Figure 3. Experimental…" entry, a remark that human voices offer an unlimited variety of digital recording sources, and a note that analysis-synthesis methods electronically model the human voice.]

  16. Voice control of the space shuttle video system

    NASA Technical Reports Server (NTRS)

    Bejczy, A. K.; Dotson, R. S.; Brown, J. W.; Lewis, J. L.

    1981-01-01

    A pilot voice control system was developed at the Jet Propulsion Laboratory (JPL) to test and evaluate the feasibility of controlling the shuttle TV cameras and monitors by voice commands. It utilizes a commercially available discrete-word speech recognizer that can be trained to the individual utterances of each operator. Successful ground tests were conducted using a simulated full-scale space shuttle manipulator. The test configuration involved berthing, maneuvering, and deploying a simulated science payload in the shuttle bay. The handling task typically required 15 to 20 minutes and 60 to 80 commands to 4 TV cameras and 2 TV monitors. The best test runs show 96 to 100 percent voice recognition accuracy.

  17. Effects of Voice Rehabilitation After Radiation Therapy for Laryngeal Cancer: A Randomized Controlled Study

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tuomi, Lisa, E-mail: lisa.tuomi@vgregion.se; Andréll, Paulin; Finizia, Caterina

    Background: Patients treated with radiation therapy for laryngeal cancer often experience voice problems. The aim of this randomized controlled trial was to assess the efficacy of voice rehabilitation for laryngeal cancer patients after radiation therapy and to investigate whether rehabilitation outcomes differ between tumor localizations. Methods and Materials: Sixty-nine male patients irradiated for laryngeal cancer participated. Voice recordings and self-assessments of communicative dysfunction were performed 1 and 6 months after radiation therapy. Thirty-three patients were randomized to structured voice rehabilitation with a speech-language pathologist and 36 to a control group. Furthermore, comparisons with 23 healthy control individuals were made. Acoustic analyses were performed for all patients, including the healthy control individuals. The Swedish version of the Self Evaluation of Communication Experiences after Laryngeal Cancer and self-ratings of voice function were used to assess vocal and communicative function. Results: The patients who received voice rehabilitation experienced improved self-rated vocal function after rehabilitation. Patients with supraglottic tumors who received voice rehabilitation had statistically significant improvements in voice quality and self-rated vocal function, whereas the control group did not. Conclusion: Voice rehabilitation for male patients with laryngeal cancer is efficacious in terms of patient-reported outcome measures. The patients experienced better voice function after rehabilitation. Patients with supraglottic tumors also showed an improvement in acoustic voice outcomes. Rehabilitation with a speech-language pathologist is recommended for laryngeal cancer patients after radiation therapy, particularly for patients with supraglottic tumors.

  18. Listeners' Attitudes toward Children with Voice Problems

    ERIC Educational Resources Information Center

    Ma, Estella P.-M.; Yu, Camille H.-Y.

    2013-01-01

    Purpose: To investigate the attitudes of school teachers toward children with voice problems in a Chinese population. Method: Three groups of listeners participated in this study: primary school teachers, speech-language pathology students, and general university students. The participants were required to make attitude judgments on 12 voice…

  19. Acoustical conditions for speech communication in active elementary school classrooms

    NASA Astrophysics Data System (ADS)

    Sato, Hiroshi; Bradley, John

    2005-04-01

    Detailed acoustical measurements were made in 34 active elementary school classrooms with typical rectangular room shape in schools near Ottawa, Canada. There was an average of 21 students in the classrooms. The measurements were made to obtain accurate indications of the acoustical quality of conditions for speech communication during actual teaching activities. Mean speech and noise levels were determined from the distribution of recorded sound levels, and the average speech-to-noise ratio was 11 dBA. Measured mid-frequency reverberation times (RT) during the same occupied conditions varied from 0.3 to 0.6 s, and were a little less than for the unoccupied rooms. RT values were not related to noise levels. Octave-band speech and noise levels, useful-to-detrimental ratios, and Speech Transmission Index values were also determined. Key results included: (1) the average vocal effort of teachers corresponded to louder than Pearsons' 'raised' voice level; (2) teachers increase their voice level to overcome ambient noise; (3) effective speech levels can be enhanced by up to 5 dB by early reflection energy; and (4) student activity is the dominant noise source, increasing average noise levels by up to 10 dBA during teaching activities. [Work supported by CLLRnet.]
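
    One way to recover separate speech and noise levels from a single occupied-classroom recording is to look at the distribution of short-term levels, taking a high percentile as the teacher's speech and a low percentile as the noise floor. The percentiles and the simulated signal below are assumptions; the study's exact procedure may differ.

```python
import numpy as np

def frame_levels_db(x, fs, frame_ms=125):
    n = int(fs * frame_ms / 1000)
    frames = [x[i:i + n] for i in range(0, len(x) - n, n)]
    rms = np.array([np.sqrt(np.mean(f ** 2)) + 1e-12 for f in frames])
    return 20 * np.log10(rms)

fs = 16000
t = np.arange(10 * fs) / fs
x = 0.02 * np.random.randn(len(t))                           # steady activity noise
x[:5 * fs] += 0.2 * np.sin(2 * np.pi * 150 * t[:5 * fs])     # "teacher" talks half the time

levels = frame_levels_db(x, fs)
noise_level = np.percentile(levels, 10)    # quiet frames ~ ambient/student noise
speech_level = np.percentile(levels, 90)   # loud frames ~ teacher's speech
print(f"speech-to-noise ratio ~ {speech_level - noise_level:.1f} dB")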

  20. An Investigation of Multidimensional Voice Program Parameters in Three Different Databases for Voice Pathology Detection and Classification.

    PubMed

    Al-Nasheri, Ahmed; Muhammad, Ghulam; Alsulaiman, Mansour; Ali, Zulfiqar; Mesallam, Tamer A; Farahat, Mohamed; Malki, Khalid H; Bencherif, Mohamed A

    2017-01-01

    Automatic voice-pathology detection and classification systems may help clinicians to detect the existence of any voice pathologies and the type of pathology from which patients suffer in the early stages. The main aim of this paper is to investigate Multidimensional Voice Program (MDVP) parameters to automatically detect and classify the voice pathologies in multiple databases, and then to find out which parameters performed well in these two processes. Samples of the sustained vowel /a/ of normal and pathological voices were extracted from three different databases, which have three voice pathologies in common. The selected databases in this study represent three distinct languages: (1) the Arabic voice pathology database; (2) the Massachusetts Eye and Ear Infirmary database (English database); and (3) the Saarbruecken Voice Database (German database). A computerized speech lab program was used to extract MDVP parameters as features, and an acoustical analysis was performed. The Fisher discrimination ratio was applied to rank the parameters. A t test was performed to highlight any significant differences in the means of the normal and pathological samples. The experimental results demonstrate a clear difference in the performance of the MDVP parameters using these databases. The highly ranked parameters also differed from one database to another. The best accuracies were obtained by using the three highest ranked MDVP parameters arranged according to the Fisher discrimination ratio: these accuracies were 99.68%, 88.21%, and 72.53% for the Saarbruecken Voice Database, the Massachusetts Eye and Ear Infirmary database, and the Arabic voice pathology database, respectively. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
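
    The Fisher discrimination ratio used here to rank MDVP parameters has a one-line form per feature: squared mean difference over summed variances. A hedged sketch on synthetic normal/pathological feature tables (the real values would come from the MDVP analysis), with the accompanying t test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
normal = rng.normal(0.5, 0.2, (50, 4))        # rows: samples, cols: MDVP parameters
pathological = rng.normal(1.0, 0.4, (50, 4))

def fisher_ratio(a, b):
    """Per-feature class separability: (mu_a - mu_b)^2 / (var_a + var_b)."""
    return (a.mean(0) - b.mean(0)) ** 2 / (a.var(0) + b.var(0))

fdr = fisher_ratio(normal, pathological)
ranking = np.argsort(fdr)[::-1]               # highest-ranked parameters first
t, p = stats.ttest_ind(normal, pathological, axis=0)
print(ranking, p)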

  1. Robotics control using isolated word recognition of voice input

    NASA Technical Reports Server (NTRS)

    Weiner, J. M.

    1977-01-01

    A speech input/output system is presented that can be used to communicate with a task-oriented system. Human speech commands and synthesized voice output extend conventional information exchange capabilities between man and machine by utilizing audio input and output channels. The speech input facility comprises a hardware feature extractor and a microprocessor-implemented isolated word or phrase recognition system. The recognizer offers a medium-sized (100 commands), syntactically constrained vocabulary, and exhibits close to real-time performance. The major portion of the recognition processing required is accomplished through software, minimizing the complexity of the hardware feature extractor.

  2. Movement of the velum during speech and singing in classically trained singers.

    PubMed

    Austin, S F

    1997-06-01

    The present study addresses two questions: (a) Is the action and/or posture of the velopharyngeal valve conducive to allow significant resonance during Western tradition classical singing? (b) How do the actions of the velopharyngeal valve observed in this style of singing compare with normal speech? A photodetector system was used to observe the area function of the velopharyngeal port during speech and classical style singing. Identical speech samples were produced by each subject in a normal speaking voice and then in the low, medium, and high singing ranges. Results indicate that in these four singers the velopharyngeal port was closed significantly longer in singing than in speaking samples. The amount of time the velopharyngeal port was opened was greatest in speech and diminished as the singer ascended in pitch. In the high voice condition, little or no opening of the velopharyngeal port was measured.

  3. Delivering the Lee Silverman Voice Treatment (LSVT) by Web Camera: A Feasibility Study

    ERIC Educational Resources Information Center

    Howell, Susan; Tripoliti, Elina; Pring, Tim

    2009-01-01

    Background: Speech disorders are a feature of Parkinson's disease, typically worsening as the disease progresses. The Lee Silverman Voice Treatment (LSVT) was developed to address these difficulties. It targets vocal loudness as a means of increasing vocal effort and improving coordination across the subsystems of speech. Aims: Currently LSVT is…

  4. Long-term average spectrum in screening of voice quality in speech: untrained male university students.

    PubMed

    Leino, Timo

    2009-11-01

    Voice quality has mainly been studied in trained speakers, singers, and dysphonic patients. Few studies have concerned ordinary untrained university students' voices. In light of earlier studies of professional voice users, it was hypothesized that good, poor, and intermediate voices would be distinguishable on the basis of long-term average spectrum (LTAS) characteristics. In the present study, the voice quality of 50 Finnish vocally untrained male university students was studied perceptually and by LTAS analysis of one-minute text reading samples. The equivalent sound level (Leq) of text reading was also measured. According to the results, the good and ordinary voices differed from the poor ones in their relatively higher sound level in the frequency range of 1-3 kHz and a prominent peak at 3-4 kHz. Good voices, however, did not differ from the ordinary voices in terms of LTAS characteristics. The strength of the peak at 3-4 kHz and the voice-quality scores correlated weakly but significantly, as did voice quality and the alpha ratio (the level difference above and below 1 kHz). Leq was significantly higher in the students with good and ordinary voices than in those with poor voices. The connections between Leq, voice quality, and the formation of the peak at 3-4 kHz warrant further study.
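
    Both reported measures are straightforward to compute from a recording: the LTAS is a long-term power-spectrum average, and the alpha ratio compares energy above and below 1 kHz. The band edges and Welch settings below are my assumptions, and the noise input stands in for a real one-minute reading sample.

```python
import numpy as np
from scipy.signal import welch

fs = 16000
x = np.random.randn(60 * fs)             # stand-in for one minute of text reading
f, psd = welch(x, fs=fs, nperseg=4096)   # LTAS as a Welch-averaged power spectrum

below = psd[(f >= 50) & (f < 1000)].sum()
above = psd[(f >= 1000) & (f <= 5000)].sum()
alpha_ratio_db = 10 * np.log10(above / below)   # level difference above vs below 1 kHz
print(f"alpha ratio = {alpha_ratio_db:.1f} dB")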

  5. [Swallowing and Voice Disorders in Cancer Patients].

    PubMed

    Tanuma, Akira

    2015-07-01

    Dysphagia sometimes occurs in patients with head and neck cancer, particularly in those undergoing surgery and radiotherapy for lingual, pharyngeal, and laryngeal cancer. It also occurs in patients with esophageal cancer and brain tumor. Patients who undergo glossectomy usually show impairment of the oral phase of swallowing, whereas those with pharyngeal, laryngeal, and esophageal cancer show impairment of the pharyngeal phase of swallowing. Videofluoroscopic examination of swallowing provides important information necessary for rehabilitation of swallowing in these patients. Appropriate swallowing exercises and compensatory strategies can be decided based on the findings of the evaluation. Palatal augmentation prostheses are sometimes used for rehabilitation in patients undergoing glossectomy. Patients who undergo total laryngectomy or total pharyngolaryngoesophagectomy should receive speech therapy to enable them to use alaryngeal speech methods, including electrolarynx, esophageal speech, or speech via tracheoesophageal puncture. Regaining swallowing function and speech can improve a patient's emotional health and quality of life. Therefore, it is important to manage swallowing and voice disorders appropriately.

  6. The effect of deep brain stimulation on the speech motor system.

    PubMed

    Mücke, Doris; Becker, Johannes; Barbe, Michael T; Meister, Ingo; Liebhart, Lena; Roettger, Timo B; Dembek, Till; Timmermann, Lars; Grice, Martine

    2014-08-01

    Chronic deep brain stimulation of the nucleus ventralis intermedius is an effective treatment for individuals with medication-resistant essential tremor. However, these individuals report that stimulation has a deleterious effect on their speech. The present study investigates one important factor underlying these effects: the coordination of oral and glottal articulation. Sixteen native German-speaking adults with essential tremor, between 26 and 86 years old, with and without chronic deep brain stimulation of the nucleus ventralis intermedius, and 12 healthy, age-matched subjects were recorded performing a fast syllable repetition task (/papapa/, /tatata/, /kakaka/). Syllable duration and voicing-to-syllable ratio were measured, as well as two parameters related directly to consonant production: voicing during constriction and frication during constriction. Voicing during constriction was greater in subjects with essential tremor than in controls, indicating a perseveration of voicing into the voiceless consonant. Stimulation led to fewer voiceless intervals (a higher voicing-to-syllable ratio), indicating a reduced degree of glottal abduction during the entire syllable cycle. Stimulation also induced incomplete oral closures (frication during constriction), indicating imprecise oral articulation. The detrimental effect of stimulation on the speech motor system can be quantified using acoustic measures at the subsyllabic level.

  7. Color and texture associations in voice-induced synesthesia

    PubMed Central

    Moos, Anja; Simmons, David; Simner, Julia; Smith, Rachel

    2013-01-01

    Voice-induced synesthesia, a form of synesthesia in which synesthetic perceptions are induced by the sounds of people's voices, appears to be relatively rare and has not been systematically studied. In this study we investigated the synesthetic color and visual texture perceptions experienced in response to different types of “voice quality” (e.g., nasal, whisper, falsetto). Experiences of three different groups—self-reported voice synesthetes, phoneticians, and controls—were compared using both qualitative and quantitative analysis in a study conducted online. Whilst, in the qualitative analysis, synesthetes used more color and texture terms to describe voices than either phoneticians or controls, only weak differences, and many similarities, between groups were found in the quantitative analysis. Notable consistent results between groups were the matching of higher speech fundamental frequencies with lighter and redder colors, the matching of “whispery” voices with smoke-like textures, and the matching of “harsh” and “creaky” voices with textures resembling dry cracked soil. These data are discussed in the light of current thinking about definitions and categorizations of synesthesia, especially in cases where individuals apparently have a range of different synesthetic inducers. PMID:24032023

  8. Status Report on Speech Research, No. 29/30, January-June 1972.

    ERIC Educational Resources Information Center

    Haskins Labs., New Haven, CT.

    This report is one of a regular series on the status and progress of studies on the nature of speech, instrumentation for its investigation, and practical applications. Manuscripts and extended reports cover the following topics: iconic storage, voice-timing perception, oral anesthesia, laryngeal function, electromyography of speech production,…

  9. Training the Speaking Voice through Singing.

    ERIC Educational Resources Information Center

    Sipley, Kenneth L.

    Speech teachers and singing teachers have much in common. Both attempt in their teaching to develop the most powerful and effective instrument possible while trying to avoid vocal problems. Both have studied the physiology of the vocal mechanism to assist them in their teaching. Both are concerned with the expressive qualities of the voice as well…

  10. ERP correlates of motivating voices: quality of motivation and time-course matters

    PubMed Central

    Zougkou, Konstantina; Weinstein, Netta

    2017-01-01

    Abstract Here, we conducted the first study to explore how motivations expressed through speech are processed in real-time. Participants listened to sentences spoken in two types of well-studied motivational tones (autonomy-supportive and controlling), or a neutral tone of voice. To examine this, listeners were presented with sentences that either signaled motivations through prosody (tone of voice) and words simultaneously (e.g. ‘You absolutely have to do it my way’ spoken in a controlling tone of voice), or lacked motivationally biasing words (e.g. ‘Why don’t we meet again tomorrow’ spoken in a motivational tone of voice). Event-related brain potentials (ERPs) in response to motivations conveyed through words and prosody showed that listeners rapidly distinguished between motivations and neutral forms of communication as shown in enhanced P2 amplitudes in response to motivational when compared with neutral speech. This early detection mechanism is argued to help determine the importance of incoming information. Once assessed, motivational language is continuously monitored and thoroughly evaluated. When compared with neutral speech, listening to controlling (but not autonomy-supportive) speech led to enhanced late potential ERP mean amplitudes, suggesting that listeners are particularly attuned to controlling messages. The importance of controlling motivation for listeners is mirrored in effects observed for motivations expressed through prosody only. Here, an early rapid appraisal, as reflected in enhanced P2 amplitudes, is only found for sentences spoken in controlling (but not autonomy-supportive) prosody. Once identified as sounding pressuring, the message seems to be preferentially processed, as shown by enhanced late potential amplitudes in response to controlling prosody. Taken together, results suggest that motivational and neutral language are differentially processed; further, the data suggest that listening to cues signaling pressure and

  11. Voice emotion recognition by cochlear-implanted children and their normally-hearing peers

    PubMed Central

    Chatterjee, Monita; Zion, Danielle; Deroche, Mickael L.; Burianek, Brooke; Limb, Charles; Goren, Alison; Kulkarni, Aditya M.; Christensen, Julie A.

    2014-01-01

    Despite their remarkable success in bringing spoken language to hearing impaired listeners, the signal transmitted through cochlear implants (CIs) remains impoverished in spectro-temporal fine structure. As a consequence, pitch-dominant information such as voice emotion, is diminished. For young children, the ability to correctly identify the mood/intent of the speaker (which may not always be visible in their facial expression) is an important aspect of social and linguistic development. Previous work in the field has shown that children with cochlear implants (cCI) have significant deficits in voice emotion recognition relative to their normally hearing peers (cNH). Here, we report on voice emotion recognition by a cohort of 36 school-aged cCI. Additionally, we provide for the first time, a comparison of their performance to that of cNH and NH adults (aNH) listening to CI simulations of the same stimuli. We also provide comparisons to the performance of adult listeners with CIs (aCI), most of whom learned language primarily through normal acoustic hearing. Results indicate that, despite strong variability, on average, cCI perform similarly to their adult counterparts; that both groups’ mean performance is similar to aNHs’ performance with 8-channel noise-vocoded speech; that cNH achieve excellent scores in voice emotion recognition with full-spectrum speech, but on average, show significantly poorer scores than aNH with 8-channel noise-vocoded speech. A strong developmental effect was observed in the cNH with noise-vocoded speech in this task. These results point to the considerable benefit obtained by cochlear-implanted children from their devices, but also underscore the need for further research and development in this important and neglected area. PMID:25448167

  12. Voice emotion recognition by cochlear-implanted children and their normally-hearing peers.

    PubMed

    Chatterjee, Monita; Zion, Danielle J; Deroche, Mickael L; Burianek, Brooke A; Limb, Charles J; Goren, Alison P; Kulkarni, Aditya M; Christensen, Julie A

    2015-04-01

    Despite their remarkable success in bringing spoken language to hearing impaired listeners, the signal transmitted through cochlear implants (CIs) remains impoverished in spectro-temporal fine structure. As a consequence, pitch-dominant information such as voice emotion is diminished. For young children, the ability to correctly identify the mood/intent of the speaker (which may not always be visible in their facial expression) is an important aspect of social and linguistic development. Previous work in the field has shown that children with cochlear implants (cCI) have significant deficits in voice emotion recognition relative to their normally hearing peers (cNH). Here, we report on voice emotion recognition by a cohort of 36 school-aged cCI. Additionally, we provide for the first time, a comparison of their performance to that of cNH and NH adults (aNH) listening to CI simulations of the same stimuli. We also provide comparisons to the performance of adult listeners with CIs (aCI), most of whom learned language primarily through normal acoustic hearing. Results indicate that, despite strong variability, on average, cCI perform similarly to their adult counterparts; that both groups' mean performance is similar to aNHs' performance with 8-channel noise-vocoded speech; that cNH achieve excellent scores in voice emotion recognition with full-spectrum speech, but on average, show significantly poorer scores than aNH with 8-channel noise-vocoded speech. A strong developmental effect was observed in the cNH with noise-vocoded speech in this task. These results point to the considerable benefit obtained by cochlear-implanted children from their devices, but also underscore the need for further research and development in this important and neglected area. Copyright © 2014 Elsevier B.V. All rights reserved.

  13. ERP correlates of motivating voices: quality of motivation and time-course matters.

    PubMed

    Zougkou, Konstantina; Weinstein, Netta; Paulmann, Silke

    2017-10-01

    Here, we conducted the first study to explore how motivations expressed through speech are processed in real-time. Participants listened to sentences spoken in two types of well-studied motivational tones (autonomy-supportive and controlling), or a neutral tone of voice. To examine this, listeners were presented with sentences that either signaled motivations through prosody (tone of voice) and words simultaneously (e.g. 'You absolutely have to do it my way' spoken in a controlling tone of voice), or lacked motivationally biasing words (e.g. 'Why don't we meet again tomorrow' spoken in a motivational tone of voice). Event-related brain potentials (ERPs) in response to motivations conveyed through words and prosody showed that listeners rapidly distinguished between motivations and neutral forms of communication as shown in enhanced P2 amplitudes in response to motivational when compared with neutral speech. This early detection mechanism is argued to help determine the importance of incoming information. Once assessed, motivational language is continuously monitored and thoroughly evaluated. When compared with neutral speech, listening to controlling (but not autonomy-supportive) speech led to enhanced late potential ERP mean amplitudes, suggesting that listeners are particularly attuned to controlling messages. The importance of controlling motivation for listeners is mirrored in effects observed for motivations expressed through prosody only. Here, an early rapid appraisal, as reflected in enhanced P2 amplitudes, is only found for sentences spoken in controlling (but not autonomy-supportive) prosody. Once identified as sounding pressuring, the message seems to be preferentially processed, as shown by enhanced late potential amplitudes in response to controlling prosody. Taken together, results suggest that motivational and neutral language are differentially processed; further, the data suggest that listening to cues signaling pressure and control cannot be

  14. Decoding Articulatory Features from fMRI Responses in Dorsal Speech Regions.

    PubMed

    Correia, Joao M; Jansma, Bernadette M B; Bonte, Milene

    2015-11-11

    The brain's circuitry for perceiving and producing speech may show a notable level of overlap that is crucial for normal development and behavior. The extent to which sensorimotor integration plays a role in speech perception remains highly controversial, however. Methodological constraints related to experimental designs and analysis methods have so far prevented the disentanglement of neural responses to acoustic versus articulatory speech features. Using a passive listening paradigm and multivariate decoding of single-trial fMRI responses to spoken syllables, we investigated brain-based generalization of articulatory features (place and manner of articulation, and voicing) beyond their acoustic (surface) form in adult human listeners. For example, we trained a classifier to discriminate place of articulation within stop syllables (e.g., /pa/ vs /ta/) and tested whether this training generalizes to fricatives (e.g., /fa/ vs /sa/). This novel approach revealed generalization of place and manner of articulation at multiple cortical levels within the dorsal auditory pathway, including auditory, sensorimotor, motor, and somatosensory regions, suggesting the representation of sensorimotor information. Additionally, generalization of voicing included the right anterior superior temporal sulcus associated with the perception of human voices as well as somatosensory regions bilaterally. Our findings highlight the close connection between brain systems for speech perception and production, and in particular, indicate the availability of articulatory codes during passive speech perception. Sensorimotor integration is central to verbal communication and provides a link between auditory signals of speech perception and motor programs of speech production. It remains highly controversial, however, to what extent the brain's speech perception system actively uses articulatory (motor), in addition to acoustic/phonetic, representations. In this study, we examine the role of
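
    The core analytic move in this record is generalization decoding: train a classifier on one surface form and test on another, so that only abstract (here, articulatory) information can support above-chance transfer. A minimal sketch with simulated response patterns; the dimensions, noise levels, and linear SVM are assumptions, not the study's pipeline.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n_voxels = 100
# place-of-articulation patterns (labial vs alveolar) shared across manners
place = rng.normal(0, 1, (2, n_voxels))

def trials(acoustic_shift, n=60):
    """Noisy single-trial patterns; `acoustic_shift` mimics a different manner."""
    X = np.vstack([place[k] + acoustic_shift + rng.normal(0, 2.0, (n, n_voxels))
                   for k in (0, 1)])
    return X, np.repeat([0, 1], n)

X_stop, y_stop = trials(acoustic_shift=0.0)   # e.g., /pa/ vs /ta/
X_fric, y_fric = trials(acoustic_shift=0.3)   # e.g., /fa/ vs /sa/

clf = LinearSVC(dual=False).fit(X_stop, y_stop)         # train within stops
print("cross-manner accuracy:", clf.score(X_fric, y_fric))  # test on fricatives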

  15. Dimension-based statistical learning affects both speech perception and production

    PubMed Central

    Lehet, Matthew; Holt, Lori L.

    2016-01-01

    Multiple acoustic dimensions signal speech categories. However, dimensions vary in their informativeness; some are more diagnostic of category membership than others. Speech categorization reflects these dimensional regularities such that diagnostic dimensions carry more “perceptual weight” and more effectively signal category membership to native listeners. Yet, perceptual weights are malleable. When short-term experience deviates from long-term language norms, such as in a foreign accent, the perceptual weight of acoustic dimensions in signaling speech category membership rapidly adjusts. The present study investigated whether rapid adjustments in listeners’ perceptual weights in response to speech that deviates from the norms also affects listeners’ own speech productions. In a word recognition task, the correlation between two acoustic dimensions signaling consonant categories, fundamental frequency (F0) and voice onset time (VOT), matched the correlation typical of English, then shifted to an “artificial accent” that reversed the relationship, and then shifted back. Brief, incidental exposure to the artificial accent caused participants to down-weight perceptual reliance on F0, consistent with previous research. Throughout the task, participants were intermittently prompted with pictures to produce these same words. In the block in which listeners heard the artificial accent with a reversed F0 x VOT correlation, F0 was a less robust cue to voicing in listeners’ own speech productions. The statistical regularities of short-term speech input affect both speech perception and production, as evidenced via shifts in how acoustic dimensions are weighted. PMID:27666146
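
    Perceptual weights of this kind are often estimated by regressing listeners' categorization responses on the standardized acoustic dimensions; the coefficient magnitudes then serve as the weights. A sketch with simulated responses; the response model and all numbers are invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
vot = rng.uniform(0, 60, 400)                    # ms
f0 = 180 + 0.8 * vot + rng.normal(0, 10, 400)    # English-like F0 x VOT correlation

# simulated listener: both dimensions contribute to "voiceless" responses
p_voiceless = 1 / (1 + np.exp(-(0.15 * (vot - 30) + 0.02 * (f0 - 200))))
resp = rng.random(400) < p_voiceless

Xz = np.column_stack([(vot - vot.mean()) / vot.std(),
                      (f0 - f0.mean()) / f0.std()])
w = LogisticRegression().fit(Xz, resp).coef_[0]
print(f"perceptual weight VOT = {w[0]:.2f}, F0 = {w[1]:.2f}")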

  16. Influence of Left-Right Asymmetries on Voice Quality in Simulated Paramedian Vocal Fold Paralysis

    ERIC Educational Resources Information Center

    Samlan, Robin A.; Story, Brad H.

    2017-01-01

    Purpose: The purpose of this study was to determine the vocal fold structural and vibratory symmetries that are important to vocal function and voice quality in a simulated paramedian vocal fold paralysis. Method: A computational kinematic speech production model was used to simulate an exemplar "voice" on the basis of asymmetric…

  17. Speech technology and cinema: can they learn from each other?

    PubMed

    Pauletto, Sandra

    2013-10-01

    The voice is the most important sound of a film soundtrack. It represents a character and it carries language. There are different types of cinematic voices: dialogue, internal monologues, and voice-overs. Conventionally, two main characteristics differentiate these voices: lip synchronization and the voice's attributes that make it appropriate for the character (for example, a voice that sounds very close to the audience can be appropriate for a narrator, but not for an onscreen character). What happens, then, if a film character can only speak through an asynchronous machine that produces a 'robot-like' voice? This article discusses the sound-related work and experimentation done by the author for the short film Voice by Choice. It also attempts to discover whether speech technology design can learn from its cinematic representation, and if such uncommon film protagonists can contribute creatively to transform the conventions of cinematic voices.

  18. Is There an Ironic Tone of Voice?

    ERIC Educational Resources Information Center

    Bryant, Gregory A.; Fox Tree, Jean E.

    2005-01-01

    Research on nonverbal vocal cues and verbal irony has often relied on the concept of an "ironic tone of voice". Here we provide acoustic analysis and experimental evidence that this notion is oversimplified and misguided. Acoustic analyses of spontaneous ironic speech extracted from talk radio shows, both ambiguous and unambiguous in…

  19. Taste quality decoding parallels taste sensations.

    PubMed

    Crouzet, Sébastien M; Busch, Niko A; Ohla, Kathrin

    2015-03-30

    In most species, the sense of taste is key in the distinction of potentially nutritious and harmful food constituents and thereby in the acceptance (or rejection) of food. Taste quality is encoded by specialized receptors on the tongue, which detect chemicals corresponding to each of the basic tastes (sweet, salty, sour, bitter, and savory [1]), before taste quality information is transmitted via segregated neuronal fibers [2], distributed coding across neuronal fibers [3], or dynamic firing patterns [4] to the gustatory cortex in the insula. In rodents, both hardwired coding by labeled lines [2] and flexible, learning-dependent representations [5] and broadly tuned neurons [6] seem to coexist. It is currently unknown how, when, and where taste quality representations are established in the cortex and whether these representations are used for perceptual decisions. Here, we show that neuronal response patterns allow decoding of which of four tastants (salty, sweet, sour, and bitter) participants tasted in a given trial, using time-resolved multivariate pattern analyses of large-scale electrophysiological brain responses. The onset of this prediction coincided with the earliest taste-evoked responses originating from the insula and opercular cortices, indicating that quality is among the first attributes of a taste represented in the central gustatory system. These response patterns correlated with perceptual decisions of taste quality: tastes that participants discriminated less accurately also evoked less discriminable brain response patterns. The results therefore provide the first evidence for a link between taste-related decision-making and the predictive value of these brain response patterns. Copyright © 2015 Elsevier Ltd. All rights reserved.
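
    Time-resolved decoding of this kind fits a fresh classifier at every time point and asks when accuracy first exceeds chance; that onset is then compared with the latency of the evoked response. A hedged sketch on simulated trials: the sensor counts, injected effect, and threshold are assumptions, not the study's parameters.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials, n_sensors, n_times = 80, 32, 50
y = rng.integers(0, 4, n_trials)                  # four taste qualities
X = rng.normal(0, 1, (n_trials, n_sensors, n_times))
X[:, :5, 20:] += y[:, None, None] * 0.5           # class signal emerges at t = 20

# decode separately at each time point with cross-validation (chance = 0.25)
acc = [cross_val_score(LogisticRegression(max_iter=1000),
                       X[:, :, t], y, cv=5).mean() for t in range(n_times)]
onset = next((t for t, a in enumerate(acc) if a > 0.4), None)
print("decoding onset at time index", onset)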

  20. Current trends in small vocabulary speech recognition for equipment control

    NASA Astrophysics Data System (ADS)

    Doukas, Nikolaos; Bardis, Nikolaos G.

    2017-09-01

    Speech recognition systems allow human-machine communication to acquire an intuitive nature that approaches the simplicity of inter-human communication. Small vocabulary speech recognition is a subset of the overall speech recognition problem, where only a small number of words need to be recognized. Speaker-independent small vocabulary recognition can find significant applications in field equipment used by military personnel. Such equipment may typically be controlled by a small number of commands that need to be given quickly and accurately, under conditions where delicate manual operations are difficult to achieve. This type of application could hence benefit significantly from robust voice-operated control components, as they would facilitate interaction with their users and render it much more reliable in times of crisis. This paper presents current challenges involved in attaining efficient and robust small vocabulary speech recognition. These challenges concern feature selection, classification techniques, speaker diversity, and noise effects. A state machine approach is presented that facilitates the voice guidance of different equipment in a variety of situations.
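
    A state machine of this kind restricts, at every step, which commands the recognizer will accept, which both shrinks the effective vocabulary and rejects out-of-context words. A toy sketch; the states and commands are invented, not taken from the paper.

```python
# each state maps the commands valid in that state to the next state
GRAMMAR = {
    "idle":  {"start": "armed"},
    "armed": {"fire": "idle", "abort": "idle", "status": "armed"},
}

def step(state, recognized_word):
    """Advance the controller; words not valid in the current state are rejected."""
    if recognized_word in GRAMMAR[state]:
        return GRAMMAR[state][recognized_word], True
    return state, False

state = "idle"
for word in ["start", "status", "typo", "abort"]:
    state, ok = step(state, word)
    print(word, "->", state, "accepted" if ok else "rejected")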

  1. Voice Interactive Analysis System Study. Final Report, August 28, 1978 through March 23, 1979.

    ERIC Educational Resources Information Center

    Harry, D. P.; And Others

    The Voice Interactive Analysis System study continued research and development of the LISTEN real-time, minicomputer based connected speech recognition system, within NAVTRAEQUIPCEN'S program of developing automatic speech technology in support of training. An attempt was made to identify the most effective features detected by the TTI-500 model…

  2. The role of spectral and temporal cues in voice gender discrimination by normal-hearing listeners and cochlear implant users.

    PubMed

    Fu, Qian-Jie; Chinchilla, Sherol; Galvin, John J

    2004-09-01

    The present study investigated the relative importance of temporal and spectral cues in voice gender discrimination and vowel recognition by normal-hearing subjects listening to an acoustic simulation of cochlear implant speech processing and by cochlear implant users. In the simulation, the number of speech processing channels ranged from 4 to 32, thereby varying the spectral resolution; the cutoff frequencies of the channels' envelope filters ranged from 20 to 320 Hz, thereby manipulating the available temporal cues. For normal-hearing subjects, results showed that both voice gender discrimination and vowel recognition scores improved as the number of spectral channels was increased. When only 4 spectral channels were available, voice gender discrimination significantly improved as the envelope filter cutoff frequency was increased from 20 to 320 Hz. For all spectral conditions, increasing the amount of temporal information had no significant effect on vowel recognition. Both voice gender discrimination and vowel recognition scores were highly variable among implant users. The performance of cochlear implant listeners was similar to that of normal-hearing subjects listening to comparable speech processing (4-8 spectral channels). The results suggest that both spectral and temporal cues contribute to voice gender discrimination and that temporal cues are especially important for cochlear implant users to identify the voice gender when there is reduced spectral resolution.

  3. Kernel-Based Sensor Fusion With Application to Audio-Visual Voice Activity Detection

    NASA Astrophysics Data System (ADS)

    Dov, David; Talmon, Ronen; Cohen, Israel

    2016-12-01

    In this paper, we address the problem of multiple-view data fusion in the presence of noise and interferences. Recent studies have approached this problem using kernel methods, relying particularly on a product of kernels constructed separately for each view. From a graph theory point of view, we analyze this fusion approach in a discrete setting. More specifically, based on a statistical model for the connectivity between data points, we propose an algorithm for the selection of the kernel bandwidth, a parameter that, as we show, has important implications for the robustness of this fusion approach to interferences. Then, we consider the fusion of audio-visual speech signals measured by a single microphone and by a video camera pointed at the face of the speaker. Specifically, we address the task of voice activity detection, i.e., the detection of speech and non-speech segments, in the presence of structured interferences such as keyboard taps and office noise. We propose an algorithm for voice activity detection based on the audio-visual signal. Simulation results show that the proposed algorithm outperforms competing fusion and voice activity detection approaches. In addition, we demonstrate that a proper selection of the kernel bandwidth indeed leads to improved performance.
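
    A sketch of the product-of-kernels construction the paper builds on: one Gaussian affinity kernel per view, multiplied element-wise so that two points are strongly connected only if they are close in both views. The bandwidth eps is the parameter whose selection the paper analyzes; features and values here are placeholders:

        import numpy as np
        from scipy.spatial.distance import cdist

        def gaussian_kernel(X, eps):
            # Pairwise affinities exp(-||x_i - x_j||^2 / eps).
            D = cdist(X, X, "sqeuclidean")
            return np.exp(-D / eps)

        rng = np.random.default_rng(1)
        audio = rng.normal(size=(100, 13))  # e.g., MFCC frames (placeholder)
        video = rng.normal(size=(100, 10))  # e.g., mouth-region features (placeholder)

        K = gaussian_kernel(audio, eps=1.0) * gaussian_kernel(video, eps=1.0)
        # K can then be row-normalized and its leading eigenvectors used as a
        # fused low-dimensional representation for voice activity detection.
        print(K.shape)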

  4. Medications and Adverse Voice Effects.

    PubMed

    Nemr, Kátia; Di Carlos Silva, Ariana; Rodrigues, Danilo de Albuquerque; Zenari, Marcia Simões

    2017-08-16

    To identify the medications used by patients with dysphonia, describe the voice symptoms reported on initial speech-language pathology (SLP) examination, evaluate the possible direct and indirect effects of medications on voice production, and determine the association between direct and indirect adverse voice effects and self-reported voice symptoms, hydration and smoking habits, comorbidities, vocal assessment, and type and degree of dysphonia. This is a retrospective cross-sectional study. Fifty-five patients were evaluated, and the vocal signs and symptoms indicated in the Dysphonia Risk Protocol were considered, as well as data on hydration, smoking and medication use. We analyzed the associations between type of side effect and self-reported vocal signs/symptoms, hydration, smoking, comorbidities, type of dysphonia, and auditory-perceptual and acoustic parameters. Sixty percent were women, the mean age was 51.8 years, 29 symptoms were reported on screening, and 73 active ingredients were identified, 8.2% directly and 91.8% indirectly affecting vocal function. There were associations between the use of drugs with direct adverse voice effects, self-reported symptoms, general degree of vocal deviation, and pitch deviation. The symptoms of dry throat and shortness of breath were associated with direct adverse voice effects of the medications, as were the general degree of vocal deviation and greater pitch deviation. Shortness of breath when speaking was also associated with the greatest degree of vocal deviation. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.

  5. Tasting

    MedlinePlus Videos and Cool Tools

    ... about 10,000 taste buds. The taste buds are linked to the brain by nerve fibers. Food particles are detected by the taste buds, which send nerve ... to the brain. Certain areas of the tongue are more sensitive to certain tastes, like bitter, sour, ...

  6. Politeness, emotion, and gender: A sociophonetic study of voice pitch modulation

    NASA Astrophysics Data System (ADS)

    Yuasa, Ikuko

    The present dissertation is a cross-gender and cross-cultural sociophonetic exploration of voice pitch characteristics utilizing speech data derived from Japanese and American speakers in natural conversations. The roles of voice pitch modulation in terms of the concepts of politeness and emotion as they pertain to culture and gender are investigated herein. The research interprets the significance of the findings based on acoustic measurements of speech data presented in the ERB-rate scale (the most appropriate scale for human speech perception). The investigation reveals that pitch range modulation displayed by Japanese informants in two types of conversations is closely linked to the types of politeness adopted by those informants. The degree of the informants' emotional involvement and expression reflected in differing pitch range widths plays an important role in determining the relationship between pitch range modulation and politeness. The study further correlates the Japanese cultural concept of enryo ("self-restraint") with this phenomenon. When median values were examined, male and female pitch ranges across cultures did not conspicuously differ. However, sporadically occurring women's pitch characteristics, which differ culturally in the width and height of pitch ranges, may create an 'emotional' perception of women's speech style. The salience of these pitch characteristics appears to be the source of the stereotype that women's speech sounds 'swoopy' or 'shrill' and is thus 'emotional'. Such salient voice characteristics of women are interpreted in light of camaraderie/positive politeness. Women's use of conspicuous paralinguistic features helps to create an atmosphere of camaraderie. These voice pitch characteristics promote the establishment of a sense of camaraderie since they act to emphasize such feelings as concern, support, and comfort towards addressees. Moreover, men's wide pitch ranges are discussed in view

  7. Children's Recognition of Their Own Recorded Voice: Influence of Age and Phonological Impairment

    ERIC Educational Resources Information Center

    Strombergsson, Sofia

    2013-01-01

    Children with phonological impairment (PI) often have difficulties perceiving insufficiencies in their own speech. The use of recordings has been suggested as a way of directing the child's attention toward his/her own speech, despite a lack of evidence that children actually recognize their recorded voice as their own. We present two studies of…

  8. Co-Variation of Tonality in the Music and Speech of Different Cultures

    PubMed Central

    Han, Shui' er; Sundararajan, Janani; Bowling, Daniel Liu; Lake, Jessica; Purves, Dale

    2011-01-01

    Whereas the use of discrete pitch intervals is characteristic of most musical traditions, the size of the intervals and the way in which they are used is culturally specific. Here we examine the hypothesis that these differences arise because of a link between the tonal characteristics of a culture's music and its speech. We tested this idea by comparing pitch intervals in the traditional music of three tone language cultures (Chinese, Thai and Vietnamese) and three non-tone language cultures (American, French and German) with pitch intervals between voiced speech segments. Changes in pitch direction occur more frequently and pitch intervals are larger in the music of tone compared to non-tone language cultures. More frequent changes in pitch direction and larger pitch intervals are also apparent in the speech of tone compared to non-tone language cultures. These observations suggest that the different tonal preferences apparent in music across cultures are closely related to the differences in the tonal characteristics of voiced speech. PMID:21637716

  9. Secure Recognition of Voice-Less Commands Using Videos

    NASA Astrophysics Data System (ADS)

    Yau, Wai Chee; Kumar, Dinesh Kant; Weghorn, Hans

    Interest in voice recognition technologies for internet applications is growing due to the flexibility of speech-based communication. The major drawback with the use of sound for internet access with computers is that the commands will be audible to other people in the vicinity. This paper examines a secure and voice-less method for recognition of speech-based commands using video without evaluating sound signals. The proposed approach represents mouth movements in the video data using 2D spatio-temporal templates (STT). Zernike moments (ZM) are computed from STT and fed into support vector machines (SVM) to be classified into one of the utterances. The experimental results demonstrate that the proposed technique produces a high accuracy of 98% in a phoneme classification task. The proposed technique is demonstrated to be invariant to global variations of illumination level. Such a system is useful for securely interpreting user commands for internet applications on mobile devices.
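
    A sketch of the classification stage only, assuming the Zernike-moment feature vectors have already been extracted from the spatio-temporal templates (here random placeholders); an extraction step such as mahotas.features.zernike_moments is omitted:

        import numpy as np
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.svm import SVC
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(2)
        X = rng.normal(size=(140, 25))     # 25 Zernike moments per video template
        y = rng.integers(0, 14, size=140)  # utterance labels (placeholder count)

        # Standardize features, then classify with an RBF-kernel SVM.
        clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0))
        print(cross_val_score(clf, X, y, cv=5).mean())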

  10. Detecting Parkinson's disease from sustained phonation and speech signals.

    PubMed

    Vaiciukynas, Evaldas; Verikas, Antanas; Gelzinis, Adas; Bacauskiene, Marija

    2017-01-01

    This study investigates signals from sustained phonation and text-dependent speech modalities for Parkinson's disease screening. Phonation corresponds to the vowel /a/ voicing task and speech to the pronunciation of a short sentence in the Lithuanian language. Signals were recorded through two channels simultaneously, namely, acoustic cardioid (AC) and smartphone (SP) microphones. Additional modalities were obtained by splitting the speech recording into voiced and unvoiced parts. Information in each modality is summarized by 18 well-known audio feature sets. Random forest (RF) is used as the machine learning algorithm, both for individual feature sets and for decision-level fusion. Detection performance is measured by the out-of-bag equal error rate (EER) and the cost of the log-likelihood ratio. The Essentia audio feature set was the best for the AC speech modality and the YAAFE audio feature set was the best for the SP unvoiced modality, achieving EERs of 20.30% and 25.57%, respectively. Fusion of all feature sets and modalities resulted in an EER of 19.27% for the AC and 23.00% for the SP channel. Non-linear projection of an RF-based proximity matrix into the 2D space enriched medical decision support by visualization.
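
    A sketch of out-of-bag evaluation with an equal error rate, roughly as used in the study: a random forest scores each recording on the trees that did not see it, and the EER is read off the ROC curve. Data and feature dimensions are synthetic placeholders:

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.metrics import roc_curve

        rng = np.random.default_rng(3)
        X = rng.normal(size=(300, 18))   # one row per recording, 18 features
        y = rng.integers(0, 2, size=300) # 1 = Parkinson's, 0 = control (placeholder)

        rf = RandomForestClassifier(n_estimators=500, oob_score=True, random_state=0)
        rf.fit(X, y)
        scores = rf.oob_decision_function_[:, 1]  # OOB probability of the PD class

        fpr, tpr, _ = roc_curve(y, scores)
        fnr = 1 - tpr
        eer = fpr[np.nanargmin(np.abs(fpr - fnr))]  # operating point where FPR == FNR
        print(f"out-of-bag EER ~ {eer:.2%}")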

  11. Selective attention to human voice enhances brain activity bilaterally in the superior temporal sulcus.

    PubMed

    Alho, Kimmo; Vorobyev, Victor A; Medvedev, Svyatoslav V; Pakhomov, Sergey V; Starchenko, Maria G; Tervaniemi, Mari; Näätänen, Risto

    2006-02-23

    Regional cerebral blood flow was measured with positron emission tomography (PET) in 10 healthy male volunteers. They heard two binaurally delivered concurrent stories, one spoken by a male voice and the other by a female voice. A third story was presented at the same time as a text running on a screen. The subjects were instructed to attend silently to one of the stories at a time. In an additional resting condition, no stories were delivered. PET data showed that in comparison with the reading condition, the brain activity in the speech-listening conditions was enhanced bilaterally in the anterior superior temporal sulcus including cortical areas that have been reported to be specifically sensitive to human voice. Previous studies on attention to non-linguistic sounds and visual objects, in turn, showed prefrontal activations that are presumably related to attentional control functions. However, comparisons of the present speech-listening and reading conditions with each other or with the resting condition indicated no prefrontal activity, except for an activation in the inferior frontal cortex that was presumably associated with semantic and syntactic processing of the attended story. Thus, speech listening, as well as reading, even in a distracting environment appears to depend less on the prefrontal control functions than do other types of attention-demanding tasks, probably because selective attention to speech and written text are over-learned actions rehearsed daily.

  12. When Infants Talk, Infants Listen: Pre-Babbling Infants Prefer Listening to Speech with Infant Vocal Properties

    ERIC Educational Resources Information Center

    Masapollo, Matthew; Polka, Linda; Ménard, Lucie

    2016-01-01

    To learn to produce speech, infants must effectively monitor and assess their own speech output. Yet very little is known about how infants perceive speech produced by an infant, which has higher voice pitch and formant frequencies compared to adult or child speech. Here, we tested whether pre-babbling infants (at 4-6 months) prefer listening to…

  13. Acetylcholine is released from taste cells, enhancing taste signalling

    PubMed Central

    Dando, Robin; Roper, Stephen D

    2012-01-01

    Acetylcholine (ACh), a candidate neurotransmitter that has been implicated in taste buds, elicits calcium mobilization in Receptor (Type II) taste cells. Using RT-PCR analysis and pharmacological interventions, we demonstrate that the muscarinic acetylcholine receptor M3 mediates these actions. Applying ACh enhanced both taste-evoked Ca2+ responses and taste-evoked afferent neurotransmitter (ATP) secretion from taste Receptor cells. Blocking muscarinic receptors depressed taste-evoked responses in Receptor cells, suggesting that ACh is normally released from taste cells during taste stimulation. ACh biosensors confirmed that, indeed, taste Receptor cells secrete acetylcholine during gustatory stimulation. Genetic deletion of muscarinic receptors resulted in significantly diminished ATP secretion from taste buds. The data demonstrate a new role for acetylcholine as a taste bud transmitter. Our results imply specifically that ACh is an autocrine transmitter secreted by taste Receptor cells during gustatory stimulation, enhancing taste-evoked responses and afferent transmitter secretion. PMID:22570381

  14. Deep neural network and noise classification-based speech enhancement

    NASA Astrophysics Data System (ADS)

    Shi, Wenhua; Zhang, Xiongwei; Zou, Xia; Han, Wei

    2017-07-01

    In this paper, a speech enhancement method using noise classification and a deep neural network (DNN) is proposed. A Gaussian mixture model (GMM) is employed to determine the noise type in speech-absent frames. A DNN is used to model the relationship between the noisy observation and clean speech. Once the noise type is determined, the corresponding DNN model is applied to enhance the noisy speech. The GMM is trained on mel-frequency cepstral coefficients (MFCCs), with parameters estimated via an iterative expectation-maximization (EM) algorithm. The noise type is updated by spectrum entropy-based voice activity detection (VAD). Experimental results demonstrate that the proposed method achieves better objective speech quality and smaller distortion under both stationary and non-stationary conditions.
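
    A sketch of the noise-classification front end under stated assumptions: one GMM per noise type is fit on MFCC frames, and the winning type selects which enhancement model to apply. The noise types, features, and model registry are placeholders:

        import numpy as np
        from sklearn.mixture import GaussianMixture

        rng = np.random.default_rng(4)
        NOISE_TYPES = ["babble", "factory", "car"]

        # Train one GMM per noise type on MFCC frames (placeholder random features).
        gmms = {}
        for i, noise in enumerate(NOISE_TYPES):
            mfcc_train = rng.normal(loc=i, size=(500, 13))
            gmms[noise] = GaussianMixture(n_components=8, random_state=0).fit(mfcc_train)

        def classify_noise(mfcc_frames):
            # Pick the noise type whose GMM gives the highest mean log-likelihood.
            return max(NOISE_TYPES, key=lambda n: gmms[n].score(mfcc_frames))

        noisy_frames = rng.normal(loc=1, size=(50, 13))  # frames flagged speech-absent by the VAD
        noise_type = classify_noise(noisy_frames)
        # enhanced = dnn_models[noise_type](noisy_speech)  # select the matching DNN
        print(noise_type)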

  15. Contributions of speech science to the technology of man-machine voice interactions

    NASA Technical Reports Server (NTRS)

    Lea, Wayne A.

    1977-01-01

    Research in speech understanding was reviewed. Plans which include prosodics research, phonological rules for speech understanding systems, and continued interdisciplinary phonetics research are discussed. Improved acoustic phonetic analysis capabilities in speech recognizers are suggested.

  16. Acoustic analysis of speech under stress.

    PubMed

    Sondhi, Savita; Khan, Munna; Vijay, Ritu; Salhan, Ashok K; Chouhan, Satish

    2015-01-01

    When a person is emotionally charged, stress can be discerned in his or her voice. This paper presents a simplified, non-invasive approach to detect psycho-physiological stress by monitoring the acoustic modifications during a stressful conversation. The voice database consists of audio clips from eight different popular FM broadcasts wherein the host of the show vexes the subjects, who are otherwise unaware of the charade. The audio clips are obtained from real-life stressful conversations (no simulated emotions). Analysis is done using PRAAT software to evaluate mean fundamental frequency (F0) and formant frequencies (F1, F2, F3, F4) in both the neutral and stressed states. Results suggest that F0 increases with stress, whereas formant frequencies decrease with stress. Comparison of Fourier and chirp spectra of a short vowel segment shows that for relaxed speech the two spectra are similar, but for stressed speech they differ in the high frequency range due to increased pitch modulation.
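
    A sketch of the F0 and formant measurements via the praat-parselmouth package, which scripts the same Praat analyses; using this package is an assumption (the authors used PRAAT directly), and "speech.wav" is a placeholder:

        import numpy as np
        import parselmouth

        snd = parselmouth.Sound("speech.wav")

        # Mean F0 over voiced frames (Praat reports 0 Hz for unvoiced frames).
        pitch = snd.to_pitch()
        f0 = pitch.selected_array["frequency"]
        f0_mean = np.nanmean(np.where(f0 > 0, f0, np.nan))

        # Formants via Burg's method, sampled at the midpoint of the file.
        formants = snd.to_formant_burg(max_number_of_formants=5)
        t_mid = snd.duration / 2
        f1 = formants.get_value_at_time(1, t_mid)
        f2 = formants.get_value_at_time(2, t_mid)
        print(f"mean F0 = {f0_mean:.1f} Hz, F1 = {f1:.0f} Hz, F2 = {f2:.0f} Hz")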

  17. Cerebral bases of subliminal speech priming.

    PubMed

    Kouider, Sid; de Gardelle, Vincent; Dehaene, Stanislas; Dupoux, Emmanuel; Pallier, Christophe

    2010-01-01

    While the neural correlates of unconscious perception and subliminal priming have been largely studied for visual stimuli, little is known about their counterparts in the auditory modality. Here we used a subliminal speech priming method in combination with fMRI to investigate which regions of the cerebral network for language can respond in the absence of awareness. Participants performed a lexical decision task on target items preceded by subliminal primes, which were either phonetically identical or different from the target. Moreover, the prime and target could be spoken by the same speaker or by two different speakers. Word repetition reduced the activity in the insula and in the left superior temporal gyrus. Although the priming effect on reaction times was independent of voice manipulation, neural repetition suppression was modulated by speaker change in the superior temporal gyrus while the insula showed voice-independent priming. These results provide neuroimaging evidence of subliminal priming for spoken words and inform us on the first, unconscious stages of speech perception.

  18. Discrimination of taste qualities among mouse fungiform taste bud cells.

    PubMed

    Yoshida, Ryusuke; Miyauchi, Aya; Yasuo, Toshiaki; Jyotaki, Masafumi; Murata, Yoshihiro; Yasumatsu, Keiko; Shigemura, Noriatsu; Yanagawa, Yuchio; Obata, Kunihiko; Ueno, Hiroshi; Margolskee, Robert F; Ninomiya, Yuzo

    2009-09-15

    Multiple lines of evidence from molecular studies indicate that individual taste qualities are encoded by distinct taste receptor cells. In contrast, many physiological studies have found that a significant proportion of taste cells respond to multiple taste qualities. To reconcile this apparent discrepancy and to identify taste cells that underlie each taste quality, we investigated taste responses of individual mouse fungiform taste cells that express gustducin or GAD67, markers for specific types of taste cells. Type II taste cells respond to sweet, bitter or umami tastants, and express taste receptors, gustducin and other transduction components. Type III cells possess putative sour taste receptors and have well elaborated conventional synapses. Consistent with these findings, we found that gustducin-expressing Type II taste cells responded best to sweet (25/49), bitter (20/49) or umami (4/49) stimuli, while all GAD67 (Type III) taste cells examined (44/44) responded to sour stimuli and a portion of them showed multiple taste sensitivities, suggesting discrimination of each taste quality among taste bud cells. These results were largely consistent with those previously reported with circumvallate papillae taste cells. Bitter-best taste cells responded to multiple bitter compounds such as quinine, denatonium and cycloheximide. Three sour compounds, HCl, acetic acid and citric acid, elicited responses in sour-best taste cells. These results suggest that taste cells may be capable of recognizing multiple taste compounds that elicit similar taste sensations. We did not find any NaCl-best cells among the gustducin and GAD67 taste cells, raising the possibility that salt-sensitive taste cells comprise a different population.

  19. Military applications of automatic speech recognition and future requirements

    NASA Technical Reports Server (NTRS)

    Beek, Bruno; Cupples, Edward J.

    1977-01-01

    An updated summary of the state-of-the-art of automatic speech recognition and its relevance to military applications is provided. A number of potential systems for military applications are under development. These include: (1) digital narrowband communication systems; (2) automatic speech verification; (3) on-line cartographic processing unit; (4) word recognition for militarized tactical data system; and (5) voice recognition and synthesis for aircraft cockpit.

  20. The Auditory-Brainstem Response to Continuous, Non-repetitive Speech Is Modulated by the Speech Envelope and Reflects Speech Processing

    PubMed Central

    Reichenbach, Chagit S.; Braiman, Chananel; Schiff, Nicholas D.; Hudspeth, A. J.; Reichenbach, Tobias

    2016-01-01

    The auditory-brainstem response (ABR) to short and simple acoustical signals is an important clinical tool used to diagnose the integrity of the brainstem. The ABR is also employed to investigate the auditory brainstem in a multitude of tasks related to hearing, such as processing speech or selectively focusing on one speaker in a noisy environment. Such research measures the response of the brainstem to short speech signals such as vowels or words. Because the voltage signal of the ABR has a tiny amplitude, several hundred to a thousand repetitions of the acoustic signal are needed to obtain a reliable response. The large number of repetitions poses a challenge to assessing cognitive functions due to neural adaptation. Here we show that continuous, non-repetitive speech, lasting several minutes, may be employed to measure the ABR. Because the speech is not repeated during the experiment, the precise temporal form of the ABR cannot be determined. We show, however, that important structural features of the ABR can nevertheless be inferred. In particular, the brainstem responds at the fundamental frequency of the speech signal, and this response is modulated by the envelope of the voiced parts of speech. We accordingly introduce a novel measure that assesses the ABR as modulated by the speech envelope, at the fundamental frequency of speech and at the characteristic latency of the response. This measure has a high signal-to-noise ratio and can hence be employed effectively to measure the ABR to continuous speech. We use this novel measure to show that the ABR is weaker to intelligible speech than to unintelligible, time-reversed speech. The methods presented here can be employed for further research on speech processing in the auditory brainstem and can lead to the development of future clinical diagnosis of brainstem function. PMID:27303286
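
    A rough sketch of the measure's ingredients under stated assumptions: the speech envelope is taken as the magnitude of the analytic signal, the EEG is band-passed around the speech fundamental, and an envelope-weighted correlation is evaluated at a candidate brainstem latency. Sampling rate, band edges, latency, and signals are placeholders, not the authors' exact parameters:

        import numpy as np
        from scipy.signal import hilbert, butter, filtfilt

        fs = 1000                          # Hz (placeholder sampling rate)
        rng = np.random.default_rng(5)
        speech = rng.normal(size=10 * fs)  # placeholder voiced-speech signal
        eeg = rng.normal(size=10 * fs)     # placeholder recording

        # Speech envelope from the analytic signal.
        envelope = np.abs(hilbert(speech))

        # Band-pass the EEG around a typical F0 range.
        b, a = butter(4, [80, 300], btype="bandpass", fs=fs)
        eeg_f0 = filtfilt(b, a, eeg)

        # Envelope-modulated correlation at a candidate brainstem latency.
        lag = int(0.009 * fs)              # ~9 ms (illustrative)
        x = (envelope * speech)[: len(speech) - lag]
        r = np.corrcoef(x, eeg_f0[lag:])[0, 1]
        print(r)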

  1. Validation of the Acoustic Voice Quality Index Version 03.01 and the Acoustic Breathiness Index in the Spanish language.

    PubMed

    Delgado Hernández, Jonathan; León Gómez, Nieves M; Jiménez, Alejandra; Izquierdo, Laura M; Barsties V Latoszek, Ben

    2018-05-01

    The aim of this study was to validate the Acoustic Voice Quality Index 03.01 (AVQIv3) and the Acoustic Breathiness Index (ABI) in the Spanish language. Concatenated voice samples of continuous speech (cs) and sustained vowel (sv) from 136 subjects with dysphonia and 47 vocally healthy subjects were perceptually judged for overall voice quality and breathiness severity. First, to reach a higher level of ecological validity, the proportions of cs and sv were equalized with regard to the time length of the 3-second sv part and the voiced cs part, respectively. Second, concurrent validity and diagnostic accuracy were verified. Ratings of overall voice quality and breathiness severity from five experts showed moderate reliability. Standardizing the cs part at 33 syllables, which represents 3 seconds of voiced cs, was found to equalize the two speech tasks. A strong correlation was revealed between AVQIv3 and overall voice quality, and between ABI and perceived breathiness severity. Additionally, the best diagnostic outcome was identified at a threshold of 2.28 for AVQIv3 and 3.40 for ABI. In the Spanish language, the AVQIv3 and ABI thus provide valid and robust quantification of abnormal voice quality with regard to overall voice quality and breathiness severity.

  2. Speech perception in individuals with auditory dys-synchrony.

    PubMed

    Kumar, U A; Jayaram, M

    2011-03-01

    This study aimed to evaluate the effect of lengthening the transition duration of selected speech segments upon the perception of those segments in individuals with auditory dys-synchrony. Thirty individuals with auditory dys-synchrony participated in the study, along with 30 age-matched normal hearing listeners. Eight consonant-vowel syllables were used as auditory stimuli. Two experiments were conducted. Experiment one measured the 'just noticeable difference' time: the smallest prolongation of the speech sound transition duration which was noticeable by the subject. In experiment two, speech sounds were modified by lengthening the transition duration by multiples of the just noticeable difference time, and subjects' speech identification scores for the modified speech sounds were assessed. Subjects with auditory dys-synchrony demonstrated poor processing of temporal auditory information. Lengthening of speech sound transition duration improved these subjects' perception of both the placement and voicing features of the speech syllables used. These results suggest that innovative speech processing strategies which enhance temporal cues may benefit individuals with auditory dys-synchrony.

  3. Henry's voices: the representation of auditory verbal hallucinations in an autobiographical narrative.

    PubMed

    Demjén, Zsófia; Semino, Elena

    2015-06-01

    The book Henry's Demons (2011) recounts the events surrounding Henry Cockburn's diagnosis of schizophrenia from the alternating perspectives of Henry himself and his father Patrick. In this paper, we present a detailed linguistic analysis of Henry's first-person accounts of experiences that could be described as auditory verbal hallucinations. We first provide a typology of Henry's voices, taking into account who or what is presented as speaking, what kinds of utterances they produce and any salient stylistic features of these utterances. We then discuss the linguistically distinctive ways in which Henry represents these voices in his narrative. We focus on the use of Direct Speech as opposed to other forms of speech presentation, the use of the sensory verbs hear and feel and the use of 'non-factive' expressions such as I thought and as if. We show how different linguistic representations may suggest phenomenological differences between the experience of hallucinatory voices and the perception of voices that other people can also hear. We, therefore, propose that linguistic analysis is ideally placed to provide in-depth accounts of the phenomenology of voice hearing and point out the implications of this approach for clinical practice and mental healthcare. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.

  4. Perception and the temporal properties of speech

    NASA Astrophysics Data System (ADS)

    Gordon, Peter C.

    1991-11-01

    Four experiments addressing the role of attention in phonetic perception are reported. The first experiment shows that the relative importance of two cues to the voicing distinction changes when subjects must perform an arithmetic distractor task at the same time as identifying a speech stimulus. The voice onset time cue loses phonetic significance when subjects are distracted, while the F0 onset frequency cue does not. The second experiment shows a similar pattern for two cues to the distinction between the vowels /i/ (as in 'beat') and /I/ (as in 'bit'). Together these experiments indicate that careful attention to speech perception is necessary for strong acoustic cues to achieve their full phonetic impact, while weaker acoustic cues achieve their full phonetic impact without close attention. Experiment 3 shows that this pattern is obtained when the distractor task places little demand on verbal short term memory. Experiment 4 provides a large data set for testing formal models of the role of attention in speech perception. Attention is shown to influence the signal to noise ratio in phonetic encoding. This principle is instantiated in a network model in which the role of attention is to reduce noise in the phonetic encoding of acoustic cues. Implications of this work for understanding speech perception and general theories of the role of attention in perception are discussed.

  5. Hands-free human-machine interaction with voice

    NASA Astrophysics Data System (ADS)

    Juang, B. H.

    2004-05-01

    Voice is a natural communication interface between a human and a machine. The machine, when placed in today's communication networks, may be configured to provide automation to save substantial operating cost, as demonstrated in AT&T's VRCP (Voice Recognition Call Processing), or to facilitate intelligent services, such as virtual personal assistants, to enhance individual productivity. These intelligent services often need to be accessible anytime, anywhere (e.g., in cars when the user is in a hands-busy-eyes-busy situation or during meetings where constantly talking to a microphone is either undesirable or impossible), and thus call for advanced signal processing and automatic speech recognition techniques which support what we call "hands-free" human-machine communication. These techniques entail a broad spectrum of technical ideas, ranging from the use of directional microphones and acoustic echo cancellation to robust speech recognition. In this talk, we highlight a number of key techniques that were developed for hands-free human-machine communication in the mid-1990s after Bell Labs became a unit of Lucent Technologies. A video clip will be played to demonstrate the accomplishment.

  6. Dimension-Based Statistical Learning Affects Both Speech Perception and Production.

    PubMed

    Lehet, Matthew; Holt, Lori L

    2017-04-01

    Multiple acoustic dimensions signal speech categories. However, dimensions vary in their informativeness; some are more diagnostic of category membership than others. Speech categorization reflects these dimensional regularities such that diagnostic dimensions carry more "perceptual weight" and more effectively signal category membership to native listeners. Yet perceptual weights are malleable. When short-term experience deviates from long-term language norms, such as in a foreign accent, the perceptual weight of acoustic dimensions in signaling speech category membership rapidly adjusts. The present study investigated whether rapid adjustments in listeners' perceptual weights in response to speech that deviates from the norms also affects listeners' own speech productions. In a word recognition task, the correlation between two acoustic dimensions signaling consonant categories, fundamental frequency (F0) and voice onset time (VOT), matched the correlation typical of English, and then shifted to an "artificial accent" that reversed the relationship, and then shifted back. Brief, incidental exposure to the artificial accent caused participants to down-weight perceptual reliance on F0, consistent with previous research. Throughout the task, participants were intermittently prompted with pictures to produce these same words. In the block in which listeners heard the artificial accent with a reversed F0 × VOT correlation, F0 was a less robust cue to voicing in listeners' own speech productions. The statistical regularities of short-term speech input affect both speech perception and production, as evidenced via shifts in how acoustic dimensions are weighted. Copyright © 2016 Cognitive Science Society, Inc.

  7. Dramatic effects of speech task on motor and linguistic planning in severely dysfluent parkinsonian speech

    PubMed Central

    Van Lancker Sidtis, Diana; Cameron, Krista; Sidtis, John J.

    2015-01-01

    In motor speech disorders, dysarthric features impacting intelligibility, articulation, fluency, and voice emerge more saliently in conversation than in repetition, reading, or singing. A role of the basal ganglia in these task discrepancies has been identified. Further, more recent studies of naturalistic speech in basal ganglia dysfunction have revealed that formulaic language is more impaired than novel language. This descriptive study extends these observations to a case of severely dysfluent dysarthria due to a parkinsonian syndrome. Dysfluencies were quantified and compared for conversation, two forms of repetition, reading, recited speech, and singing. Other measures examined phonetic inventories, word forms, and formulaic language. Phonetic, syllabic, and lexical dysfluencies were more abundant in conversation than in other task conditions. Formulaic expressions in conversation were reduced compared to normal speakers. A proposed explanation supports the notion that the basal ganglia contribute to formulation of internal models for execution of speech. PMID:22774929

  8. Tailoring Cognitive Behavioral Therapy to Subtypes of Voice-Hearing

    PubMed Central

    Smailes, David; Alderson-Day, Ben; Fernyhough, Charles; McCarthy-Jones, Simon; Dodgson, Guy

    2015-01-01

    Cognitive behavioral therapy (CBT) for voice-hearing (i.e., auditory verbal hallucinations; AVH) has, at best, small to moderate effects. One possible reason for this limited efficacy is that current CBT approaches tend to conceptualize voice-hearing as a homogenous experience in terms of the cognitive processes involved in AVH. However, the highly heterogeneous nature of voice-hearing suggests that many different cognitive processes may be involved in the etiology of AVH. These heterogeneous voice-hearing experiences do, however, appear to cluster into a set of subtypes, opening up the possibility of tailoring treatment to the subtype of AVH that a voice-hearer reports. In this paper, we (a) outline our rationale for tailoring CBT to subtypes of voice-hearing, (b) describe CBT for three putative subtypes of AVH (inner speech-based AVH, memory-based AVH, and hypervigilance AVH), and (c) discuss potential limitations and problems with such an approach. We conclude by arguing that tailoring CBT to subtypes of voice-hearing could prove to be a valuable therapeutic development, which may be especially effective when used in early intervention in psychosis services. PMID:26733919

  9. The singer's voice range profile: female professional opera soloists.

    PubMed

    Lamarche, Anick; Ternström, Sten; Pabon, Peter

    2010-07-01

    This work concerns the collection of 30 voice range profiles (VRPs) of the female operatic voice. We address the questions: Is there a need for a singer's protocol in VRP acquisition? Are physiological measurements sufficient, or should the measurement of performance capabilities also be included? Can we address the female singing voice in general, or is there a case for categorizing voices when studying phonetographic data? Subjects performed a series of structured tasks involving both standard speech voice protocols and additional singing tasks. Singers also completed an extensive questionnaire. Physiological VRPs differ from performance VRPs. Two new VRP metrics, the voice area above a defined level threshold and the dynamic range independent of the fundamental frequency (F0), were found to be useful in the analysis of singer VRPs. Task design had no effect on performance VRP outcomes. Voice category differences were mainly attributable to phonation frequency-based information. Results support the clinical importance of addressing the vocal instrument as it is used in performance. Equally important is the elaboration of a protocol suitable for the singing voice. The given context and instructions can be more important than task design for performance VRPs. Yet, for physiological VRP recordings, task design remains critical. Both types of VRPs are suggested for a singer's voice evaluation. Copyright (c) 2010 The Voice Foundation. Published by Mosby, Inc. All rights reserved.
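
    A sketch of how the two proposed VRP metrics could be computed on a gridded profile: the voice area above a level threshold, and a dynamic range computed per F0 bin and then summarized independently of F0. The grid, threshold, and contours are invented placeholders:

        import numpy as np

        rng = np.random.default_rng(6)
        n_f0_bins = 40                                       # semitone bins across the F0 range
        spl_min = 50 + 10 * rng.random(n_f0_bins)            # softest phonation per bin (dB)
        spl_max = spl_min + 20 + 15 * rng.random(n_f0_bins)  # loudest phonation per bin (dB)

        # Voice area above a defined level threshold, in (semitone x dB) cells.
        threshold = 90.0
        area_above = np.sum(np.clip(spl_max - np.maximum(spl_min, threshold), 0, None))

        # Dynamic range per F0 bin, summarized independently of F0.
        dynamic_range_per_bin = spl_max - spl_min
        f0_independent_dynamic_range = dynamic_range_per_bin.mean()
        print(area_above, f0_independent_dynamic_range)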

  10. SPEECH PERCEPTION AS A TALKER-CONTINGENT PROCESS

    PubMed Central

    Nygaard, Lynne C.; Sommers, Mitchell S.; Pisoni, David B.

    2011-01-01

    To determine how familiarity with a talker’s voice affects perception of spoken words, we trained two groups of subjects to recognize a set of voices over a 9-day period. One group then identified novel words produced by the same set of talkers at four signal-to-noise ratios. Control subjects identified the same words produced by a different set of talkers. The results showed that the ability to identify a talker’s voice improved intelligibility of novel words produced by that talker. The results suggest that speech perception may involve talker-contingent processes whereby perceptual learning of aspects of the vocal source facilitates the subsequent phonetic analysis of the acoustic signal. PMID:21526138

  11. Some articulatory details of emotional speech

    NASA Astrophysics Data System (ADS)

    Lee, Sungbok; Yildirim, Serdar; Bulut, Murtaza; Kazemzadeh, Abe; Narayanan, Shrikanth

    2005-09-01

    Differences in speech articulation among four emotion types, neutral, anger, sadness, and happiness, are investigated by analyzing tongue tip, jaw, and lip movement data collected from one male and one female speaker of American English. The data were collected using an electromagnetic articulography (EMA) system while subjects produced simulated emotional speech. Pitch, root-mean-square (rms) energy and the first three formants were estimated for vowel segments. For both speakers, angry speech exhibited the largest rms energy and the largest articulatory activity in terms of displacement range and movement speed. Happy speech is characterized by the largest pitch variability. It has higher rms energy than neutral speech, but its articulatory activity is comparable to, or less than, that of neutral speech. That is, happy speech is more prominent in voicing activity than in articulation. Sad speech exhibits the longest sentence duration and lower rms energy. However, its articulatory activity is no less than that of neutral speech. Interestingly, for the male speaker, articulation for vowels in sad speech is consistently more peripheral (i.e., more forwarded displacements) when compared to other emotions. However, this does not hold for the female subject. These and other results will be discussed in detail with associated acoustics and perceived emotional qualities. [Work supported by NIH.]

  12. Effects of voice training and voice hygiene education on acoustic and perceptual speech parameters and self-reported vocal well-being in female teachers.

    PubMed

    Ilomaki, Irma; Laukkanen, Anne-Maria; Leppanen, Kirsti; Vilkman, Erkki

    2008-01-01

    Voice education programs may help in optimizing teachers' voice use. This study compared the effects of voice training (VT) and a voice hygiene lecture (VHL) in 60 randomly assigned female teachers. All 60 attended the lecture, and 30 completed a short training course in addition. Text reading was recorded in working environments and analyzed for fundamental frequency (F0), equivalent sound level (Leq), alpha ratio, jitter, shimmer, and perceptual quality. Self-reports of vocal well-being were registered. In the VHL group, increased F0 and increased difficulty of phonation were found; in the VT group, decreased perturbation, increased alpha ratio, easier phonation, and improved perceptual and self-reported voice quality were found. Both groups equally self-reported an increase in voice care knowledge. The results seem to indicate improved vocal well-being after training.

  13. Integrated Spacesuit Audio System Enhances Speech Quality and Reduces Noise

    NASA Technical Reports Server (NTRS)

    Huang, Yiteng Arden; Chen, Jingdong; Chen, Shaoyan Sharyl

    2009-01-01

    A new approach has been proposed for increasing astronaut comfort and speech capture. Currently, the special design of a spacesuit forms an extreme acoustic environment making it difficult to capture clear speech without compromising comfort. The proposed Integrated Spacesuit Audio (ISA) system is to incorporate the microphones into the helmet and use software to extract voice signals from background noise.

  14. An automatic speech recognition system with speaker-independent identification support

    NASA Astrophysics Data System (ADS)

    Caranica, Alexandru; Burileanu, Corneliu

    2015-02-01

    The novelty of this work lies in the application of an open source research software toolkit (CMU Sphinx) to train, build and evaluate a speech recognition system, with speaker-independent support, for voice-controlled hardware applications. Moreover, we propose to use the trained acoustic model to successfully decode offline voice commands on embedded hardware, such as an ARMv6 low-cost SoC, the Raspberry Pi. This type of single-board computer, mainly used for educational and research activities, can serve as a proof-of-concept software and hardware stack for low cost voice automation systems.
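
    A sketch of offline decoding with the pocketsphinx Python bindings for CMU Sphinx, roughly in the spirit of the setup described; the model paths, WAV file, and constructor keywords are placeholders and vary between pocketsphinx versions:

        from pocketsphinx import Decoder

        decoder = Decoder(
            hmm="model/en-us",                # acoustic model directory (placeholder)
            lm="model/en-us.lm.bin",          # language model (placeholder)
            dict="model/cmudict-en-us.dict",  # pronunciation dictionary (placeholder)
        )

        with open("command.wav", "rb") as f:
            f.read(44)      # skip the WAV header (16 kHz mono PCM assumed)
            audio = f.read()

        decoder.start_utt()
        decoder.process_raw(audio, False, True)  # no_search=False, full_utt=True
        decoder.end_utt()
        print(decoder.hyp().hypstr if decoder.hyp() else "<no hypothesis>")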

  15. Application of the acoustic voice quality index for objective measurement of dysphonia severity.

    PubMed

    Núñez-Batalla, Faustino; Díaz-Fresno, Estefanía; Álvarez-Fernández, Andrea; Muñoz Cordero, Gabriela; Llorente Pendás, José Luis

    Over the past several decades, many acoustic parameters have been studied as potential measures of dysphonia severity. However, current acoustic measures might not be sensitive measures of perceived voice quality. A meta-analysis that evaluated the relationship between perceived overall voice quality and several acoustic-phonetic correlates identified measures that do not rely on extraction of the fundamental period, such as measures derived from the cepstrum, and that can be used on sustained vowels as well as continuous speech samples. A specific and recently developed method to quantify the severity of overall dysphonia is the acoustic voice quality index (AVQI), a multivariate construct that combines multiple acoustic markers to yield a single number that correlates reasonably with overall vocal quality. This research is based on one pool of voice recordings collected in two sets of subjects: 60 vocally normal and 58 voice-disordered participants. A sustained vowel and a sample of connected speech were recorded and analyzed to obtain the six parameters included in the AVQI using the program Praat. Statistical analysis was completed using SPSS for Windows, version 12.0. Regarding the correlation between perception of overall voice quality and the AVQI, a significant difference exists (t(95) = 9.5; p < .001) between normal and dysphonic voices. The findings of this study demonstrate the clinical feasibility of the AVQI as a measure of dysphonia severity. Copyright © 2017 Elsevier España, S.L.U. and Sociedad Española de Otorrinolaringología y Cirugía de Cabeza y Cuello. All rights reserved.
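
    A structural sketch of a multivariate index of this kind: several acoustic markers are combined linearly into a single severity score, and a cutoff separates normal from dysphonic voices. The marker values, weights, and intercept below are placeholders, not the published AVQI coefficients:

        # Placeholder marker values as might be measured in Praat.
        markers = {
            "cpps": 12.5,          # smoothed cepstral peak prominence (dB)
            "hnr": 18.0,           # harmonics-to-noise ratio (dB)
            "shimmer_local": 3.1,  # %
            "shimmer_db": 0.3,     # dB
            "slope": -22.0,        # spectral slope (dB)
            "tilt": -10.0,         # spectral tilt (dB)
        }
        # Hypothetical weights and intercept, for structure only.
        weights = {"cpps": -0.25, "hnr": -0.05, "shimmer_local": 0.07,
                   "shimmer_db": 0.9, "slope": 0.01, "tilt": 0.05}
        intercept = 9.0

        score = intercept + sum(weights[k] * markers[k] for k in markers)
        # Compare against a study-specific cutoff (e.g., 2.28 for AVQIv3 in the
        # Spanish validation above).
        print("dysphonic" if score > 2.28 else "normal")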

  16. Development of the child's voice: premutation, mutation.

    PubMed

    Hacki, T; Heitmüller, S

    1999-10-05

    Voice range profile (VRP) measurement was used to evaluate the vocal capabilities of 180 children aged between 4 and 12 years without voice pathology. There were 10 boys and 10 girls in each age group. Using an automatic VRP measurement system, F0 and SPL dB (lin) were determined and displayed two-dimensionally in real time. The speaking voice, the shouting voice and the singing voice were investigated. The results show that vocal capabilities grow with advancing age, but not continuously. The lowering of the habitual pitch of the speaking voice as well as of the entire speaking pitch range occurs for girls between the ages of 7 and 8, for boys between 8 and 9. A temporary restriction of the minimum vocal intensity of the speaking voice (the ability to speak softly) as well as of the singing voice occurs for girls and for boys at the age of 7-8. A decrease of the maximum speech intensity is found for girls at the age of between 7 and 8, for boys between 8 and 9. A lowering of the pitch as well as of the intensity of the shouting voice occurs for both sexes from the age of 10. In contrast to earlier general opinion we note for girls a stage of premutation (between the age of 7 and 8) with essentially the same changes seen among boys, but 1 year earlier. The beginning of the mutation can be fixed at the age of 10-11 years.

  17. Speech motor control and acute mountain sickness

    NASA Technical Reports Server (NTRS)

    Cymerman, Allen; Lieberman, Philip; Hochstadt, Jesse; Rock, Paul B.; Butterfield, Gail E.; Moore, Lorna G.

    2002-01-01

    BACKGROUND: An objective method that accurately quantifies the severity of Acute Mountain Sickness (AMS) symptoms is needed to enable more reliable evaluation of altitude acclimatization and testing of potentially beneficial interventions. HYPOTHESIS: Changes in human articulation, as quantified by timed variations in acoustic waveforms of specific spoken words (voice onset time; VOT), are correlated with the severity of AMS. METHODS: Fifteen volunteers were exposed to a simulated altitude of 4300 m (446 mm Hg) in a hypobaric chamber for 48 h. Speech motor control was determined from digitally recorded and analyzed timing patterns of 30 different monosyllabic words characterized as voiced and unvoiced, and as labial, alveolar, or velar. The Environmental Symptoms Questionnaire (ESQ) was used to assess AMS. RESULTS: Significant AMS symptoms occurred after 4 h, peaked at 16 h, and returned toward baseline after 48 h. Labial VOTs were shorter after 4 and 39 h of exposure; velar VOTs were altered only after 4 h; and there were no changes in alveolar VOTs. The duration of vowel sounds was increased after 4 h of exposure and returned to normal thereafter. Only 1 of 15 subjects did not increase vowel time after 4 h of exposure. The 39-h labial (p = 0.009) and velar (p = 0.037) voiced-unvoiced timed separations of consonants were significantly correlated with the symptoms of AMS. CONCLUSIONS: Two objective measures of speech production were affected by exposure to 4300 m altitude and correlated with AMS severity. Alterations in speech production may represent an objective measure of AMS and central vulnerability to hypoxia.

  18. Digital signal processing algorithms for automatic voice recognition

    NASA Technical Reports Server (NTRS)

    Botros, Nazeih M.

    1987-01-01

    Current digital signal analysis algorithms implemented in automatic voice recognition are investigated. Automatic voice recognition means the capability of a computer to recognize and interact with verbal commands. The focus is on digital signal analysis rather than linguistic analysis of the speech signal. Several digital signal processing algorithms are available for voice recognition, among them Linear Predictive Coding (LPC), short-time Fourier analysis, and cepstrum analysis. Of these, LPC is the most widely used: it has a short execution time and does not require large memory storage. However, it has several limitations due to the assumptions used in its development. The other two algorithms are frequency-domain algorithms that rest on fewer assumptions, but they are not widely implemented or investigated. With recent advances in digital technology, namely signal processors, these two frequency-domain algorithms merit investigation for implementation in voice recognition. This research is concerned with real-time, microprocessor-based recognition algorithms.
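
    A sketch of the two analysis families compared above, assuming the librosa package for LPC and a real cepstrum computed directly with the FFT; the signal, frame length, and LPC order are illustrative:

        import numpy as np
        import librosa

        sr = 16000
        t = np.arange(0, 0.025, 1 / sr)  # one 25 ms frame
        frame = np.sin(2 * np.pi * 120 * t) \
            + 0.1 * np.random.default_rng(7).normal(size=t.size)

        # Linear Predictive Coding: all-pole model of the vocal tract.
        lpc_coeffs = librosa.lpc(frame, order=12)

        # Real cepstrum: inverse FFT of the log magnitude spectrum. Low
        # quefrencies capture the spectral envelope; a peak near 1/F0 reflects voicing.
        spectrum = np.fft.rfft(frame)
        cepstrum = np.fft.irfft(np.log(np.abs(spectrum) + 1e-10))
        print(lpc_coeffs[:4], cepstrum[:4])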

  19. Comparison of voice-automated transcription and human transcription in generating pathology reports.

    PubMed

    Al-Aynati, Maamoun M; Chorneyko, Katherine A

    2003-06-01

    Software that can convert spoken words into written text has been available since the early 1980s. Early continuous speech systems were developed in 1994, with the latest commercially available editions having a claimed accuracy of up to 98% of speech recognition at natural speech rates. To evaluate the efficacy of one commercially available voice-recognition software system with pathology vocabulary in generating pathology reports and to compare this with human transcription. To draw cost analysis conclusions regarding human versus computer-based transcription. Two hundred six routine pathology reports from the surgical pathology material handled at St Joseph's Healthcare, Hamilton, Ontario, were generated simultaneously using computer-based transcription and human transcription. The following hardware and software were used: a desktop 450-MHz Intel Pentium III processor with 192 MB of RAM, a speech-quality sound card (Sound Blaster), a noise-canceling headset microphone, and IBM ViaVoice Pro version 8 with pathology vocabulary support (Voice Automated, Huntington Beach, Calif). The cost of the hardware and software used was approximately 2250 Canadian dollars. A total of 23 458 words were transcribed using both methods, with a mean of 114 words per report. The mean accuracy rate was 93.6% (range, 87.4%-96%) using the computer software, compared to a mean accuracy of 99.6% (range, 99.4%-99.8%) for human transcription (P <.001). Time needed to edit documents by the primary evaluator (M.A.) using the computer was on average twice that needed for editing the documents produced by human transcriptionists (range, 1.4-3.5 times). The extra time needed to edit documents was 67 minutes per week (13 minutes per day). Computer-based continuous speech-recognition systems in pathology can be successfully used in pathology practice even during the handling of gross pathology specimens. The relatively low accuracy rate of this voice-recognition software with resultant increased editing

  20. [The progress in the rehabilitation of dysarthria in Parkinson disease using LSVT (Lee Silverman Voice Treatment)].

    PubMed

    Kamińska, Ilona; Zebryk-Stopa, Anna; Pruszewicz, Antoni; Dziubalska-Kołaczyk, Katarzyna; Połczyńska-Fiszer, Monika; Pietrala, Dawid; Przedpelska-Ober, Elzbieta

    2007-01-01

    Parkinson's disease causes damage to the central nervous system resulting in bradykinesia, muscle rigidity, rest tremor and dysarthric speech. In clinical terms, dysarthria denotes the dysfunction of articulation, phonation and respiration. It is brought about by the impairment of neural paths innervating the speech apparatus, thus causing a decreased ability to communicate. The study was conducted by the Center for Speech and Language Processing (CSLP), Adam Mickiewicz University, Poznań and the Chair and Department of Phoniatrics and Audiology, the Medical University, Poznań within the interdisciplinary research project grant called "Speech and Language Virtual Therapist for Individuals with Parkinson's Disease". Apart from traditional voice and speech therapies, one of the ways of treating speech disturbances accompanying Parkinson's disease is the innovative Lee Silverman Voice Treatment (LSVT). The purpose of this method, introduced by Dr. L. Ramig and colleagues in 1987-1988, is to teach the patient to speak loudly. As a result of co-operation between CSLP and the Center for Spoken Language Research (CSLR) at the University of Colorado, Boulder, USA, a Polish version of the LSVT Virtual Therapist computer programme was created (LSVTVT). The programme is based on the principles of LSVT. The positive outcomes of the therapy give hope to Parkinson's disease patients with dysarthria, as well as to speech therapists.

  1. Visual face-movement sensitive cortex is relevant for auditory-only speech recognition.

    PubMed

    Riedel, Philipp; Ragert, Patrick; Schelinski, Stefanie; Kiebel, Stefan J; von Kriegstein, Katharina

    2015-07-01

    It is commonly assumed that the recruitment of visual areas during audition is not relevant for performing auditory tasks ('auditory-only view'). According to an alternative view, however, the recruitment of visual cortices is thought to optimize auditory-only task performance ('auditory-visual view'). This alternative view is based on functional magnetic resonance imaging (fMRI) studies. These studies have shown, for example, that even if there is only auditory input available, face-movement sensitive areas within the posterior superior temporal sulcus (pSTS) are involved in understanding what is said (auditory-only speech recognition). This is particularly the case when speakers are known audio-visually, that is, after brief voice-face learning. Here we tested whether the left pSTS involvement is causally related to performance in auditory-only speech recognition when speakers are known by face. To test this hypothesis, we applied cathodal transcranial direct current stimulation (tDCS) to the pSTS during (i) visual-only speech recognition of a speaker known only visually to participants and (ii) auditory-only speech recognition of speakers they learned by voice and face. We defined the cathode as active electrode to down-regulate cortical excitability by hyperpolarization of neurons. tDCS to the pSTS interfered with visual-only speech recognition performance compared to a control group without pSTS stimulation (tDCS to BA6/44 or sham). Critically, compared to controls, pSTS stimulation additionally decreased auditory-only speech recognition performance selectively for voice-face learned speakers. These results are important in two ways. First, they provide direct evidence that the pSTS is causally involved in visual-only speech recognition; this confirms a long-standing prediction of current face-processing models. Secondly, they show that visual face-sensitive pSTS is causally involved in optimizing auditory-only speech recognition. These results are in line

  2. Processing umami and other tastes in mammalian taste buds.

    PubMed

    Roper, Stephen D; Chaudhari, Nirupa

    2009-07-01

    Neuroscientists are now coming to appreciate that a significant degree of information processing occurs in the peripheral sensory organs of taste prior to signals propagating to the brain. Gustatory stimulation causes taste bud cells to secrete neurotransmitters that act on adjacent taste bud cells (paracrine transmitters) as well as on primary sensory afferent fibers (neurocrine transmitters). Paracrine transmission, representing cell-cell communication within the taste bud, has the potential to shape the final signal output that taste buds transmit to the brain. The following paragraphs summarize current thinking about how taste signals generally, and umami taste in particular, are processed in taste buds.

  3. Audiovisual Speech Perception and Eye Gaze Behavior of Adults with Asperger Syndrome

    ERIC Educational Resources Information Center

    Saalasti, Satu; Katsyri, Jari; Tiippana, Kaisa; Laine-Hernandez, Mari; von Wendt, Lennart; Sams, Mikko

    2012-01-01

    Audiovisual speech perception was studied in adults with Asperger syndrome (AS), by utilizing the McGurk effect, in which conflicting visual articulation alters the perception of heard speech. The AS group perceived the audiovisual stimuli differently from age, sex and IQ matched controls. When a voice saying /p/ was presented with a face…

  4. Does computer-synthesized speech manifest personality? Experimental tests of recognition, similarity-attraction, and consistency-attraction.

    PubMed

    Nass, C; Lee, K M

    2001-09-01

    Would people exhibit similarity-attraction and consistency-attraction toward unambiguously computer-generated speech even when personality is clearly not relevant? In Experiment 1, participants (extrovert or introvert) heard a synthesized voice (extrovert or introvert) on a book-buying Web site. Participants accurately recognized personality cues in text-to-speech and showed similarity-attraction in their evaluation of the computer voice, the book reviews, and the reviewer. Experiment 2, in a Web auction context, added personality of the text to the previous design. The results replicated Experiment 1 and demonstrated consistency (voice and text personality)-attraction. To maximize liking and trust, designers should set parameters, for example, words per minute or frequency range, that create a personality that is consistent with the user and the content being presented.
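
    As a rough illustration of setting such parameters in software, this sketch uses the pyttsx3 offline TTS library, whose rate (words per minute) and volume properties do exist; the extrovert/introvert values are assumptions chosen for illustration, not those of the study, and pitch range is not exposed by this library.

    ```python
    import pyttsx3

    # Illustrative parameter sets: the mapping of rate/volume to perceived
    # personality is an assumption in the spirit of the study, not its values.
    PROFILES = {
        "extrovert": {"rate": 200, "volume": 1.0},  # faster, louder
        "introvert": {"rate": 140, "volume": 0.7},  # slower, softer
    }

    def speak(text, personality="extrovert"):
        engine = pyttsx3.init()
        for prop, value in PROFILES[personality].items():
            engine.setProperty(prop, value)  # 'rate' is in words per minute
        engine.say(text)
        engine.runAndWait()

    speak("This book has an exuberant, fast-paced plot.", "extrovert")
    ```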

  5. Speech recovery device

    DOEpatents

    Frankle, Christen M.

    2004-04-20

    There is provided an apparatus and method for assisting speech recovery in people with inability to speak due to aphasia, apraxia or another condition with similar effect. A hollow, rigid, thin-walled tube with semi-circular or semi-elliptical cut out shapes at each open end is positioned such that one end mates with the throat/voice box area of the neck of the assistor and the other end mates with the throat/voice box area of the assisted. The speaking person (assistor) makes sounds that produce standing wave vibrations at the same frequency in the vocal cords of the assisted person. Driving the assisted person's vocal cords with the assisted person being able to hear the correct tone enables the assisted person to speak by simply amplifying the vibration of membranes in their throat.

  6. Effects of HearFones on speaking and singing voice quality.

    PubMed

    Laukkanen, Anne-Maria; Mickelson, Nils Peter; Laitala, Marja; Syrjä, Tiina; Salo, Arla; Sihvo, Marketta

    2004-12-01

    HearFones (HF) have been designed to enhance auditory feedback during phonation. This study investigated the effects of HF (1) on the sound perceivable by the subject, (2) on voice quality in reading and singing, and (3) on voice production in speech and singing at the same pitch and sound level. Test 1: Text reading was recorded with two identical microphones in the ears of a subject. One ear was covered with HF, and the other was free. Four subjects attended this test. Tests 2 and 3: A reading sample was recorded from 13 subjects and a song from 12 subjects without and with HF on. Test 4: Six females repeated [pa:p:a] in speaking and singing modes without and with HF at the same pitch and sound level. Long-term average spectra were made (Tests 1-3), and formant frequencies, fundamental frequency, and sound level were measured (Tests 2 and 3). Subglottic pressure was estimated from oral pressure in [p], and electroglottography (EGG) was registered simultaneously during voicing on [a:] (Test 4). Voice quality in speech and singing was evaluated by three professional voice trainers (Tests 2-4). HF seemed to enhance the sound perceivable by the subject across the whole range studied (0-8 kHz), with the greatest enhancement (up to ca. 25 dB) at 1-3 kHz and at 4-7 kHz. The subjects tended to decrease loudness with HF (when sound level was not being monitored). In more than half of the cases, voice quality was evaluated as "less strained" and "better controlled" with HF. When pitch and loudness were constant, no clear differences were heard, but the closed quotient of the EGG signal was higher and the signal more skewed, suggesting a better glottal closure and/or diminished activity of the thyroarytenoid muscle.
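
    The long-term average spectrum comparison described above can be approximated with standard tools. A minimal sketch using Welch's method from SciPy, assuming the recordings are NumPy arrays; the window length is an illustrative choice, while the 1-3 kHz and 4-7 kHz bands follow the text.

    ```python
    import numpy as np
    from scipy.signal import welch

    def ltas_db(signal, sr, win_s=0.040):
        """Long-term average spectrum: Welch PSD over the whole sample, in dB."""
        freqs, psd = welch(signal, fs=sr, nperseg=int(sr * win_s))
        return freqs, 10.0 * np.log10(psd + 1e-20)

    def band_gain(freqs, db_with_hf, db_without_hf, lo, hi):
        """Mean dB difference (with HF minus without) inside one frequency band."""
        band = (freqs >= lo) & (freqs < hi)
        return float(np.mean(db_with_hf[band] - db_without_hf[band]))
    ```

    Applying band_gain to the occluded-ear and free-ear recordings of Test 1 in the 1-3 kHz and 4-7 kHz bands mirrors the kind of enhancement comparison reported above.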

  7. Nebulized isotonic saline improves voice production in Sjögren's syndrome.

    PubMed

    Tanner, Kristine; Nissen, Shawn L; Merrill, Ray M; Miner, Alison; Channell, Ron W; Miller, Karla L; Elstad, Mark; Kendall, Katherine A; Roy, Nelson

    2015-10-01

    This study examined the effects of a topical vocal fold hydration treatment on voice production over time. Prospective, longitudinal, within-subjects A (baseline), B (treatment), A (withdrawal/reversal), B (treatment) experimental design. Eight individuals with primary Sjögren's syndrome (SS), an autoimmune disease causing laryngeal dryness, completed an 8-week A-B-A-B experiment. Participants performed twice-daily audio recordings of connected speech and sustained vowels and then rated vocal effort, mouth dryness, and throat dryness. Two-week treatment phases introduced twice-daily 9-mL doses of nebulized isotonic saline (0.9% NaCl). Voice handicap and patient-based measures of SS disease severity were collected before and after each 2-week phase. Connected speech and sustained vowels were analyzed using the Cepstral Spectral Index of Dysphonia (CSID). Acoustic and patient-based ratings during each baseline and treatment phase were analyzed and compared. Baseline CSID and patient-based ratings were in the mild-to-moderate range. CSID measures of voice severity improved by approximately 20% with nebulized saline treatment and worsened during treatment withdrawal. Posttreatment CSID values fell within the normal-to-mild range. Similar patterns were observed in patient-based ratings of vocal effort and dryness. CSID values and patient-based ratings correlated significantly (P < .05). Nebulized isotonic saline improves voice production based on acoustic and patient-based ratings of voice severity. Future work should optimize topical vocal fold hydration treatment formulations, dose, and delivery methodologies for various patient populations. This study lays the groundwork for future topical vocal fold hydration treatment development to manage and possibly prevent dehydration-related voice disorders. Level of evidence: 2b.

  8. Research in speech communication.

    PubMed

    Flanagan, J

    1995-10-24

    Advances in digital speech processing are now supporting application and deployment of a variety of speech technologies for human/machine communication. In fact, new businesses are rapidly forming about these technologies. But these capabilities are of little use unless society can afford them. Happily, explosive advances in microelectronics over the past two decades have assured affordable access to this sophistication as well as to the underlying computing technology. The research challenges in speech processing remain in the traditionally identified areas of recognition, synthesis, and coding. These three areas have typically been addressed individually, often with significant isolation among the efforts. But they are all facets of the same fundamental issue--how to represent and quantify the information in the speech signal. This implies deeper understanding of the physics of speech production, the constraints that the conventions of language impose, and the mechanism for information processing in the auditory system. In ongoing research, therefore, we seek more accurate models of speech generation, better computational formulations of language, and realistic perceptual guides for speech processing--along with ways to coalesce the fundamental issues of recognition, synthesis, and coding. Successful solution will yield the long-sought dictation machine, high-quality synthesis from text, and the ultimate in low bit-rate transmission of speech. It will also open the door to language-translating telephony, where the synthetic foreign translation can be in the voice of the originating talker.

  9. Implicit Multisensory Associations Influence Voice Recognition

    PubMed Central

    von Kriegstein, Katharina; Giraud, Anne-Lise

    2006-01-01

    Natural objects provide partially redundant information to the brain through different sensory modalities. For example, voices and faces both give information about the speech content, age, and gender of a person. Thanks to this redundancy, multimodal recognition is fast, robust, and automatic. In unimodal perception, however, only part of the information about an object is available. Here, we addressed whether, even under conditions of unimodal sensory input, crossmodal neural circuits that have been shaped by previous associative learning become activated and underpin a performance benefit. We measured brain activity with functional magnetic resonance imaging before, while, and after participants learned to associate either sensory redundant stimuli, i.e. voices and faces, or arbitrary multimodal combinations, i.e. voices and written names, ring tones, and cell phones or brand names of these cell phones. After learning, participants were better at recognizing unimodal auditory voices that had been paired with faces than those paired with written names, and association of voices with faces resulted in an increased functional coupling between voice and face areas. No such effects were observed for ring tones that had been paired with cell phones or names. These findings demonstrate that brief exposure to ecologically valid and sensory redundant stimulus pairs, such as voices and faces, induces specific multisensory associations. Consistent with predictive coding theories, associative representations become thereafter available for unimodal perception and facilitate object recognition. These data suggest that for natural objects effective predictive signals can be generated across sensory systems and proceed by optimization of functional connectivity between specialized cortical sensory modules. PMID:17002519

  10. Implicit multisensory associations influence voice recognition.

    PubMed

    von Kriegstein, Katharina; Giraud, Anne-Lise

    2006-10-01

    Natural objects provide partially redundant information to the brain through different sensory modalities. For example, voices and faces both give information about the speech content, age, and gender of a person. Thanks to this redundancy, multimodal recognition is fast, robust, and automatic. In unimodal perception, however, only part of the information about an object is available. Here, we addressed whether, even under conditions of unimodal sensory input, crossmodal neural circuits that have been shaped by previous associative learning become activated and underpin a performance benefit. We measured brain activity with functional magnetic resonance imaging before, while, and after participants learned to associate either sensory redundant stimuli, i.e. voices and faces, or arbitrary multimodal combinations, i.e. voices and written names, ring tones, and cell phones or brand names of these cell phones. After learning, participants were better at recognizing unimodal auditory voices that had been paired with faces than those paired with written names, and association of voices with faces resulted in an increased functional coupling between voice and face areas. No such effects were observed for ring tones that had been paired with cell phones or names. These findings demonstrate that brief exposure to ecologically valid and sensory redundant stimulus pairs, such as voices and faces, induces specific multisensory associations. Consistent with predictive coding theories, associative representations become thereafter available for unimodal perception and facilitate object recognition. These data suggest that for natural objects effective predictive signals can be generated across sensory systems and proceed by optimization of functional connectivity between specialized cortical sensory modules.

  11. The role of voice input for human-machine communication.

    PubMed Central

    Cohen, P R; Oviatt, S L

    1995-01-01

    Optimism is growing that the near future will witness rapid growth in human-computer interaction using voice. System prototypes have recently been built that demonstrate speaker-independent real-time speech recognition, and understanding of naturally spoken utterances with vocabularies of 1000 to 2000 words, and larger. Already, computer manufacturers are building speech recognition subsystems into their new product lines. However, before this technology can be broadly useful, a substantial knowledge base is needed about human spoken language and performance during computer-based spoken interaction. This paper reviews application areas in which spoken interaction can play a significant role, assesses potential benefits of spoken interaction with machines, and compares voice with other modalities of human-computer interaction. It also discusses information that will be needed to build a firm empirical foundation for the design of future spoken and multimodal interfaces. Finally, it argues for a more systematic and scientific approach to investigating spoken input and performance with future language technology. PMID:7479803

  12. The Belt voice: Acoustical measurements and esthetic correlates

    NASA Astrophysics Data System (ADS)

    Bounous, Barry Urban

    This dissertation explores the esthetic attributes of the Belt voice through spectral acoustical analysis. The process of understanding the nature and safe practice of Belt is just beginning, whereas the understanding of classical singing is well established. The unique nature of the Belt sound provides difficulties for voice teachers attempting to evaluate the quality and appropriateness of a particular sound or performance. This study attempts to provide answers to the question "does Belt conform to a set of measurable esthetic standards?" In answering this question, this paper expands on a previous study of the esthetic attributes of the classical baritone voice (see "Vocal Beauty", NATS Journal 51,1), which also drew some tentative conclusions about the Belt voice but which had an inadequate sample pool of subjects from which to draw. Further, this study demonstrates that it is possible to scientifically investigate the realm of musical esthetics in the singing voice. It is possible to go beyond the "a trained voice compared to an untrained voice" paradigm when evaluating quantitative vocal parameters and actually investigate what truly beautiful voices do. There are functions of sound-energy transference (measured in dB) that may affect the nervous system in predictable ways and that can be measured and associated with esthetics. This study does not show consistency in measurements for absolute beauty (taste), even among belt teachers and researchers, but does show some markers with varying degrees of importance which may point to a difference between our cognitive learned response to singing and our emotional, more visceral response to sounds. The markers which are significant in determining vocal beauty are: (1) Vibrancy: characteristics of vibrato, including speed, width, and consistency (low variability). (2) Spectral makeup: the ratio of partial strength above the fundamental to the fundamental. (3) Activity of the voice: the quantity of energy being produced. (4

  13. Neural network based speech synthesizer: A preliminary report

    NASA Technical Reports Server (NTRS)

    Villarreal, James A.; Mcintire, Gary

    1987-01-01

    A neural net based speech synthesis project is discussed. The novelty is that the reproduced speech was extracted from actual voice recordings. In essence, the neural network learns the timing, pitch fluctuations, connectivity between individual sounds, and speaking habits unique to that individual person. The parallel distributed processing network used for this project is the generalized backward propagation network which has been modified to also learn sequences of actions or states given in a particular plan.

  14. The voices of seduction: cross-gender effects in processing of erotic prosody

    PubMed Central

    Ethofer, Thomas; Wiethoff, Sarah; Anders, Silke; Kreifelts, Benjamin; Grodd, Wolfgang

    2007-01-01

    Gender-specific differences in cognitive functions have been widely discussed. In social cognition, such as the perception of emotion conveyed by non-verbal cues, a female advantage is generally assumed. In the present study, however, we revealed a cross-gender interaction, with increased responses to the voice of the opposite sex in male and female subjects. This effect was confined to an erotic tone of speech in behavioural data and haemodynamic responses within voice-sensitive brain areas (right middle superior temporal gyrus). The observed response pattern thus indicates a particular sensitivity to emotional voices that have a high behavioural relevance for the listener. PMID:18985138

  15. Male and female voices activate distinct regions in the male brain.

    PubMed

    Sokhi, Dilraj S; Hunter, Michael D; Wilkinson, Iain D; Woodruff, Peter W R

    2005-09-01

    In schizophrenia, auditory verbal hallucinations (AVHs) are likely to be perceived as gender-specific. Given that functional neuro-imaging correlates of AVHs involve multiple brain regions principally including auditory cortex, it is likely that those brain regions responsible for attribution of gender to speech are invoked during AVHs. We used functional magnetic resonance imaging (fMRI) and a paradigm utilising 'gender-apparent' (unaltered) and 'gender-ambiguous' (pitch-scaled) male and female voice stimuli to test the hypothesis that male and female voices activate distinct brain areas during gender attribution. The perception of female voices, when compared with male voices, elicited greater activation of the right anterior superior temporal gyrus, near the superior temporal sulcus. Similarly, male voice perception activated the mesio-parietal precuneus area. These different gender associations could not be explained by either simple pitch perception or behavioural response because the activations that we observed were conjointly activated by both 'gender-apparent' and 'gender-ambiguous' voices. The results of this study demonstrate that, in the male brain, the perception of male and female voices activates distinct brain regions.

  16. Alternating motion rate as an index of speech motor disorder in traumatic brain injury.

    PubMed

    Wang, Yu-Tsai; Kent, Ray D; Duffy, Joseph R; Thomas, Jack E; Weismer, Gary

    2004-01-01

    The task of syllable alternating motion rate (AMR) (also called diadochokinesis) is suitable for examining speech disorders of varying degrees of severity and in individuals with varying levels of linguistic and cognitive ability. However, very limited information on this task has been published for subjects with traumatic brain injury (TBI). This study is a quantitative and qualitative acoustic analysis of AMR in seven subjects with TBI. The primary goal was to use acoustic analyses to assess speech motor control disturbances for the group as a whole and for individual patients. Quantitative analyses included measures of syllable rate, syllable and intersyllable gap durations, energy maxima, and voice onset time (VOT). Qualitative analyses included classification of features evident in spectrograms and waveforms to provide a more detailed description. The TBI group had (1) a slowed syllable rate due mostly to lengthened syllables and, to a lesser degree, lengthened intersyllable gaps, (2) highly correlated syllable rates between AMR and conversation, (3) temporal and energy maxima irregularities within repetition sequences, (4) normal median VOT values but with large variation, and (5) a number of speech production abnormalities revealed by qualitative analysis, including explosive speech quality, breathy voice quality, phonatory instability, multiple or missing stop bursts, continuous voicing, and spirantization. The relationships between these findings and TBI speakers' neurological status and dysarthria types are also discussed. It was concluded that acoustic analyses of the AMR task provide specific information on motor speech limitations in individuals with TBI.
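
    For readers unfamiliar with AMR measurement, here is a minimal sketch of how syllable rate and a rough intersyllable gap might be estimated from the energy envelope; the threshold, frame size and gap heuristic are illustrative assumptions, not the authors' protocol.

    ```python
    import numpy as np

    def amr_analysis(signal, sr, frame_ms=10.0, thresh_db=-35.0):
        """Estimate syllable rate and a rough mean intersyllable gap for an AMR
        (/pa pa pa/) recording: frames above an assumed RMS threshold are 'on',
        and rising edges count as syllable onsets. Assumes the recording is
        trimmed to the task (no long leading or trailing silence)."""
        hop = int(sr * frame_ms / 1000.0)
        n = len(signal) // hop
        frames = np.reshape(signal[:n * hop], (n, hop))
        level = 20.0 * np.log10(np.sqrt(np.mean(frames ** 2, axis=1)) + 1e-12)
        on = level > thresh_db
        onsets = np.flatnonzero(np.diff(on.astype(int)) == 1)
        rate = len(onsets) / (n * frame_ms / 1000.0)  # syllables per second
        # Rough average gap: total 'off' time spread over the number of onsets.
        mean_gap_ms = np.sum(~on) * frame_ms / max(len(onsets), 1)
        return rate, mean_gap_ms
    ```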

  17. A Voice Enabled Procedure Browser for the International Space Station

    NASA Technical Reports Server (NTRS)

    Rayner, Manny; Chatzichrisafis, Nikos; Hockey, Beth Ann; Farrell, Kim; Renders, Jean-Michel

    2005-01-01

    Clarissa, an experimental voice enabled procedure browser that has recently been deployed on the International Space Station (ISS), is to the best of our knowledge the first spoken dialog system in space. This paper gives background on the system and the ISS procedures, then discusses the research developed to address three key problems: grammar-based speech recognition using the Regulus toolkit; SVM based methods for open microphone speech recognition; and robust side-effect free dialogue management for handling undos, corrections and confirmations.

  18. Human factors research problems in electronic voice warning system design

    NASA Technical Reports Server (NTRS)

    Simpson, C. A.; Williams, D. H.

    1975-01-01

    The speech messages issued by voice warning systems must be carefully designed in accordance with general principles of human decision making processes, human speech comprehension, and the conditions in which the warnings can occur. The operator's effectiveness must not be degraded by messages that are either inappropriate or difficult to comprehend. Important experimental variables include message content, linguistic redundancy, signal/noise ratio, interference with concurrent tasks, and listener expectations generated by the pragmatic or real world context in which the messages are presented.

  19. Evolution of crossmodal reorganization of the voice area in cochlear-implanted deaf patients.

    PubMed

    Rouger, Julien; Lagleyre, Sébastien; Démonet, Jean-François; Fraysse, Bernard; Deguine, Olivier; Barone, Pascal

    2012-08-01

    Psychophysical and neuroimaging studies in both animal and human subjects have clearly demonstrated that cortical plasticity following sensory deprivation leads to a brain functional reorganization that favors the spared modalities. In postlingually deaf patients, the use of a cochlear implant (CI) allows a recovery of the auditory function, which will probably counteract the cortical crossmodal reorganization induced by hearing loss. To study the dynamics of such reversed crossmodal plasticity, we designed a longitudinal neuroimaging study involving the follow-up of 10 postlingually deaf adult CI users engaged in a visual speechreading task. While speechreading activates Broca's area in normally hearing subjects (NHS), the activity level elicited in this region in CI patients is abnormally low and increases progressively with post-implantation time. Furthermore, speechreading in CI patients induces abnormal crossmodal activations in right anterior regions of the superior temporal cortex normally devoted to processing human voice stimuli (temporal voice-sensitive areas-TVA). These abnormal activity levels diminish with post-implantation time and tend towards the levels observed in NHS. First, our study revealed that the neuroplasticity after cochlear implantation involves not only auditory but also visual and audiovisual speech processing networks. Second, our results suggest that during deafness, the functional links between cortical regions specialized in face and voice processing are reallocated to support speech-related visual processing through cross-modal reorganization. Such reorganization allows a more efficient audiovisual integration of speech after cochlear implantation. These compensatory sensory strategies are later completed by the progressive restoration of the visuo-audio-motor speech processing loop, including Broca's area.

  20. What does voice-processing technology support today?

    PubMed Central

    Nakatsu, R; Suzuki, Y

    1995-01-01

    This paper describes the state of the art in applications of voice-processing technologies. In the first part, technologies concerning the implementation of speech recognition and synthesis algorithms are described. Hardware technologies such as microprocessors and DSPs (digital signal processors) are discussed. The software development environment, a key technology in developing application software ranging from DSP software to support software, is also described. In the second part, the state of the art of algorithms from the standpoint of applications is discussed. Several issues concerning evaluation of speech recognition/synthesis algorithms are covered, as well as issues concerning the robustness of algorithms in adverse conditions. PMID:7479720

  1. Overall intelligibility, articulation, resonance, voice and language in a child with Nager syndrome.

    PubMed

    Van Lierde, Kristiane M; Luyten, Anke; Mortier, Geert; Tijskens, Anouk; Bettens, Kim; Vermeersch, Hubert

    2011-02-01

    The purpose of this study was to provide a description of the language and speech (intelligibility, voice, resonance, articulation) of a 7-year-old Dutch-speaking boy with Nager syndrome. To reveal these features, comparison was made with an age- and gender-matched child with a similar palatal or hearing problem. Language was tested with an age-appropriate language test, namely the Dutch version of the Clinical Evaluation of Language Fundamentals. Regarding articulation, a phonetic inventory, phonetic analysis and phonological process analysis were performed. A nominal scale with four categories was used to judge the overall speech intelligibility. A voice and resonance assessment included a videolaryngostroboscopy, a perceptual evaluation, acoustic analysis and nasometry. The most striking communication problems in this child were expressive and receptive language delay, moderately impaired speech intelligibility, the presence of phonetic and phonological disorders, resonance disorders and a high-pitched voice. The explanation for this pattern of communication is not completely straightforward. The language and phonological impairment, only present in the child with Nager syndrome, are not part of a more general developmental delay. The resonance disorders can be related to the cleft palate, but were not present in the child with the isolated cleft palate. One might assume that the cul-de-sac resonance, the much decreased mandibular movement and the restricted tongue lifting are caused by the restricted jaw mobility and micrognathia. To what extent the suggested mandibular distraction osteogenesis in early childhood allows increased mandibular movement and better speech outcome with increased oral resonance is subject for further research. According to the results of this study, the speech and language management must be focused on receptive and expressive language skills and linguistic conceptualization, correct phonetic placement and the modification of

  2. Inconsistency of speech in children with childhood apraxia of speech, phonological disorders, and typical speech

    NASA Astrophysics Data System (ADS)

    Iuzzini, Jenya

    There is a lack of agreement on the features used to differentiate Childhood Apraxia of Speech (CAS) from Phonological Disorders (PD). One criterion which has gained consensus is lexical inconsistency of speech (ASHA, 2007); however, no accepted measure of this feature has been defined. Although lexical assessment provides information about consistency of an item across repeated trials, it may not capture the magnitude of inconsistency within an item. In contrast, segmental analysis provides more extensive information about consistency of phoneme usage across multiple contexts and word-positions. The current research compared segmental and lexical inconsistency metrics in preschool-aged children with PD, CAS, and typical development (TD) to determine how inconsistency varies with age in typical and disordered speakers, and whether CAS and PD were differentiated equally well by both assessment levels. Whereas lexical and segmental analyses may be influenced by listener characteristics or speaker intelligibility, the acoustic signal is less vulnerable to these factors. In addition, the acoustic signal may reveal information which is not evident in the perceptual signal. A second focus of the current research was motivated by Blumstein et al.'s (1980) classic study on voice onset time (VOT) in adults with acquired apraxia of speech (AOS), which demonstrated a motor impairment underlying AOS. In the current study, VOT analyses were conducted to determine how the voicing distribution for bilabial and alveolar plosives varied with age and group. Findings revealed that 3-year-olds evidenced significantly higher inconsistency than 5-year-olds; segmental inconsistency approached 0% in 5-year-olds with TD, whereas it persisted in children with PD and CAS, suggesting that for children in this age range, inconsistency is a feature of speech disorder rather than typical development (Holm et al., 2007). Likewise, whereas segmental and lexical inconsistency were

  3. Associations between speech features and phenotypic severity in Treacher Collins syndrome

    PubMed Central

    2014-01-01

    Background Treacher Collins syndrome (TCS, OMIM 154500) is a rare congenital disorder of craniofacial development. Characteristic hypoplastic malformations of the ears, zygomatic arch, mandible and pharynx have been described in detail. However, reports on the impact of these malformations on speech are few. Exploring speech features and investigating if speech function is related to phenotypic severity are essential for optimizing follow-up and treatment. Methods Articulation, nasal resonance, voice and intelligibility were examined in 19 individuals (5–74 years, median 34 years) divided into three groups comprising children 5–10 years (n = 4), adolescents 11–18 years (n = 4) and adults 29 years and older (n = 11). A speech composite score (0–6) was calculated to reflect the variability of speech deviations. TCS severity scores of phenotypic expression and total scores of Nordic Orofacial Test-Screening (NOT-S) measuring orofacial dysfunction were used in analyses of correlation with speech characteristics (speech composite scores). Results Children and adolescents presented with significantly higher speech composite scores (median 4, range 1–6) than adults (median 1, range 0–5). Nearly all children and adolescents (6/8) displayed speech deviations of articulation, nasal resonance and voice, while only three adults were identified with multiple speech aberrations. The variability of speech dysfunction in TCS was exhibited by individual combinations of speech deviations in 13/19 participants. The speech composite scores correlated with TCS severity scores and NOT-S total scores. Speech composite scores higher than 4 were associated with cleft palate. The percent of intelligible words in connected speech was significantly lower in children and adolescents (median 77%, range 31–99) than in adults (98%, range 93–100). Intelligibility of speech among the children was markedly inconsistent and clearly affecting the understandability

  4. Associations between speech features and phenotypic severity in Treacher Collins syndrome.

    PubMed

    Asten, Pamela; Akre, Harriet; Persson, Christina

    2014-04-28

    Treacher Collins syndrome (TCS, OMIM 154500) is a rare congenital disorder of craniofacial development. Characteristic hypoplastic malformations of the ears, zygomatic arch, mandible and pharynx have been described in detail. However, reports on the impact of these malformations on speech are few. Exploring speech features and investigating if speech function is related to phenotypic severity are essential for optimizing follow-up and treatment. Articulation, nasal resonance, voice and intelligibility were examined in 19 individuals (5-74 years, median 34 years) divided into three groups comprising children 5-10 years (n = 4), adolescents 11-18 years (n = 4) and adults 29 years and older (n = 11). A speech composite score (0-6) was calculated to reflect the variability of speech deviations. TCS severity scores of phenotypic expression and total scores of Nordic Orofacial Test-Screening (NOT-S) measuring orofacial dysfunction were used in analyses of correlation with speech characteristics (speech composite scores). Children and adolescents presented with significantly higher speech composite scores (median 4, range 1-6) than adults (median 1, range 0-5). Nearly all children and adolescents (6/8) displayed speech deviations of articulation, nasal resonance and voice, while only three adults were identified with multiple speech aberrations. The variability of speech dysfunction in TCS was exhibited by individual combinations of speech deviations in 13/19 participants. The speech composite scores correlated with TCS severity scores and NOT-S total scores. Speech composite scores higher than 4 were associated with cleft palate. The percent of intelligible words in connected speech was significantly lower in children and adolescents (median 77%, range 31-99) than in adults (98%, range 93-100). Intelligibility of speech among the children was markedly inconsistent and clearly affecting the understandability. Multiple speech deviations were identified in

  5. Evaluation of the comprehension of noncontinuous sped-up vocoded speech - A strategy for coping with fading HF channels

    NASA Astrophysics Data System (ADS)

    Lynch, John T.

    1987-02-01

    The present technique for coping with fading and burst noise on HF channels used in digital voice communications transmits digital voice only during high S/N time intervals, and speeds up the speech when necessary to avoid conversation-hindering delays. On the basis of informal listening tests, four test conditions were selected in order to characterize those conditions of speech interruption which would render it comprehensible or incomprehensible. One of the test conditions, 2 s on and 0.5 s off, yielded test scores comparable to the reference continuous-speech case and is a reasonable match to the temporal variations of a disturbed ionosphere.
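
    A minimal sketch of the gating manipulation (2 s on, 0.5 s off, matching the best test condition above) together with a crude speed-up, assuming the signal is a NumPy array of samples.

    ```python
    import numpy as np

    def gate(signal, sr, on_s=2.0, off_s=0.5):
        """Zero out periodic 'off' intervals: on_s seconds kept, off_s dropped."""
        out = signal.copy()
        period = int(sr * (on_s + off_s))
        on = int(sr * on_s)
        for start in range(0, len(out), period):
            out[start + on:start + period] = 0.0
        return out

    def speed_up(signal, factor=1.25):
        """Crude speed-up by linear-interpolation resampling; this raises pitch
        too, whereas a real system would use time-scale modification."""
        idx = np.arange(0, len(signal) - 1, factor)
        return np.interp(idx, np.arange(len(signal)), signal)
    ```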

  6. Tracking voice change after thyroidectomy: application of spectral/cepstral analyses.

    PubMed

    Awan, Shaheen N; Helou, Leah B; Stojadinovic, Alexander; Solomon, Nancy Pearl

    2011-04-01

    This study evaluates the utility of perioperative spectral and cepstral acoustic analyses to monitor voice change after thyroidectomy. Perceptual and acoustic analyses were conducted on speech samples (sustained vowel /ɑ/ and CAPE-V sentences) provided by 70 participants (36 women and 34 men) at four study time points: prior to thyroid surgery and 2 weeks, 3 months and 6 months after thyroidectomy. Repeated measures analyses of variance focused on the relative amplitude of the dominant harmonic in the voice signal (cepstral peak prominence, CPP), the ratio of low-to-high spectral energy, and their respective standard deviations (SD). Data were also examined for relationships between acoustic measures and perceptual ratings of overall severity of voice quality. Results showed that perceived overall severity and the acoustic measures of the CPP and its SD (CPPsd) computed from sentence productions were significantly reduced at 2 weeks post-thyroidectomy for 20 patients (29% of the sample) who had self-reported post-operative voice change. For this same group of patients, the CPP and CPPsd computed from sentence productions improved significantly from 2 weeks post-thyroidectomy to 6 months post-surgery. CPP and CPPsd also correlated well with perceived overall severity (r = -0.68 and -0.79, respectively). Measures of CPP from sustained vowel productions were not as effective as those from sentence productions in reflecting voice deterioration in the post-thyroidectomy patients at the 2-week post-surgery time period, correlated more weakly with perceived overall severity, and were not as effective in discriminating negative voice outcome (NegVO) from normal voice outcome (NormVO) patients as compared to the results from the sentence-level stimuli. Results indicate that spectral/cepstral analysis methods can be used with continuous speech samples to provide important objective data to document the effects of dysphonia in a post-thyroidectomy patient sample. When used in
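
    The CSID itself is a proprietary multi-parameter index, but its core ingredient, the cepstral peak prominence, can be sketched directly. An illustrative single-frame version, with the pitch search range and regression span as assumptions rather than a validated clinical implementation:

    ```python
    import numpy as np

    def cpp_db(frame, sr, f0_range=(60.0, 300.0)):
        """Cepstral peak prominence of one windowed voice frame (illustrative).

        The real cepstrum is the inverse FFT of the log magnitude spectrum;
        CPP is the height of its peak within the expected pitch range above a
        straight line regressed over the cepstrum."""
        log_mag = 20.0 * np.log10(
            np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) + 1e-12)
        cep = np.fft.irfft(log_mag)
        quef = np.arange(len(cep)) / sr  # quefrency in seconds
        fit = (quef > 0.001) & (quef < 0.5 * len(frame) / sr)  # skip the origin
        slope, intercept = np.polyfit(quef[fit], cep[fit], 1)
        search = (quef >= 1.0 / f0_range[1]) & (quef <= 1.0 / f0_range[0])
        peak = np.argmax(np.where(search, cep, -np.inf))
        return cep[peak] - (slope * quef[peak] + intercept)
    ```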

  7. Speech transformation system (spectrum and/or excitation) without pitch extraction

    NASA Astrophysics Data System (ADS)

    Seneff, S.

    1980-07-01

    A speech analysis-synthesis system was developed which is capable of independent manipulation of the fundamental frequency and spectral envelope of a speech waveform. The system deconvolved the original speech with the spectral envelope estimate to obtain a model for the excitation; explicit pitch extraction was not required and, as a consequence, the transformed speech was more natural sounding than would be the case if the excitation were modeled as a sequence of pulses. It is shown that the system has applications in the areas of voice modification, baseband-excited vocoders, time-scale modification, and frequency compression as an aid to the partially deaf.

  8. Influence of Telecommunication Modality, Internet Transmission Quality, and Accessories on Speech Perception in Cochlear Implant Users

    PubMed Central

    Koller, Roger; Guignard, Jérémie; Caversaccio, Marco; Kompis, Martin; Senn, Pascal

    2017-01-01

    Background Telecommunication is limited or even impossible for more than one-third of all cochlear implant (CI) users. Objective We sought therefore to study the impact of voice quality on speech perception with voice over Internet protocol (VoIP) under real and adverse network conditions. Methods Telephone speech perception was assessed in 19 CI users (15-69 years, average 42 years), using the German HSM (Hochmair-Schulz-Moser) sentence test comparing Skype and conventional telephone (public switched telephone networks, PSTN) transmission using a personal computer (PC) and a digital enhanced cordless telecommunications (DECT) telephone dual device. Five different Internet transmission quality modes and four accessories (PC speakers, headphones, 3.5 mm jack audio cable, and induction loop) were compared. As a secondary outcome, the subjective perceived voice quality was assessed using the mean opinion score (MOS). Results Telephone speech perception was significantly better (median 91.6%, P<.001) with Skype compared with PSTN (median 42.5%) under optimal conditions. Skype calls under adverse network conditions (data packet loss > 15%) were not superior to conventional telephony. In addition, there were no significant differences between the tested accessories (P>.05) using a PC. Coupling a Skype DECT phone device with an audio cable to the CI, however, resulted in higher speech perception (median 65%) and subjective MOS scores (3.2) than using PSTN (median 7.5%, P<.001). Conclusions Skype calls significantly improve speech perception for CI users compared with conventional telephony under real network conditions. Listening accessories do not further improve the listening experience. Current Skype DECT telephone devices do not fully offer technical advantages in voice quality. PMID:28438727
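
    The adverse-network condition (packet loss above 15%) can be simulated for listening experiments in a few lines; this sketch zeroes out randomly chosen fixed-size packets, a deliberate simplification since real VoIP clients add jitter buffering and loss concealment.

    ```python
    import numpy as np

    def drop_packets(signal, sr, loss=0.15, packet_ms=20.0, seed=0):
        """Zero out randomly chosen packets to mimic VoIP transmission loss."""
        rng = np.random.default_rng(seed)
        hop = int(sr * packet_ms / 1000.0)
        out = signal.copy()
        for start in range(0, len(out), hop):
            if rng.random() < loss:  # each packet lost independently
                out[start:start + hop] = 0.0
        return out
    ```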

  9. The Prevalence of Speech and Language Disorders in French-Speaking Preschool Children From Yaoundé (Cameroon).

    PubMed

    Tchoungui Oyono, Lilly; Pascoe, Michelle; Singh, Shajila

    2018-05-17

    The purpose of this study was to determine the prevalence of speech and language disorders in French-speaking preschool-age children in Yaoundé, the capital city of Cameroon. A total of 460 participants aged 3-5 years were recruited from the 7 communes of Yaoundé using a 2-stage cluster sampling method. Speech and language assessment was undertaken using a standardized speech and language test, the Evaluation du Langage Oral (Khomsi, 2001), which was purposefully renormed on the sample. A predetermined cutoff of 2 SDs below the normative mean was applied to identify articulation, expressive language, and receptive language disorders. Fluency and voice disorders were identified using clinical judgment by a speech-language pathologist. Overall prevalence was calculated as follows: speech disorders, 14.7%; language disorders, 4.3%; and speech and language disorders, 17.1%. In terms of disorders, prevalence findings were as follows: articulation disorders, 3.6%; expressive language disorders, 1.3%; receptive language disorders, 3%; fluency disorders, 8.4%; and voice disorders, 3.6%. Prevalence figures are higher than those reported for other countries and emphasize the urgent need to develop speech and language services for the Cameroonian population.
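
    The 2-SD criterion translates directly into a cutoff on the renormed test score. A toy sketch of the prevalence computation; the score array and norms are placeholders, not the study's data.

    ```python
    import numpy as np

    def prevalence_percent(scores, mean, sd, cutoff_sd=2.0):
        """Percentage scoring more than cutoff_sd standard deviations below the mean."""
        scores = np.asarray(scores, dtype=float)
        return 100.0 * np.mean(scores < mean - cutoff_sd * sd)

    # Placeholder usage with renormed test scores (hypothetical array `elo_scores`):
    # prevalence_percent(elo_scores, elo_scores.mean(), elo_scores.std(ddof=1))
    ```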

  10. Speech vs. singing: infants choose happier sounds

    PubMed Central

    Corbeil, Marieve; Trehub, Sandra E.; Peretz, Isabelle

    2013-01-01

    Infants prefer speech to non-vocal sounds and to non-human vocalizations, and they prefer happy-sounding speech to neutral speech. They also exhibit an interest in singing, but there is little knowledge of their relative interest in speech and singing. The present study explored infants' attention to unfamiliar audio samples of speech and singing. In Experiment 1, infants 4–13 months of age were exposed to happy-sounding infant-directed speech vs. hummed lullabies by the same woman. They listened significantly longer to the speech, which had considerably greater acoustic variability and expressiveness, than to the lullabies. In Experiment 2, infants of comparable age who heard the lyrics of a Turkish children's song spoken vs. sung in a joyful/happy manner did not exhibit differential listening. Infants in Experiment 3 heard the happily sung lyrics of the Turkish children's song vs. a version that was spoken in an adult-directed or affectively neutral manner. They listened significantly longer to the sung version. Overall, happy voice quality rather than vocal mode (speech or singing) was the principal contributor to infant attention, regardless of age. PMID:23805119

  11. Hoarseness in School-Aged Children and Effectiveness of Voice Therapy in International Classification of Functioning Framework.

    PubMed

    Akın Şenkal, Özgül; Özer, Cem

    2015-09-01

    The hoarseness in school-aged children disrupts the educational process because it affects the social progress, communication skills, and self-esteem of children. Besides otorhinolaryngological examination, the first treatment option when hoarseness occurs is voice therapy. The aim of the study was to determine, by parental interview, the factors that increase hoarseness in school-aged children and to identify suitable voice therapy for them within the frame of the International Classification of Functioning (ICF). Retrospective analysis of data gathered from patient files. A total of 75 children (56 boys and 19 girls) aged 7-14 years (mean 10.86 ± 2.51) were examined retrospectively. A detailed history was taken from the parents of the children involved in this study. Information about the children's vocal habits was gathered within the frame of the ICF, and voice therapy was then started, with appointments scheduled by an experienced speech-language pathologist. Comparing measures before and after voice therapy across the applied therapy methods, statistically significant differences were found in maximum phonation time and the s/z ratio. A moderately significant relationship was found between the number of physiological voice therapy sessions and the s/z ratio. According to ICF labels, most voice complaints matched "body functions" and "activity and limitations." Appropriate voice therapy methods for hoarseness in school-aged children must be chosen and applied by speech-language therapists. A detailed history taken from the family during examination, within the frame of the ICF, positively affects the choice and application of the voice therapy method. The child's family is very important for successful management.
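
    Maximum phonation time and the s/z ratio are duration measures that can be estimated from recordings of sustained /a/, /s/ and /z/. A hedged sketch, with the level threshold and frame size as illustrative assumptions:

    ```python
    import numpy as np

    def phonation_time(signal, sr, thresh_db=-40.0, frame_ms=25.0):
        """Seconds during which the recording exceeds an assumed level threshold."""
        hop = int(sr * frame_ms / 1000.0)
        n = len(signal) // hop
        frames = np.reshape(signal[:n * hop], (n, hop))
        level = 20.0 * np.log10(np.sqrt(np.mean(frames ** 2, axis=1)) + 1e-12)
        return float(np.sum(level > thresh_db)) * frame_ms / 1000.0

    def s_z_ratio(s_recording, z_recording, sr):
        """s/z ratio: sustained /s/ duration over sustained /z/ duration.
        Values well above 1 are commonly read as a sign of glottal inefficiency."""
        return phonation_time(s_recording, sr) / phonation_time(z_recording, sr)
    ```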

  12. Evaluation of participants' perception and taste thresholds with a zirconia palatal plate.

    PubMed

    Wada, Takeshi; Takano, Tomofumi; Tasaka, Akinori; Ueda, Takayuki; Sakurai, Kaoru

    2016-10-01

    Zirconia and cobalt-chromium can withstand a similar degree of loading. Therefore, using a zirconia base for removable dentures could allow the thickness of the palatal area to be reduced similarly to metal base dentures. We hypothesized that a zirconia palatal plate for removable dentures provides a high level of participants' perception without influencing taste thresholds. The purpose of this study was to evaluate the participants' perception and taste thresholds of a zirconia palatal plate. Palatal plates fabricated using acrylic resin, zirconia, and cobalt-chromium alloy were inserted into healthy individuals. Taste thresholds were investigated using the whole-mouth gustatory test, and participants' perception was evaluated using the 100-mm visual analog scale to assess the ease of pronunciation, ease of swallowing, sensation of temperature, metallic taste, sensation of foreign body, subjective sensation of weight, adhesiveness of chewing gum, and general satisfaction. For the taste thresholds, no significant differences were noted in sweet, salty, sour, bitter, or umami tastes among participants wearing no plate, or the resin, zirconia, and metal plates. Speech was easier and foreign-body sensation was lower with the zirconia plate than with the resin plate. Evaluation of the adhesiveness of chewing gum showed that chewing gum does not readily adhere to the zirconia plate in comparison with the metal plate. The comprehensive participants' perception of the zirconia plate was evaluated as being superior to the resin plate. A zirconia palatal plate provides a high level of participants' perception without influencing taste thresholds.

  13. World Voice Day in news: analysis of reports on the Voice Campaign in Brazil.

    PubMed

    Dornelas, Rodrigo; Giannini, Susana Pimentel Pinto; Ferreira, Léslie Piccolotto

    2015-01-01

    To analyze the television reports on World Voice Day broadcast by Globo® TV. We researched television reports broadcast by the Globo® Network in regional television news programs from March 15 to April 20, 2013. For the data analysis, the Document Analysis technique was used. The analyzed variables were the following: location, broadcasting period, duration, interviewed professional, mention of multiprofessional work, orientation to the population, and the interview approach (health promotion or disease prevention). Through statistical analysis, the interview approach was considered the outcome and associated with the other variables. In the regions where the network has regional news programs, the majority carried reports about the Voice Campaign. Among these, all five regions of Brazil were represented, in the morning/afternoon periods, with a mean report duration of 5.3 minutes. Speech-language pathologists were present in most of the interviews, as was emphasis on the importance of multiprofessional work. Regarding the content presented, the interviewees focused on diseases caused by habits that impair the voice, with guidance to the public about what negatively affects vocal well-being. In most cases, the interviews did not share the same approach (promoting vocal well-being versus preventing voice disorders), and interprofessional practice is still rarely presented as a possible work strategy.

  14. Dysphagia, Speech, Voice, and Trismus following Radiotherapy and/or Chemotherapy in Patients with Head and Neck Carcinoma: Review of the Literature

    PubMed Central

    Koetsenruijter, K. W. J.; Swan, K.; Bogaardt, H.

    2016-01-01

    Introduction. Patients with head and neck cancer suffer from various impairments due to the primary illness, as well as secondary consequences of the oncological treatment. This systematic review describes the effects of radiotherapy and/or chemotherapy on the functions of the upper aerodigestive tract in patients with head and neck cancer. Methods. A systematic literature search was performed by two independent reviewers using the electronic databases PubMed and Embase. All dates up to May 2016 were included. Results. Of the 947 abstracts, sixty articles met the inclusion criteria and described one or more aspects of the sequelae of radiotherapy and/or chemotherapy. Forty studies described swallowing-related problems, 24 described voice-related problems, seven described trismus, and 25 studies described general quality of life. Only 14 articles reported that speech pathologists conducted the interventions, of which only six articles described in detail what the interventions involved. Conclusion. In general, voice quality improved following intervention, whereas quality of life, dysphagia, and oral intake deteriorated during and after treatment. However, as a consequence of the diversity in treatment protocols and patient characteristics, the conclusions of most studies cannot be easily generalised. Further research on the effects of oncological interventions on the upper aerodigestive tract is needed. PMID:27722170

  15. The effects of gated speech on the fluency of speakers who stutter

    PubMed Central

    Howell, Peter

    2007-01-01

    It is known that the speech of people who stutter improves when the speaker’s own vocalization is changed while the participant is speaking. One explanation of these effects is the disruptive rhythm hypothesis (DRH). DRH maintains that the manipulated sound only needs to disturb timing to affect speech control. The experiment investigated whether speech that was gated on and off (interrupted) affected the speech control of speakers who stutter. Eight children who stutter read a passage when they heard their voice normally and when the speech was gated. Fluency was enhanced (fewer errors were made and time to read a set passage was reduced) when speech was interrupted in this way. The results support the DRH. PMID:17726328

  16. Assessment of breathing patterns and respiratory muscle recruitment during singing and speech in quadriplegia.

    PubMed

    Tamplin, Jeanette; Brazzale, Danny J; Pretto, Jeffrey J; Ruehland, Warren R; Buttifant, Mary; Brown, Douglas J; Berlowitz, David J

    2011-02-01

    To explore how respiratory impairment after cervical spinal cord injury affects vocal function, and to explore muscle recruitment strategies used during vocal tasks after quadriplegia. It was hypothesized that to achieve the increased respiratory support required for singing and loud speech, people with quadriplegia use different patterns of muscle recruitment and control strategies compared with control subjects without spinal cord injury. Matched, parallel-group design. Large university-affiliated public hospital. Consenting participants with motor-complete C5-7 quadriplegia (n=6) and able-bodied age-matched controls (n=6) were assessed on physiologic and voice measures during vocal tasks. Not applicable. Standard respiratory function testing, surface electromyographic activity from accessory respiratory muscles, sound pressure levels during vocal tasks, the Voice Handicap Index, and the Perceptual Voice Profile. The group with quadriplegia had a reduced lung capacity (vital capacity, 71% vs 102% of predicted; P=.028), more perceived voice problems (Voice Handicap Index score, 22.5 vs 6.5; P=.046), and greater recruitment of accessory respiratory muscles during both loud and soft volumes (P=.028) than the able-bodied controls. The group with quadriplegia also demonstrated higher accessory muscle activation in changing from soft to loud speech (P=.028). People with quadriplegia have impaired vocal ability and use different muscle recruitment strategies during speech than the able-bodied. These findings will enable us to target specific measurements of respiratory physiology for assessing functional improvements in response to formal therapeutic singing training.

  17. [The role of sex in voice restoration and emotional functioning after laryngectomy].

    PubMed

    Keszte, J; Wollbrück, D; Meyer, A; Fuchs, M; Meister, E; Pabst, F; Oeken, J; Schock, J; Wulke, C; Singer, S

    2012-04-01

    Data on psychosocial factors in laryngectomized women are rare. All means of alaryngeal voice production sound male due to low fundamental frequency and roughness, which makes postlaryngectomy voice rehabilitation especially challenging for women. The aim of this study was to investigate whether women use alaryngeal speech less often and are therefore more emotionally distressed. In a cross-sectional multi-centred study, 12 female and 138 male laryngectomees were interviewed. To identify risk factors for infrequent use of alaryngeal speech and for impaired emotional functioning, logistic regression was used and odds ratios were adjusted for age, time since laryngectomy, physical functioning, social activity and feelings of stigmatization. Esophageal speech was used by 83% of the female and 57% of the male patients, prosthetic speech by 17% of the female and 20% of the male patients, and electrolaryngeal speech by 17% of the female and 29% of the male patients. Laryngectomees were at higher risk of emotional distress when feeling physically bad (OR=2.48; p=0.02) or having feelings of stigmatization (OR=3.94; p≤0.00). In addition, women tended to be more socially active than men (83% vs. 54%; p=0.05). Sex influenced neither the use of alaryngeal speech nor emotional functioning. Since there is evidence for different psychosocial adjustment in laryngectomized men and women, further investigation with larger samples is needed on this issue.

  18. Functional Overlap between Regions Involved in Speech Perception and in Monitoring One's Own Voice during Speech Production

    ERIC Educational Resources Information Center

    Zheng, Zane Z.; Munhall, Kevin G.; Johnsrude, Ingrid S.

    2010-01-01

    The fluency and the reliability of speech production suggest a mechanism that links motor commands and sensory feedback. Here, we examined the neural organization supporting such links by using fMRI to identify regions in which activity during speech production is modulated according to whether auditory feedback matches the predicted outcome or…

  19. Speech recovery device

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Frankle, Christen M.

    2000-10-19

    There is provided an apparatus and method for assisting speech recovery in people with inability to speak due to aphasia, apraxia or another condition with similar effect. A hollow, rigid, thin-walled tube with semi-circular or semi-elliptical cut out shapes at each open end is positioned such that one end mates with the throat/voice box area of the neck of the assistor and the other end mates with the throat/voice box area of the assisted. The speaking person (assistor) makes sounds that produce standing wave vibrations at the same frequency in the vocal cords of the assisted person. Driving the assisted person's vocal cords with the assisted person being able to hear the correct tone enables the assisted person to speak by simply amplifying the vibration of membranes in their throat.

  20. Voice recognition through phonetic features with Punjabi utterances

    NASA Astrophysics Data System (ADS)

    Kaur, Jasdeep; Juglan, K. C.; Sharma, Vishal; Upadhyay, R. K.

    2017-07-01

    This paper deals with perception and disorders of speech in view of the Punjabi language. Given the importance of voice identification, various parameters of speaker identification were studied. The speech material was recorded with a tape recorder in both normal and disguised modes of utterance. From the recorded material, utterances free from noise were selected for auditory and acoustic spectrographic analysis. The comparison of normal and disguised speech of seven subjects is reported. The fundamental frequency (F0) at similar places, plosive duration at certain phonemes, and the amplitude ratio (A1:A2) were compared in normal and disguised speech. The formant frequencies of normal and disguised speech remain almost identical only when compared at positions of the same vowel quality and quantity; if a vowel is more closed or more open in the disguised utterance, its formant frequencies shift relative to the normal utterance. The amplitude ratio (A1:A2) was found to be speaker dependent and remains unchanged in disguised utterances, although this value may shift if cross-sectioning is not done at the same location.
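    As an illustration of the kind of acoustic analysis described above, formant frequencies for a vowel segment are commonly estimated from linear-prediction (LPC) pole angles. The following minimal Python sketch (the model order and frequency limits are illustrative assumptions, not the authors' protocol) shows the standard autocorrelation-method computation:

        import numpy as np
        from scipy.linalg import solve_toeplitz

        def lpc_formants(frame, sr, order=12):
            """Estimate formant frequencies (Hz) of one windowed vowel frame
            via the autocorrelation method of linear predictive coding."""
            frame = frame * np.hamming(len(frame))
            ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
            # Yule-Walker equations R a = r for the predictor coefficients
            a = solve_toeplitz(ac[:order], ac[1:order + 1])
            # Roots of A(z) = 1 - a1*z^-1 - ... - ap*z^-p
            roots = np.roots(np.concatenate(([1.0], -a)))
            roots = roots[np.imag(roots) > 0]   # one root per conjugate pair
            freqs = np.angle(roots) * sr / (2 * np.pi)
            return np.sort(freqs[(freqs > 90) & (freqs < sr / 2 - 50)])

    The first two or three values returned approximate F1, F2, and F3, which can then be compared across normal and disguised utterances at matched vowel positions.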

  1. Electrophysiological and hemodynamic mismatch responses in rats listening to human speech syllables.

    PubMed

    Mahmoudzadeh, Mahdi; Dehaene-Lambertz, Ghislaine; Wallois, Fabrice

    2017-01-01

    Speech is a complex auditory stimulus which is processed on several time scales. Whereas consonant discrimination requires resolving rapid acoustic events, voice perception relies on slower cues. Humans, even at preterm ages, are particularly efficient at encoding temporal cues. To compare the capacities of preterm infants with those observed in other mammals, we tested anesthetized adult rats using exactly the same paradigm as that used in preterm neonates. We simultaneously recorded neural responses (using ECoG) and hemodynamic responses (using fNIRS) to series of human speech syllables and investigated the brain response to a change of consonant (ba vs. ga) and to a change of voice (male vs. female). Both methods revealed concordant results, although ECoG measures were more sensitive than fNIRS. Responses to syllables were bilateral, but with marked right-hemispheric lateralization. Responses to voice changes were observed with both methods, while only ECoG was sensitive to consonant changes. These results suggest that rats processed the speech envelope more effectively than fine temporal cues, in contrast with human preterm neonates, in whom the opposite effects were observed. Cross-species comparisons constitute a very valuable tool to define the singularities of the human brain and the species-specific biases that may help human infants learn their native language.

  2. Research in speech communication.

    PubMed Central

    Flanagan, J

    1995-01-01

    Advances in digital speech processing are now supporting application and deployment of a variety of speech technologies for human/machine communication. In fact, new businesses are rapidly forming about these technologies. But these capabilities are of little use unless society can afford them. Happily, explosive advances in microelectronics over the past two decades have assured affordable access to this sophistication as well as to the underlying computing technology. The research challenges in speech processing remain in the traditionally identified areas of recognition, synthesis, and coding. These three areas have typically been addressed individually, often with significant isolation among the efforts. But they are all facets of the same fundamental issue--how to represent and quantify the information in the speech signal. This implies deeper understanding of the physics of speech production, the constraints that the conventions of language impose, and the mechanism for information processing in the auditory system. In ongoing research, therefore, we seek more accurate models of speech generation, better computational formulations of language, and realistic perceptual guides for speech processing--along with ways to coalesce the fundamental issues of recognition, synthesis, and coding. Successful solution will yield the long-sought dictation machine, high-quality synthesis from text, and the ultimate in low bit-rate transmission of speech. It will also open the door to language-translating telephony, where the synthetic foreign translation can be in the voice of the originating talker. PMID:7479806

  3. Attentional modulation of informational masking on early cortical representations of speech signals.

    PubMed

    Zhang, Changxin; Arnott, Stephen R; Rabaglia, Cristina; Avivi-Reich, Meital; Qi, James; Wu, Xihong; Li, Liang; Schneider, Bruce A

    2016-01-01

    To recognize speech in a noisy auditory scene, listeners need to perceptually segregate the target talker's voice from other competing sounds (stream segregation). A number of studies have suggested that the attentional demands placed on listeners increase as the acoustic properties and informational content of the competing sounds become more similar to those of the target voice. Hence we would expect attentional demands to be considerably greater when speech is masked by speech than when it is masked by steady-state noise. To investigate the role of attentional mechanisms in the unmasking of speech sounds, event-related potentials (ERPs) were recorded to a syllable masked by noise or competing speech under both active (the participant was asked to respond when the syllable was presented) and passive (no response was required) listening conditions. The results showed that the long-latency auditory response to a syllable (/bi/), presented at different signal-to-masker ratios (SMRs), was similar in both passive and active listening conditions when the masker was a steady-state noise. In contrast, a switch from the passive listening condition to the active one, when the masker was two-talker speech, significantly enhanced the ERPs to the syllable. These results support the hypothesis that the need to engage attentional mechanisms in aid of scene analysis increases as the similarity (both acoustic and informational) between the target speech and the competing background sounds increases. Copyright © 2015 Elsevier B.V. All rights reserved.

  4. Andreas Vesalius' 500th Anniversary: Initial Integral Understanding of Voice Production.

    PubMed

    Brinkman, Romy J; Hage, J Joris

    2017-01-01

    Voice production relies on the integrated functioning of a three-part system: respiration, phonation and resonance, and articulation. To commemorate the 500th anniversary of the great anatomist Andreas Vesalius (1515-1564), we report on his understanding of this integral system. The text of Vesalius' masterpiece De Humani Corporis Fabrica Libri Septem and an eyewitness report of the public dissection of three corpses by Vesalius in Bologna, Italy, in 1540, were searched for references to the voice-producing anatomical structures and their function. We clustered the traced, separate parts for the first time. We found that Vesalius recognized the importance for voice production of many details of the respiratory system, the voice box, and various structures of resonance and articulation. He stressed that voice production was a cerebral function and extensively recorded the innervation of the voice-producing organs by the cranial nerves. Vesalius was the first to publicly record the concept of voice production as an integrated and cerebrally directed function of respiration, phonation and resonance, and articulation. In doing so nearly 500 years ago, he laid a firm basis for the understanding of the physiology of voice production and speech and its management as we know it today. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.

  5. Can blind persons accurately assess body size from the voice?

    PubMed

    Pisanski, Katarzyna; Oleszkiewicz, Anna; Sorokowska, Agnieszka

    2016-04-01

    Vocal tract resonances provide reliable information about a speaker's body size that human listeners use for biosocial judgements as well as speech recognition. Although humans can accurately assess men's relative body size from the voice alone, how this ability is acquired remains unknown. In this study, we test the prediction that accurate voice-based size estimation is possible without prior audiovisual experience linking low frequencies to large bodies. Ninety-one healthy congenitally or early blind, late blind and sighted adults (aged 20-65) participated in the study. On the basis of vowel sounds alone, participants assessed the relative body sizes of male pairs of varying heights. Accuracy of voice-based body size assessments significantly exceeded chance and did not differ among participants who were sighted, or congenitally blind or who had lost their sight later in life. Accuracy increased significantly with relative differences in physical height between men, suggesting that both blind and sighted participants used reliable vocal cues to size (i.e. vocal tract resonances). Our findings demonstrate that prior visual experience is not necessary for accurate body size estimation. This capacity, integral to both nonverbal communication and speech perception, may be present at birth or may generalize from broader cross-modal correspondences. © 2016 The Author(s).

  6. Can blind persons accurately assess body size from the voice?

    PubMed Central

    Oleszkiewicz, Anna; Sorokowska, Agnieszka

    2016-01-01

    Vocal tract resonances provide reliable information about a speaker's body size that human listeners use for biosocial judgements as well as speech recognition. Although humans can accurately assess men's relative body size from the voice alone, how this ability is acquired remains unknown. In this study, we test the prediction that accurate voice-based size estimation is possible without prior audiovisual experience linking low frequencies to large bodies. Ninety-one healthy congenitally or early blind, late blind and sighted adults (aged 20–65) participated in the study. On the basis of vowel sounds alone, participants assessed the relative body sizes of male pairs of varying heights. Accuracy of voice-based body size assessments significantly exceeded chance and did not differ among participants who were sighted, or congenitally blind or who had lost their sight later in life. Accuracy increased significantly with relative differences in physical height between men, suggesting that both blind and sighted participants used reliable vocal cues to size (i.e. vocal tract resonances). Our findings demonstrate that prior visual experience is not necessary for accurate body size estimation. This capacity, integral to both nonverbal communication and speech perception, may be present at birth or may generalize from broader cross-modal correspondences. PMID:27095264

  7. Error-dependent modulation of speech-induced auditory suppression for pitch-shifted voice feedback.

    PubMed

    Behroozmand, Roozbeh; Larson, Charles R

    2011-06-06

    The motor-driven predictions about expected sensory feedback (efference copies) have been proposed to play an important role in recognition of sensory consequences of self-produced motor actions. In the auditory system, this effect was suggested to result in suppression of sensory neural responses to self-produced voices that are predicted by the efference copies during vocal production in comparison with passive listening to the playback of the identical self-vocalizations. In the present study, event-related potentials (ERPs) were recorded in response to upward pitch shift stimuli (PSS) with five different magnitudes (0, +50, +100, +200 and +400 cents) at voice onset during active vocal production and passive listening to the playback. Results indicated that the suppression of the N1 component during vocal production was largest for unaltered voice feedback (PSS: 0 cents), became smaller as the magnitude of PSS increased to 200 cents, and was almost completely eliminated in response to 400 cents stimuli. Findings of the present study suggest that the brain utilizes the motor predictions (efference copies) to determine the source of incoming stimuli and maximally suppresses the auditory responses to unaltered feedback of self-vocalizations. The reduction of suppression for 50, 100 and 200 cents and its elimination for 400 cents pitch-shifted voice auditory feedback support the idea that motor-driven suppression of voice feedback leads to distinctly different sensory neural processing of self vs. non-self vocalizations. This characteristic may enable the audio-vocal system to more effectively detect and correct for unexpected errors in the feedback of self-produced voice pitch compared with externally-generated sounds.
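    The pitch-shift magnitudes above are given in cents, the standard logarithmic unit for frequency ratios: a shift of c cents multiplies frequency by 2^(c/1200). A quick Python check of the ratios used in this paradigm:

        def cents_to_ratio(cents: float) -> float:
            # A shift of `cents` corresponds to this multiplicative factor.
            return 2.0 ** (cents / 1200.0)

        for c in (0, 50, 100, 200, 400):
            print(f"+{c:>3} cents -> x{cents_to_ratio(c):.3f}")
        # +400 cents (a major third) takes a 200 Hz voice to about 252 Hz.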

  8. Error-dependent modulation of speech-induced auditory suppression for pitch-shifted voice feedback

    PubMed Central

    2011-01-01

    Background The motor-driven predictions about expected sensory feedback (efference copies) have been proposed to play an important role in recognition of sensory consequences of self-produced motor actions. In the auditory system, this effect was suggested to result in suppression of sensory neural responses to self-produced voices that are predicted by the efference copies during vocal production in comparison with passive listening to the playback of the identical self-vocalizations. In the present study, event-related potentials (ERPs) were recorded in response to upward pitch shift stimuli (PSS) with five different magnitudes (0, +50, +100, +200 and +400 cents) at voice onset during active vocal production and passive listening to the playback. Results Results indicated that the suppression of the N1 component during vocal production was largest for unaltered voice feedback (PSS: 0 cents), became smaller as the magnitude of PSS increased to 200 cents, and was almost completely eliminated in response to 400 cents stimuli. Conclusions Findings of the present study suggest that the brain utilizes the motor predictions (efference copies) to determine the source of incoming stimuli and maximally suppresses the auditory responses to unaltered feedback of self-vocalizations. The reduction of suppression for 50, 100 and 200 cents and its elimination for 400 cents pitch-shifted voice auditory feedback support the idea that motor-driven suppression of voice feedback leads to distinctly different sensory neural processing of self vs. non-self vocalizations. This characteristic may enable the audio-vocal system to more effectively detect and correct for unexpected errors in the feedback of self-produced voice pitch compared with externally-generated sounds. PMID:21645406

  9. Speech disorders in Israeli Arab children.

    PubMed

    Jaber, L; Nahmani, A; Shohat, M

    1997-10-01

    The aim of this work was to study the frequency of speech disorders in Israeli Arab children and its association with parental consanguinity. A questionnaire was sent to the parents of 1,495 Arab children attending kindergarten and the first two grades of the seven primary schools in the town of Taibe. Eighty-six percent (1,282 parents) responded. The answers to the questionnaire revealed that 25% of the children reportedly had a speech and language disorder. Of the children identified by their parents as having a speech disorder, 44 were selected randomly for examination by a speech specialist. The disorders noted in this subgroup included errors in articulation (48.0%), poor language (18%), poor voice quality (15.9%), stuttering (13.6%), and other problems (4.5%). Rates of affected children from consanguineous and non-consanguineous marriages were 31% and 22.4%, respectively (p < 0.01). We conclude that speech disorders are an important problem among Israeli Arab schoolchildren. More comprehensive programs are needed to facilitate diagnosis and treatment.

  10. Acoustic analysis of speech variables during depression and after improvement.

    PubMed

    Nilsonne, A

    1987-09-01

    Speech recordings were made of 16 depressed patients during depression and after clinical improvement. The recordings were analyzed using a computer program which extracts acoustic parameters from the fundamental frequency contour of the voice. The percent pause time, the standard deviation of the voice fundamental frequency distribution, the standard deviation of the rate of change of the voice fundamental frequency, and the average speed of voice change were found to correlate with the clinical state of the patient. The mean fundamental frequency, the total reading time, and the average rate of change of the voice fundamental frequency did not differ between the depressed and the improved group. The acoustic measures were more strongly correlated with the clinical state of the patient, as measured by global depression scores, than with single depressive symptoms such as retardation or agitation.
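    Measures of this kind are straightforward to compute once an F0 contour has been extracted. A minimal Python sketch (assuming a contour sampled at a fixed hop, with pause/unvoiced frames marked as NaN; an illustration, not the paper's program):

        import numpy as np

        def contour_measures(f0, hop_s):
            """Summary statistics of an F0 contour sampled every hop_s seconds.
            Pause/unvoiced frames are expected to be NaN."""
            voiced = ~np.isnan(f0)
            d = np.diff(f0) / hop_s          # F0 rate of change in Hz/s
            d = d[~np.isnan(d)]              # drops diffs spanning a pause
            return {
                "percent_pause": 100.0 * (1.0 - voiced.mean()),
                "f0_sd_hz": float(np.nanstd(f0)),
                # SD of the rate of change, and average speed of voice change
                "f0_rate_sd": float(np.std(d)) if d.size else 0.0,
                "mean_speed_hz_s": float(np.mean(np.abs(d))) if d.size else 0.0,
            }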

  11. Smartphone Application for the Analysis of Prosodic Features in Running Speech with a Focus on Bipolar Disorders: System Performance Evaluation and Case Study.

    PubMed

    Guidi, Andrea; Salvi, Sergio; Ottaviano, Manuel; Gentili, Claudio; Bertschy, Gilles; de Rossi, Danilo; Scilingo, Enzo Pasquale; Vanello, Nicola

    2015-11-06

    Bipolar disorder is one of the most common mood disorders characterized by large and invalidating mood swings. Several projects focus on the development of decision support systems that monitor and advise patients, as well as clinicians. Voice monitoring and speech signal analysis can be exploited to reach this goal. In this study, an Android application was designed for analyzing running speech using a smartphone device. The application can record audio samples and estimate speech fundamental frequency, F0, and its changes. F0-related features are estimated locally on the smartphone, with some advantages with respect to remote processing approaches in terms of privacy protection and reduced upload costs. The raw features can be sent to a central server and further processed. The quality of the audio recordings, algorithm reliability and performance of the overall system were evaluated in terms of voiced segment detection and features estimation. The results demonstrate that mean F0 from each voiced segment can be reliably estimated, thus describing prosodic features across the speech sample. Instead, features related to F0 variability within each voiced segment performed poorly. A case study performed on a bipolar patient is presented.
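    The on-device F0 estimation described here can be approximated with a classic short-time autocorrelation pitch tracker. A minimal Python sketch of one analysis frame (the search range and voicing threshold are illustrative assumptions, not the app's algorithm):

        import numpy as np

        def frame_f0(frame, sr, fmin=70.0, fmax=400.0, thresh=0.3):
            """Return F0 (Hz) for one frame, or None if it looks unvoiced.
            The frame should span at least two periods of fmin."""
            frame = frame - frame.mean()
            ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
            if ac[0] <= 0:
                return None                     # silent frame
            ac = ac / ac[0]                     # normalize lag-0 to 1
            lo, hi = int(sr / fmax), int(sr / fmin)
            lag = lo + int(np.argmax(ac[lo:hi]))
            # A weak autocorrelation peak suggests an unvoiced frame
            return sr / lag if ac[lag] >= thresh else None

    Running this over consecutive frames yields the voiced-segment detection and per-segment mean F0 that the system evaluation focuses on.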

  12. Smartphone Application for the Analysis of Prosodic Features in Running Speech with a Focus on Bipolar Disorders: System Performance Evaluation and Case Study

    PubMed Central

    Guidi, Andrea; Salvi, Sergio; Ottaviano, Manuel; Gentili, Claudio; Bertschy, Gilles; de Rossi, Danilo; Scilingo, Enzo Pasquale; Vanello, Nicola

    2015-01-01

    Bipolar disorder is one of the most common mood disorders characterized by large and invalidating mood swings. Several projects focus on the development of decision support systems that monitor and advise patients, as well as clinicians. Voice monitoring and speech signal analysis can be exploited to reach this goal. In this study, an Android application was designed for analyzing running speech using a smartphone device. The application can record audio samples and estimate speech fundamental frequency, F0, and its changes. F0-related features are estimated locally on the smartphone, with some advantages with respect to remote processing approaches in terms of privacy protection and reduced upload costs. The raw features can be sent to a central server and further processed. The quality of the audio recordings, algorithm reliability and performance of the overall system were evaluated in terms of voiced segment detection and features estimation. The results demonstrate that mean F0 from each voiced segment can be reliably estimated, thus describing prosodic features across the speech sample. Instead, features related to F0 variability within each voiced segment performed poorly. A case study performed on a bipolar patient is presented. PMID:26561811

  13. Sex and the singer: Gender categorization aspects of singing voice

    NASA Astrophysics Data System (ADS)

    Ternström, Sten

    2003-04-01

    The singing voice exhibits many systematic differences by gender and age. The physiological differences between the voice organs of males, females, and children are well known and give rise to several acoustic differences, including acoustic power, pitch range, and spectral distribution. Vocal artists often strive to widen their range of expression, and it is not uncommon for males to sing in a femalelike register, as with countertenors and in some pop/rock genres. The opposite, however, is quite rare. While ambiguous or contradictory gender in speech is usually a social disadvantage, in singing it can be a desired effect. The physical differences in singing voice production between males and females are reviewed in detail. Some interesting borderline cases are examined from an acoustic standpoint.

  14. Lower Vocal Tract Morphologic Adjustments Are Relevant for Voice Timbre in Singing.

    PubMed

    Mainka, Alexander; Poznyakovskiy, Anton; Platzek, Ivan; Fleischer, Mario; Sundberg, Johan; Mürbe, Dirk

    2015-01-01

    The vocal tract shape is crucial to voice production. Its lower part seems particularly relevant for voice timbre. This study analyzes the detailed morphology of parts of the epilaryngeal tube and the hypopharynx for the sustained German vowels /a/, /e/, /i/, /o/, and /u/ by thirteen male singer subjects who were at the beginning of their academic singing studies. Analysis was based on two different phonatory conditions: a natural, speech-like phonation and a singing phonation, like in classical singing. 3D models of the vocal tract were derived from magnetic resonance imaging and compared with long-term average spectrum analysis of audio recordings from the same subjects. Comparison of singing to the speech-like phonation, which served as reference, showed significant adjustments of the lower vocal tract: an average lowering of the larynx by 8 mm and an increase of the hypopharyngeal cross-sectional area (+21.9%) and volume (+16.8%). Changes in the analyzed epilaryngeal portion of the vocal tract were not significant. Consequently, lower larynx-to-hypopharynx area and volume ratios were found in singing compared to the speech-like phonation. All evaluated measures of the lower vocal tract varied significantly with vowel quality. Acoustically, an increase of high frequency energy in singing correlated with a wider hypopharyngeal area. The findings offer an explanation how classical male singers might succeed in producing a voice timbre with increased high frequency energy, creating a singer's formant cluster.

  15. Lower Vocal Tract Morphologic Adjustments Are Relevant for Voice Timbre in Singing

    PubMed Central

    Mainka, Alexander; Poznyakovskiy, Anton; Platzek, Ivan; Fleischer, Mario; Sundberg, Johan; Mürbe, Dirk

    2015-01-01

    The vocal tract shape is crucial to voice production. Its lower part seems particularly relevant for voice timbre. This study analyzes the detailed morphology of parts of the epilaryngeal tube and the hypopharynx for the sustained German vowels /a/, /e/, /i/, /o/, and /u/ by thirteen male singer subjects who were at the beginning of their academic singing studies. Analysis was based on two different phonatory conditions: a natural, speech-like phonation and a singing phonation, like in classical singing. 3D models of the vocal tract were derived from magnetic resonance imaging and compared with long-term average spectrum analysis of audio recordings from the same subjects. Comparison of singing to the speech-like phonation, which served as reference, showed significant adjustments of the lower vocal tract: an average lowering of the larynx by 8 mm and an increase of the hypopharyngeal cross-sectional area (+21.9%) and volume (+16.8%). Changes in the analyzed epilaryngeal portion of the vocal tract were not significant. Consequently, lower larynx-to-hypopharynx area and volume ratios were found in singing compared to the speech-like phonation. All evaluated measures of the lower vocal tract varied significantly with vowel quality. Acoustically, an increase of high frequency energy in singing correlated with a wider hypopharyngeal area. The findings offer an explanation how classical male singers might succeed in producing a voice timbre with increased high frequency energy, creating a singer's formant cluster. PMID:26186691

  16. A taste for ATP: neurotransmission in taste buds

    PubMed Central

    Kinnamon, Sue C.; Finger, Thomas E.

    2013-01-01

    Not only is ATP a ubiquitous source of energy but it is also used widely as an intercellular signal. For example, keratinocytes release ATP in response to numerous external stimuli including pressure, heat, and chemical insult. The released ATP activates purinergic receptors on nerve fibers to generate nociceptive signals. The importance of an ATP signal in epithelial-to-neuronal signaling is nowhere more evident than in the taste system. The receptor cells of taste buds release ATP in response to appropriate stimulation by tastants and the released ATP then activates P2X2 and P2X3 receptors on the taste nerves. Genetic ablation of the relevant P2X receptors leaves an animal without the ability to taste any primary taste quality. Of interest is that release of ATP by taste receptor cells occurs in a non-vesicular fashion, apparently via gated membrane channels. Further, in keeping with the crucial role of ATP as a neurotransmitter in this system, a subset of taste cells expresses a specific ectoATPase, NTPDase2, necessary to clear extracellular ATP which otherwise will desensitize the P2X receptors on the taste nerves. The unique utilization of ATP as a key neurotransmitter in the taste system may reflect the epithelial rather than neuronal origins of the receptor cells. PMID:24385952

  17. The stability of locus equation slopes across stop consonant voicing/aspiration

    NASA Astrophysics Data System (ADS)

    Sussman, Harvey M.; Modarresi, Golnaz

    2004-05-01

    The consistency of locus equation slopes as phonetic descriptors of stop place in CV sequences across voiced and voiceless aspirated stops was explored in the speech of five male speakers of American English and two male speakers of Persian. Using traditional locus equation measurement sites for F2 onsets, voiceless labial and coronal stops had significantly lower locus equation slopes relative to their voiced counterparts, whereas velars failed to show voicing differences. When locus equations were derived using F2 onsets for voiced stops that were measured closer to the stop release burst, comparable to the protocol for measuring voiceless aspirated stops, no significant effects of voicing/aspiration on locus equation slopes were observed. This methodological factor, rather than an underlying phonetic-based explanation, provides a reasonable account for the observed flatter locus equation slopes of voiceless labial and coronal stops relative to voiced cognates reported in previous studies [Molis et al., J. Acoust. Soc. Am. 95, 2925 (1994); O. Engstrand and B. Lindblom, PHONUM 4, 101-104]. [Work supported by NIH.]
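    Concretely, a locus equation is the least-squares line relating F2 at the CV onset to F2 at the vowel midpoint across vowel contexts, and its slope is the place-of-articulation descriptor at issue. A minimal Python sketch with made-up illustrative frequencies:

        import numpy as np

        # F2 (Hz) at vowel midpoint and at CV onset for one stop consonant
        # across several vowel contexts (illustrative values only).
        f2_vowel = np.array([2300.0, 1900.0, 1500.0, 1100.0, 900.0])
        f2_onset = np.array([1900.0, 1700.0, 1450.0, 1250.0, 1150.0])

        # Locus equation: F2_onset = slope * F2_vowel + intercept
        slope, intercept = np.polyfit(f2_vowel, f2_onset, 1)
        print(f"slope = {slope:.2f}, intercept = {intercept:.0f} Hz")
        # Slopes near 1 indicate strong CV coarticulation; flatter slopes
        # indicate the onset is less influenced by the following vowel.

    The methodological point of the study is that where the F2 onset is measured (at voicing onset vs. near the release burst) changes these slopes.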

  18. Speech recognition systems on the Cell Broadband Engine

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liu, Y; Jones, H; Vaidya, S

    In this paper we describe our design, implementation, and first results of a prototype connected-phoneme-based speech recognition system on the Cell Broadband Engine™ (Cell/B.E.). Automatic speech recognition decodes speech samples into plain text (other representations are possible) and must process samples at real-time rates. Fortunately, the computational tasks involved in this pipeline are highly data-parallel and can receive significant hardware acceleration from vector-streaming architectures such as the Cell/B.E. Identifying and exploiting these parallelism opportunities is challenging, but also critical to improving system performance. We observed, from our initial performance timings, that a single Cell/B.E. processor can recognize speech from thousands of simultaneous voice channels in real time--a channel density that is orders-of-magnitude greater than the capacity of existing software speech recognizers based on CPUs (central processing units). This result emphasizes the potential for Cell/B.E.-based speech recognition and will likely lead to the future development of production speech systems using Cell/B.E. clusters.
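    At the core of such an HMM-based decoder is the Viterbi search, which is what makes the workload so data-parallel: every frame updates a score per state from the previous frame's scores. A toy Python sketch of the recursion (all matrices illustrative; a production decoder works over far larger state graphs):

        import numpy as np

        def viterbi(log_A, log_B, log_pi):
            """Most likely state path through an HMM.
            log_A:  (N, N) log transition probabilities
            log_B:  (T, N) per-frame log emission scores
            log_pi: (N,) log initial state probabilities"""
            T, N = log_B.shape
            delta = log_pi + log_B[0]            # best score per state
            back = np.zeros((T, N), dtype=int)   # backpointers
            for t in range(1, T):
                scores = delta[:, None] + log_A  # scores[i, j]: i -> j
                back[t] = np.argmax(scores, axis=0)
                delta = np.max(scores, axis=0) + log_B[t]
            path = [int(np.argmax(delta))]
            for t in range(T - 1, 0, -1):
                path.append(int(back[t, path[-1]]))
            return path[::-1]

    The per-frame max/argmax over state pairs vectorizes naturally, which is the kind of structure the Cell/B.E.'s SIMD units accelerate.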

  19. Relationship Between Voice and Motor Disabilities of Parkinson's Disease.

    PubMed

    Majdinasab, Fatemeh; Karkheiran, Siamak; Soltani, Majid; Moradi, Negin; Shahidi, Gholamali

    2016-11-01

    To evaluate the voice of Iranian patients with Parkinson's disease (PD) and to find relationships between motor disabilities and acoustic voice parameters as speech motor components. We evaluated 27 Farsi-speaking PD patients and 21 age- and sex-matched healthy persons as controls. Motor performance was assessed with the Unified Parkinson's Disease Rating Scale part III and the Hoehn and Yahr rating scale in the "on" state. Acoustic voice evaluation, including fundamental frequency (f0), standard deviation of f0, minimum of f0, maximum of f0, shimmer, jitter, and harmonics-to-noise ratio, was done with the Praat software on /a/ prolongation. No difference was seen between the voices of the patients and the controls. f0 and its variation correlated significantly with the duration of the disease, but not with the Unified Parkinson's Disease Rating Scale part III. Only a limited relationship was observed between voice and motor disabilities. Tremor is a main feature of PD that affects the motor and phonation systems. Females had an older age at onset, longer disease duration, and more severe motor disabilities (not statistically significant), but phonation disorders were more frequent in males and showed a stronger relationship with the severity of motor disabilities. Voice is affected by PD earlier than many other motor components and is more sensitive to disease progression. Tremor is the feature of PD with the greatest impact on voice. PD affects the voice of male patients more than that of female patients. Copyright © 2016 The Voice Foundation. Published by Elsevier Inc. All rights reserved.

  20. The Effect of Background Traffic Packet Size to VoIP Speech Quality

    NASA Astrophysics Data System (ADS)

    Triyason, Tuul; Kanthamanon, Prasert; Warasup, Kittipong; Yamsaengsung, Siam; Supattatham, Montri

    VoIP is gaining acceptance in the corporate world, especially in small and medium-sized businesses that want to save costs to gain an advantage over their competitors. Good voice quality is one of the challenging tasks in a deployment plan, because VoIP voice quality is affected by packet loss and jitter. In this paper, we study the effect of background traffic packet size on voice quality. The background traffic was generated by the Bricks software and the speech quality was assessed by MOS. The results show an interesting relationship between voice quality and the number and size of TCP packets: for the same amount of data, smaller packets degrade voice quality more than larger packets.
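    MOS-style quality under packet loss is often approximated with the ITU-T G.107 E-model, which maps a transmission rating R to an estimated MOS. A simplified Python sketch (the default rating of 93.2 and the G.711 loss-robustness value Bpl ≈ 4.3 are commonly cited figures; all other impairments are ignored here for illustration):

        def mos_from_r(r: float) -> float:
            # ITU-T G.107 mapping from transmission rating R to a MOS estimate
            if r < 0:
                return 1.0
            if r > 100:
                return 4.5
            return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6

        def r_with_loss(ppl: float, ie: float = 0.0, bpl: float = 4.3) -> float:
            # Effective equipment impairment for a packet-loss percentage ppl
            ie_eff = ie + (95.0 - ie) * ppl / (ppl + bpl)
            return 93.2 - ie_eff

        for loss in (0.0, 1.0, 3.0, 5.0):
            print(f"{loss:.0f}% loss -> MOS ~ {mos_from_r(r_with_loss(loss)):.2f}")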

  1. Effects of vocal training on singing and speaking voice characteristics in vocally healthy adults and children based on choral and nonchoral data.

    PubMed

    Siupsinskiene, Nora; Lycke, Hugo

    2011-07-01

    This prospective cross-sectional study examines the effects of voice training on vocal capabilities in vocally healthy, age- and gender-differentiated groups, measured by the voice range profile (VRP) and speech range profile (SRP). Frequency and intensity measurements of the VRP and SRP using standard singing and speaking voice protocols were derived from 161 trained choir singers (21 males, 59 females, and 81 prepubescent children) and from 188 nonsingers (38 males, 89 females, and 61 children). When compared with nonsingers, both genders of trained adult and child singers exhibited increased mean pitch range, highest frequency, and VRP area in the high frequencies (P<0.05). Female singers and child singers also showed significantly increased mean maximum voice intensity, intensity range, and total VRP area. Logistic regression analysis showed that VRP pitch range, highest frequency, maximum voice intensity, and maximum-minimum intensity range, together with the SRP slope of the speaking curve, were the key predictors of voice training. Age-, gender-, and voice-training-differentiated norms for VRP and SRP parameters are presented. A significant positive effect of voice training on vocal capabilities, mostly of the singing voice, was confirmed. The presented norms for trained singers, with key parameters differentiated by gender and age, are suggested for the clinical practice of otolaryngologists and speech-language pathologists. Copyright © 2011 The Voice Foundation. Published by Mosby, Inc. All rights reserved.

  2. Speech–Language Pathology Evaluation and Management of Hyperkinetic Disorders Affecting Speech and Swallowing Function

    PubMed Central

    Barkmeier-Kraemer, Julie M.; Clark, Heather M.

    2017-01-01

    Background Hyperkinetic dysarthria is characterized by abnormal involuntary movements affecting respiratory, phonatory, and articulatory structures impacting speech and deglutition. Speech–language pathologists (SLPs) play an important role in the evaluation and management of dysarthria and dysphagia. This review describes the standard clinical evaluation and treatment approaches by SLPs for addressing impaired speech and deglutition in specific hyperkinetic dysarthria populations. Methods A literature review was conducted using the data sources of PubMed, Cochrane Library, and Google Scholar. Search terms included 1) hyperkinetic dysarthria, essential voice tremor, voice tremor, vocal tremor, spasmodic dysphonia, spastic dysphonia, oromandibular dystonia, Meige syndrome, orofacial, cervical dystonia, dystonia, dyskinesia, chorea, Huntington’s Disease, myoclonus; and evaluation/treatment terms: 2) Speech–Language Pathology, Speech Pathology, Evaluation, Assessment, Dysphagia, Swallowing, Treatment, Management, and diagnosis. Results The standard SLP clinical speech and swallowing evaluation of chorea/Huntington’s disease, myoclonus, focal and segmental dystonia, and essential vocal tremor typically includes 1) case history; 2) examination of the tone, symmetry, and sensorimotor function of the speech structures during non-speech, speech and swallowing relevant activities (i.e., cranial nerve assessment); 3) evaluation of speech characteristics; and 4) patient self-report of the impact of their disorder on activities of daily living. SLP management of individuals with hyperkinetic dysarthria includes behavioral and compensatory strategies for addressing compromised speech and intelligibility. Swallowing disorders are managed based on individual symptoms and the underlying pathophysiology determined during evaluation. Discussion SLPs play an important role in contributing to the differential diagnosis and management of impaired speech and deglutition

  3. Experience with speech sounds is not necessary for cue trading by budgerigars (Melopsittacus undulatus).

    PubMed

    Flaherty, Mary; Dent, Micheal L; Sawusch, James R

    2017-01-01

    The influence of experience with human speech sounds on speech perception in budgerigars, vocal mimics whose speech exposure can be tightly controlled in a laboratory setting, was measured. Budgerigars were divided into groups that differed in auditory exposure and then tested on a cue-trading identification paradigm with synthetic speech. Phonetic cue trading is a perceptual phenomenon observed when changes on one cue dimension are offset by changes in another cue dimension while still maintaining the same phonetic percept. The current study examined whether budgerigars would trade the cues of voice onset time (VOT) and the first formant onset frequency when identifying syllable initial stop consonants and if this would be influenced by exposure to speech sounds. There were a total of four different exposure groups: No speech exposure (completely isolated), Passive speech exposure (regular exposure to human speech), and two Speech-trained groups. After the exposure period, all budgerigars were tested for phonetic cue trading using operant conditioning procedures. Birds were trained to peck keys in response to different synthetic speech sounds that began with "d" or "t" and varied in VOT and frequency of the first formant at voicing onset. Once training performance criteria were met, budgerigars were presented with the entire intermediate series, including ambiguous sounds. Responses on these trials were used to determine which speech cues were used, if a trading relation between VOT and the onset frequency of the first formant was present, and whether speech exposure had an influence on perception. Cue trading was found in all birds and these results were largely similar to those of a group of humans. Results indicated that prior speech experience was not a requirement for cue trading by budgerigars. The results are consistent with theories that explain phonetic cue trading in terms of a rich auditory encoding of the speech signal.

  4. Experience with speech sounds is not necessary for cue trading by budgerigars (Melopsittacus undulatus)

    PubMed Central

    Flaherty, Mary; Dent, Micheal L.; Sawusch, James R.

    2017-01-01

    The influence of experience with human speech sounds on speech perception in budgerigars, vocal mimics whose speech exposure can be tightly controlled in a laboratory setting, was measured. Budgerigars were divided into groups that differed in auditory exposure and then tested on a cue-trading identification paradigm with synthetic speech. Phonetic cue trading is a perceptual phenomenon observed when changes on one cue dimension are offset by changes in another cue dimension while still maintaining the same phonetic percept. The current study examined whether budgerigars would trade the cues of voice onset time (VOT) and the first formant onset frequency when identifying syllable initial stop consonants and if this would be influenced by exposure to speech sounds. There were a total of four different exposure groups: No speech exposure (completely isolated), Passive speech exposure (regular exposure to human speech), and two Speech-trained groups. After the exposure period, all budgerigars were tested for phonetic cue trading using operant conditioning procedures. Birds were trained to peck keys in response to different synthetic speech sounds that began with “d” or “t” and varied in VOT and frequency of the first formant at voicing onset. Once training performance criteria were met, budgerigars were presented with the entire intermediate series, including ambiguous sounds. Responses on these trials were used to determine which speech cues were used, if a trading relation between VOT and the onset frequency of the first formant was present, and whether speech exposure had an influence on perception. Cue trading was found in all birds and these results were largely similar to those of a group of humans. Results indicated that prior speech experience was not a requirement for cue trading by budgerigars. The results are consistent with theories that explain phonetic cue trading in terms of a rich auditory encoding of the speech signal. PMID:28562597

  5. Central voice production and pathophysiology of spasmodic dysphonia.

    PubMed

    Mor, Niv; Simonyan, Kristina; Blitzer, Andrew

    2018-01-01

    Our ability to speak is complex, and the role of the central nervous system in controlling speech production is often overlooked in the field of otolaryngology. In this brief review, we present an integrated overview of speech production with a focus on the role of the central nervous system. The role of central control of voice production is then further discussed in relation to the potential pathophysiology of spasmodic dysphonia (SD). Peer-reviewed articles on central laryngeal control and SD were identified from a PubMed search. Selected articles were augmented with designated relevant publications. Publications that discussed central and peripheral nervous system control of voice production and the central pathophysiology of laryngeal dystonia were chosen. Our ability to speak is regulated by specialized complex mechanisms coordinated by high-level cortical signaling, brainstem reflexes, peripheral nerves, muscles, and mucosal actions. Recent studies suggest that SD results from a primary central disturbance associated with dysfunction at our highest levels of central voice control. The efficacy of botulinum toxin in treating SD may not be limited solely to its local effect on laryngeal muscles and may also modulate the disorder at the level of the central nervous system. Future therapeutic options that target the central nervous system may help modulate the underlying disorder in SD and allow clinicians to better understand the principal pathophysiology. NA. Laryngoscope, 128:177-183, 2018. © 2017 The American Laryngological, Rhinological and Otological Society, Inc.

  6. Influence of Telecommunication Modality, Internet Transmission Quality, and Accessories on Speech Perception in Cochlear Implant Users.

    PubMed

    Mantokoudis, Georgios; Koller, Roger; Guignard, Jérémie; Caversaccio, Marco; Kompis, Martin; Senn, Pascal

    2017-04-24

    Telecommunication is limited or even impossible for more than one-third of all cochlear implant (CI) users. We sought therefore to study the impact of voice quality on speech perception with voice over Internet protocol (VoIP) under real and adverse network conditions. Telephone speech perception was assessed in 19 CI users (15-69 years, average 42 years), using the German HSM (Hochmair-Schulz-Moser) sentence test to compare Skype and conventional telephone (public switched telephone network, PSTN) transmission using a personal computer (PC) and a digital enhanced cordless telecommunications (DECT) telephone dual device. Five different Internet transmission quality modes and four accessories (PC speakers, headphones, 3.5 mm jack audio cable, and induction loop) were compared. As a secondary outcome, the subjectively perceived voice quality was assessed using the mean opinion score (MOS). Telephone speech perception was significantly better with Skype (median 91.6%, P<.001) than with PSTN (median 42.5%) under optimal conditions. Skype calls under adverse network conditions (data packet loss > 15%) were not superior to conventional telephony. In addition, there were no significant differences between the tested accessories (P>.05) using a PC. Coupling a Skype DECT phone device with an audio cable to the CI, however, resulted in higher speech perception (median 65%) and subjective MOS scores (3.2) than using PSTN (median 7.5%, P<.001). Skype calls significantly improve speech perception for CI users compared with conventional telephony under real network conditions. Listening accessories do not further improve the listening experience. Current Skype DECT telephone devices do not fully offer technical advantages in voice quality. ©Georgios Mantokoudis, Roger Koller, Jérémie Guignard, Marco Caversaccio, Martin Kompis, Pascal Senn. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 24.04.2017.

  7. Phase effects in masking by harmonic complexes: speech recognition.

    PubMed

    Deroche, Mickael L D; Culling, John F; Chatterjee, Monita

    2013-12-01

    Harmonic complexes that generate highly modulated temporal envelopes on the basilar membrane (BM) mask a tone less effectively than complexes that generate relatively flat temporal envelopes, because the non-linear active gain of the BM selectively amplifies a low-level tone in the dips of a modulated masker envelope. The present study examines a similar effect in speech recognition. Speech reception thresholds (SRTs) were measured for a voice masked by harmonic complexes with partials in sine phase (SP) or in random phase (RP). The masker's fundamental frequency (F0) was 50, 100 or 200 Hz. SRTs were considerably lower for SP than for RP maskers at 50-Hz F0, but the two converged at 100-Hz F0, while at 200-Hz F0, SRTs were a little higher for SP than RP maskers. The results were similar whether the target voice was male or female and whether the masker's spectral profile was flat or speech-shaped. Although listening in the masker dips has been shown to play a large role for artificial stimuli such as Schroeder-phase complexes at high levels, it contributes weakly to speech recognition in the presence of harmonic maskers with different crest factors at more moderate sound levels (65 dB SPL). Copyright © 2013 Elsevier B.V. All rights reserved.

  8. The effect of singing training on voice quality for people with quadriplegia.

    PubMed

    Tamplin, Jeanette; Baker, Felicity A; Buttifant, Mary; Berlowitz, David J

    2014-01-01

    Despite anecdotal reports of voice impairment in quadriplegia, the exact nature of these impairments is not well described in the literature. This article details objective and subjective voice assessments for people with quadriplegia at baseline and after a respiratory-targeted singing intervention. Randomized controlled trial. Twenty-four participants with quadriplegia were randomly assigned to a 12-week program of either a singing intervention or active music therapy control. Recordings of singing and speech were made at baseline, 6 weeks, 12 weeks, and 6 months postintervention. These deidentified recordings were used to measure sound pressure levels and assess voice quality using the Multidimensional Voice Profile and the Perceptual Voice Profile. Baseline voice quality data indicated deviation from normality in the areas of breathiness, strain, and roughness. A greater percentage of intervention participants moved toward more normal voice quality in terms of jitter, shimmer, and noise-to-harmonic ratio; however, the improvements failed to achieve statistical significance. Subjective and objective assessments of voice quality indicate that quadriplegia may have a detrimental effect on voice quality; in particular, causing a perception of roughness and breathiness in the voice. The results of this study suggest that singing training may have a role in ameliorating these voice impairments. Copyright © 2014 The Voice Foundation. Published by Mosby, Inc. All rights reserved.

  9. Speech Acquisition and Automatic Speech Recognition for Integrated Spacesuit Audio Systems

    NASA Technical Reports Server (NTRS)

    Huang, Yiteng; Chen, Jingdong; Chen, Shaoyan

    2010-01-01

    A voice-command human-machine interface system has been developed for spacesuit extravehicular activity (EVA) missions. A multichannel acoustic signal processing method has been created for distant speech acquisition in noisy and reverberant environments. This technology reduces noise by exploiting differences in the statistical nature of signal (i.e., speech) and noise in the spatial and temporal domains. As a result, automatic speech recognition (ASR) accuracy can be improved to the level at which crewmembers would find the speech interface useful. The developed speech human/machine interface will improve both crewmember usability and operational efficiency. It offers a fast rate of data/text entry in a small, lightweight package, and it frees the hands and eyes of a suited crewmember. The system components and steps include beamforming/multichannel noise reduction, single-channel noise reduction, speech feature extraction, feature transformation and normalization, feature compression, model adaptation, ASR HMM (Hidden Markov Model) training, and ASR decoding. A state-of-the-art phoneme recognizer can obtain an accuracy rate of 65 percent when the training and testing data are free of noise. When it is used in spacesuits, the rate drops to about 33 percent. With the developed microphone array speech-processing technologies, the performance is improved and the phoneme recognition accuracy rate rises to 44 percent. The recognizer can be further improved by combining the microphone array and HMM model adaptation techniques and using speech samples collected from inside spacesuits. In addition, arithmetic complexity models for the major HMM-based ASR components were developed. They can help real-time ASR system designers select proper tasks when facing constraints in computational resources.
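    The front of that pipeline, beamforming, can be illustrated with its simplest variant: delay-and-sum, which time-aligns the microphones toward the talker so speech adds coherently while diffuse noise partially cancels. A minimal Python sketch (the per-mic steering delays are assumed known from array geometry; this is not NASA's algorithm):

        import numpy as np

        def delay_and_sum(channels, sr, delays_s):
            """channels: (n_mics, n_samples) array of recordings.
            delays_s: steering delay (seconds) to apply to each mic."""
            n_mics, n = channels.shape
            out = np.zeros(n)
            for ch, d in zip(channels, delays_s):
                shift = int(round(d * sr))   # integer-sample steering delay
                # np.roll wraps around; zero-padding is cleaner in practice
                out += np.roll(ch, shift)
            return out / n_mics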

  10. Perceptual Adaptation of Voice Gender Discrimination with Spectrally Shifted Vowels

    PubMed Central

    Li, Tianhao; Fu, Qian-Jie

    2013-01-01

    Purpose To determine whether perceptual adaptation improves voice gender discrimination of spectrally shifted vowels and, if so, which acoustic cues contribute to the improvement. Method Voice gender discrimination was measured for 10 normal-hearing subjects, during 5 days of adaptation to spectrally shifted vowels, produced by processing the speech of 5 male and 5 female talkers with 16-channel sine-wave vocoders. The subjects were randomly divided into 2 groups; one subjected to 50-Hz, and the other to 200-Hz, temporal envelope cutoff frequencies. No preview or feedback was provided. Results There was significant adaptation in voice gender discrimination with the 200-Hz cutoff frequency, but significant improvement was observed only for 3 female talkers with F0 > 180 Hz and 3 male talkers with F0 < 170 Hz. There was no significant adaptation with the 50-Hz cutoff frequency. Conclusions Temporal envelope cues are important for voice gender discrimination under spectral shift conditions with perceptual adaptation, but spectral shift may limit the exclusive use of spectral information and/or the use of formant structure on voice gender discrimination. The results have implications for cochlear implant users and for understanding voice gender discrimination. PMID:21173392
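    The 16-channel sine-wave vocoding used as the stimulus manipulation here can be sketched compactly: split the speech into bands, extract each band's temporal envelope (low-passed at the 50-Hz or 200-Hz cutoff under test), and re-impose the envelopes on sine carriers at the band centers. A minimal Python illustration (filter orders, band edges, and the log spacing are assumptions, not the study's exact processor):

        import numpy as np
        from scipy.signal import butter, filtfilt, hilbert

        def sine_vocoder(x, sr, n_ch=16, lo=100.0, hi=7000.0, env_cut=200.0):
            """Sine-wave vocode signal x sampled at sr (requires hi < sr/2)."""
            edges = np.geomspace(lo, hi, n_ch + 1)   # log-spaced band edges
            b_env, a_env = butter(4, env_cut / (sr / 2), btype="low")
            t = np.arange(len(x)) / sr
            out = np.zeros(len(x))
            for f1, f2 in zip(edges[:-1], edges[1:]):
                b, a = butter(4, [f1 / (sr / 2), f2 / (sr / 2)], btype="band")
                band = filtfilt(b, a, x)
                env = filtfilt(b_env, a_env, np.abs(hilbert(band)))
                carrier = np.sin(2 * np.pi * np.sqrt(f1 * f2) * t)
                out += np.clip(env, 0.0, None) * carrier
            return out / np.max(np.abs(out))         # normalize peak level

    Lowering env_cut from 200 Hz to 50 Hz removes the envelope periodicity cues with which the study observed adaptation.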

  11. Perceptual adaptation of voice gender discrimination with spectrally shifted vowels.

    PubMed

    Li, Tianhao; Fu, Qian-Jie

    2011-08-01

    To determine whether perceptual adaptation improves voice gender discrimination of spectrally shifted vowels and, if so, which acoustic cues contribute to the improvement. Voice gender discrimination was measured for 10 normal-hearing subjects, during 5 days of adaptation to spectrally shifted vowels, produced by processing the speech of 5 male and 5 female talkers with 16-channel sine-wave vocoders. The subjects were randomly divided into 2 groups; one subjected to 50-Hz, and the other to 200-Hz, temporal envelope cutoff frequencies. No preview or feedback was provided. There was significant adaptation in voice gender discrimination with the 200-Hz cutoff frequency, but significant improvement was observed only for 3 female talkers with F(0) > 180 Hz and 3 male talkers with F(0) < 170 Hz. There was no significant adaptation with the 50-Hz cutoff frequency. Temporal envelope cues are important for voice gender discrimination under spectral shift conditions with perceptual adaptation, but spectral shift may limit the exclusive use of spectral information and/or the use of formant structure on voice gender discrimination. The results have implications for cochlear implant users and for understanding voice gender discrimination.

  12. Voice Onset Time for Female Trained and Untrained Singers during Speech and Singing

    ERIC Educational Resources Information Center

    McCrea, Christopher R.; Morris, Richard J.

    2007-01-01

    The purpose of this study was to examine the voice onset times of female trained and untrained singers during spoken and sung tasks. Thirty females were digitally recorded speaking and singing short phrases containing the English stop consonants /p/ and /b/ in the word-initial position. Voice onset time was measured for each phoneme and…

  13. Unmasking the effects of masking on performance: The potential of multiple-voice masking in the office environment.

    PubMed

    Keus van de Poll, Marijke; Carlsson, Johannes; Marsh, John E; Ljung, Robert; Odelius, Johan; Schlittmeier, Sabine J; Sundin, Gunilla; Sörqvist, Patrik

    2015-08-01

    Broadband noise is often used as a masking sound to combat the negative consequences of background speech on performance in open-plan offices. As office workers generally dislike broadband noise, it is important to find alternatives that are more appreciated while being at least as effective. The purpose of experiment 1 was to compare broadband noise with two alternatives (multiple voices and water waves) in the context of a serial short-term memory task. A single voice impaired memory in comparison with silence, but when the single voice was masked with multiple voices, performance was on a level with silence. Experiment 2 explored the benefits of multiple-voice masking in more detail (by comparing one voice, three voices, five voices, and seven voices) in the context of word-processed writing (arguably a more office-relevant task). Performance (i.e., writing fluency) increased linearly from worst performance in the one-voice condition to best performance in the seven-voice condition. Psychological mechanisms underpinning these effects are discussed.

  14. The effects of gated speech on the fluency of speakers who stutter.

    PubMed

    Howell, Peter

    2007-01-01

    It is known that the speech of people who stutter improves when the speaker's own vocalization is changed while the participant is speaking. One explanation of these effects is the disruptive rhythm hypothesis (DRH). The DRH maintains that the manipulated sound only needs to disturb timing to affect speech control. The experiment investigated whether speech that was gated on and off (interrupted) affected the speech control of speakers who stutter. Eight children who stutter read a passage when they heard their voice normally and when the speech was gated. Fluency was enhanced (fewer errors were made and time to read a set passage was reduced) when speech was interrupted in this way. The results support the DRH. Copyright 2007 S. Karger AG, Basel.

  15. The impact of vocal rehabilitation on quality of life and voice handicap in patients with total laryngectomy.

    PubMed

    Ţiple, Cristina; Drugan, Tudor; Dinescu, Florina Veronica; Mureşan, Rodica; Chirilă, Magdalena; Cosgarea, Marcel

    2016-01-01

    Health-related quality of life (HRQL) and the voice handicap index (VHI) of laryngectomees seem to be relevant to voice rehabilitation. The aim of this study is to assess the impact of voice rehabilitation on the HRQL and VHI of laryngectomees. A retrospective study was done at the Ear, Nose, and Throat Department of the Emergency County Hospital. Sixty-five laryngectomees were included in this study, of whom 62 underwent voice rehabilitation. Voice handicap and QOL were assessed using the QOL questionnaires developed by the European Organisation for Research and Treatment of Cancer (EORTC); variables used were functional scales (physical, role, cognitive, emotional, and social), symptom scales (fatigue, pain, and nausea and vomiting), the global QOL scale (pain, swallowing, senses, speech, social eating, social contact, and sexuality), and the functional, physical, and emotional aspects of the voice handicap (one-way ANOVA test). The mean age of the patients was 59.22 (standard deviation = 9.00) years. A total of 26 (40%) patients had moderate VHI (between 31 and 60) and 39 (60%) patients had severe VHI (higher than 61). Results of the HRQL questionnaires showed that patients who underwent speech therapy obtained better scores on most scales (P = 0.000). Patients with esophageal voice had high scores on the functional scales compared with patients using other voice rehabilitation methods or none (P = 0.07), and the VHI score for transesophageal prosthesis users improved after an adjustment period. The global health status and VHI scores showed a statistically significant correlation between speaker groups. The EORTC and VHI questionnaires offer more information regarding life after laryngectomy.

  16. Objective speech quality assessment and the RPE-LTP coding algorithm in different noise and language conditions.

    PubMed

    Hansen, J H; Nandkumar, S

    1995-01-01

    The formulation of reliable signal processing algorithms for speech coding and synthesis requires the selection of an a priori criterion of performance. Though coding efficiency (bits/second) or computational requirements can be used, a final performance measure must always include speech quality. In this paper, three objective speech quality measures are considered with respect to quality assessment for American English, noisy American English, and noise-free versions of seven languages. The purpose is to determine whether objective quality measures can be used to quantify changes in quality for a given voice coding method, with a known subjective performance level, as background noise or language conditions are changed. The speech coding algorithm chosen is regular-pulse excitation with long-term prediction (RPE-LTP), which has been adopted as the standard voice compression algorithm for the European Digital Mobile Radio system. Three areas are considered for objective quality assessment: (i) vocoder performance for American English in a noise-free environment, (ii) speech quality variation for three additive background noise sources, and (iii) noise-free performance for seven languages: English, Japanese, Finnish, German, Hindi, Spanish, and French. It is suggested that although existing objective quality measures will never replace subjective testing, they can be a useful means of assessing changes in performance, identifying areas for improvement in algorithm design, and augmenting subjective quality tests for voice coding/compression algorithms in noise-free, noisy, and/or non-English applications.
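
    The abstract does not name the three measures, but a classic member of this family is segmental SNR, which averages frame-level SNRs between the reference and the coded signal. The sketch below is a generic implementation, not the paper's; the 20 ms frames and the [-10, 35] dB clamping range are conventional choices.

        import numpy as np

        def segmental_snr(ref, deg, fs, frame_ms=20, lo=-10.0, hi=35.0):
            """Segmental SNR (dB) between a reference and a degraded signal.

            Frame SNRs are clamped to [lo, hi] dB, a common convention that
            keeps silent frames from dominating the average.
            """
            n = int(fs * frame_ms / 1000)
            snrs = []
            for i in range(0, min(len(ref), len(deg)) - n + 1, n):
                r, d = ref[i:i + n], deg[i:i + n]
                err = np.sum((r - d) ** 2)
                snr = hi if err == 0 else 10 * np.log10(np.sum(r ** 2) / err + 1e-12)
                snrs.append(np.clip(snr, lo, hi))
            return float(np.mean(snrs))

        # Demo: a clean tone versus the same tone with additive noise.
        fs = 8000
        t = np.arange(fs) / fs
        ref = np.sin(2 * np.pi * 200 * t)
        deg = ref + 0.05 * np.random.default_rng(1).standard_normal(len(ref))
        print(f"segmental SNR: {segmental_snr(ref, deg, fs):.1f} dB")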

  17. Duration, Pitch, and Loudness in Kunqu Opera Stage Speech.

    PubMed

    Han, Qichao; Sundberg, Johan

    2017-03-01

    Kunqu is a special type of opera within the Chinese tradition with 600 years of history. In it, stage speech is used for the spoken dialogue. It is performed in the Ming Dynasty's Mandarin language and is a much more dominant part of the play than singing. Stage speech deviates considerably from normal conversational speech with respect to duration, loudness, and pitch. This paper compares these properties in stage speech and conversational speech. A famous, highly experienced female singer's performed stage speech and her reading of the same lyrics in a conversational speech mode were analyzed. Clear differences were found. As compared with conversational speech, stage speech had longer word and sentence durations, and word duration was less variable. Average sound level was 16 dB higher. Mean fundamental frequency was also considerably higher and more varied. Within sentences, both loudness and fundamental frequency tended to vary according to a low-high-low pattern. Some of the findings fail to support current opinions regarding the characteristics of stage speech, and in this sense the study demonstrates the relevance of objective measurements in descriptions of vocal styles. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
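
    The reported contrasts (mean F0 and average sound level) are straightforward to measure. The sketch below uses a crude frame-wise autocorrelation F0 estimator and an RMS level in dB; it illustrates the kind of measurement involved rather than the authors' analysis pipeline, and the two synthetic tones merely stand in for the two speaking modes.

        import numpy as np

        def mean_f0_autocorr(x, fs, fmin=75, fmax=500, frame_ms=40):
            """Crude mean-F0 estimate from frame-wise autocorrelation peaks."""
            n = int(fs * frame_ms / 1000)
            lag_lo, lag_hi = int(fs / fmax), int(fs / fmin)
            f0s = []
            for i in range(0, len(x) - n + 1, n):
                frame = x[i:i + n] - np.mean(x[i:i + n])
                ac = np.correlate(frame, frame, mode="full")[n - 1:]
                if ac[0] <= 0:
                    continue  # skip silent frames
                lag = lag_lo + np.argmax(ac[lag_lo:lag_hi])
                f0s.append(fs / lag)
            return float(np.mean(f0s))

        def level_db(x):
            """RMS level in dB re an arbitrary reference."""
            return 20 * np.log10(np.sqrt(np.mean(x ** 2)))

        fs = 16000
        t = np.arange(fs) / fs
        conversational = 0.1 * np.sin(2 * np.pi * 200 * t)  # stand-ins for the
        stage = 0.6 * np.sin(2 * np.pi * 300 * t)           # two speaking modes
        print(f"mean F0: {mean_f0_autocorr(conversational, fs):.0f} Hz -> "
              f"{mean_f0_autocorr(stage, fs):.0f} Hz")
        print(f"level difference: {level_db(stage) - level_db(conversational):.1f} dB")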

  18. Bilingual Computerized Speech Recognition Screening for Depression Symptoms

    ERIC Educational Resources Information Center

    Gonzalez, Gerardo; Carter, Colby; Blanes, Erika

    2007-01-01

    The Voice-Interactive Depression Assessment System (VIDAS) is a computerized speech recognition application for screening depression based on the Center for Epidemiological Studies--Depression scale in English and Spanish. Study 1 included 50 English and 47 Spanish speakers. Study 2 involved 108 English and 109 Spanish speakers. Participants…

  19. Perception of initial obstruent voicing is influenced by gestural organization

    PubMed Central

    Best, Catherine T.; Hallé, Pierre A.

    2009-01-01

    Cross-language differences in phonetic settings for phonological contrasts of stop voicing have posed a challenge for attempts to relate specific phonological features to specific phonetic details. We probe the phonetic-phonological relationship for voicing contrasts more broadly, analyzing in particular their relevance to nonnative speech perception, from two theoretical perspectives: feature geometry and articulatory phonology. Because these perspectives differ in assumptions about temporal/phasing relationships among features/gestures within syllable onsets, we undertook a cross-language investigation on perception of obstruent (stop, fricative) voicing contrasts in three nonnative onsets that use a common set of features/gestures but with differing time-coupling. Listeners of English and French, which differ in their phonetic settings for word-initial stop voicing distinctions, were tested on perception of three onset types, all nonnative to both English and French, that differ in how initial obstruent voicing is coordinated with a lateral feature/gesture and additional obstruent features/gestures. The targets, listed from least complex to most complex onsets, were: a lateral fricative voicing distinction (Zulu /ɬ/-/ɮ/), a laterally-released affricate voicing distinction (Tlingit /tɬ/-/dɮ/), and a coronal stop voicing distinction in stop+/l/ clusters (Hebrew /tl/-/dl/). English and French listeners' performance reflected the differences in their native languages' stop voicing distinctions, compatible with prior perceptual studies on singleton consonant onsets. However, both groups' abilities to perceive voicing as a separable parameter also varied systematically with the structure of the target onsets, supporting the notion that the gestural organization of syllable onsets systematically affects perception of initial voicing distinctions. PMID:20228878

  20. Tracking Voice Change after Thyroidectomy: Application of Spectral/Cepstral Analyses

    ERIC Educational Resources Information Center

    Awan, Shaheen N.; Helou, Leah B.; Stojadinovic, Alexander; Solomon, Nancy Pearl

    2011-01-01

    This study evaluates the utility of perioperative spectral and cepstral acoustic analyses to monitor voice change after thyroidectomy. Perceptual and acoustic analyses were conducted on speech samples (sustained vowel /ɑ/ and CAPE-V sentences) provided by 70 participants (36 women and 34 men) at four study time points: prior to thyroid…
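
    The abstract is cut off here, but the title's mention of cepstral analysis invites a concrete example. One widely used cepstral measure of voice quality is cepstral peak prominence (CPP); the sketch below is a generic textbook-style implementation, assumed for illustration and not taken from this study.

        import numpy as np

        def cepstral_peak_prominence(x, fs, fmin=60, fmax=330):
            """Cepstral peak prominence (CPP) of a windowed signal, in dB.

            The real cepstrum is the inverse FFT of the log-magnitude
            spectrum. CPP is the height of the largest cepstral peak in the
            quefrency range of plausible F0, measured above a straight line
            fitted to the cepstrum over that range; clearer harmonic
            structure (a less dysphonic voice) yields a larger CPP.
            """
            n = len(x)
            log_spec = 20 * np.log10(np.abs(np.fft.fft(x * np.hanning(n))) + 1e-12)
            cep = np.real(np.fft.ifft(log_spec))
            q = np.arange(n) / fs                        # quefrency axis (seconds)
            i_lo, i_hi = int(fs / fmax), int(fs / fmin)  # search 1/fmax .. 1/fmin s
            coef = np.polyfit(q[i_lo:i_hi], cep[i_lo:i_hi], 1)
            detrended = cep[i_lo:i_hi] - np.polyval(coef, q[i_lo:i_hi])
            return float(np.max(detrended))

        # Demo: a harmonic signal with little noise versus a "breathier" one.
        fs = 16000
        t = np.arange(2 * fs) / fs
        rng = np.random.default_rng(2)
        clear = np.sin(2 * np.pi * 180 * t) + 0.1 * rng.standard_normal(len(t))
        breathy = np.sin(2 * np.pi * 180 * t) + 1.0 * rng.standard_normal(len(t))
        print(f"CPP clear:   {cepstral_peak_prominence(clear, fs):.1f} dB")
        print(f"CPP breathy: {cepstral_peak_prominence(breathy, fs):.1f} dB")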